Storage Hierarchy

A typical computer has several different levels of storage. Each level of storage has a different speed, cost, and size. The levels form a storage hierarchy, in which the topmost levels (those nearest the processor) are fastest, most expensive and smallest.

Levels typically include processor registers, possibly some levels of cache(1), main memory, and possibly some levels of backing store.

Each level is commonly used as a cache(2) for the next level. For instance, virtual memory systems use main memory as a cache for backing store.

[Figure: the storage hierarchy. At the top (more expensive, faster, smaller) is the central processing unit, with its functional units, registers, and internal cache; below it is physical memory, comprising external cache and main memory; at the bottom (cheap, slow, large) is backing store.]

Static Allocation

Static allocation means allocation of memory(1) before the program starts and retention until the end.

The locations of objects are decided at compile-time, although they might be relocated at load-time. This implies that the sizes of the objects must also be known at compile-time.

Using only static allocation is restrictive, as sizes of data structures can’t be dynamically varied, and procedures cannot be recursive. However, it is also fast and eliminates the possibility of running out of memory. For these reasons, this scheme is sometimes used in real-time systems.
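As a minimal sketch of this trade-off in C, assume a hypothetical fixed-capacity sample buffer (the names MAX_SAMPLES, samples, and record_sample are illustrative, not taken from any real system):

```c
#include <stddef.h>

#define MAX_SAMPLES 8   /* capacity fixed at compile time (illustrative value) */

/* Statically allocated: the storage exists before the program starts
   and is retained until it ends. */
static int samples[MAX_SAMPLES];
static size_t sample_count = 0;

/* Stores a value; returns 0 on success, -1 when the fixed buffer is full.
   No run-time allocation takes place, so the call cannot fail for lack
   of memory, but the buffer also cannot grow beyond MAX_SAMPLES. */
int record_sample(int value)
{
    if (sample_count == MAX_SAMPLES)
        return -1;
    samples[sample_count++] = value;
    return 0;
}
```

The ninth call to record_sample fails no matter how much memory the machine has, which is the restriction described above.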

Heap Allocation

Heap allocation or dynamic allocation means run-time allocation and deallocation of memory(1) in arbitrary order.

Dynamic allocation is usually used for objects whose size, quantity, or lifetime cannot be determined at compile-time. It is necessary to implement modern data structures, such as recursive trees and full closures.

Objects on the heap can be managed manually, as in C, or automatically, as in Lisp and Java.
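A rough C sketch of manual heap management follows, using a binary tree whose node count is not known until run-time. The type and function names (struct node, tree_insert, tree_size, tree_free) are illustrative assumptions; only malloc and free come from the standard library.

```c
#include <stdlib.h>

/* A recursive tree node: how many nodes exist is decided at run-time. */
struct node {
    int key;
    struct node *left, *right;
};

/* Inserts key and returns the root of the tree. If allocation fails,
   the tree is left unchanged (an empty tree stays NULL). */
struct node *tree_insert(struct node *root, int key)
{
    if (root == NULL) {
        struct node *n = malloc(sizeof *n);
        if (n == NULL)
            return NULL;            /* out of memory */
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = tree_insert(root->left, key);
    else
        root->right = tree_insert(root->right, key);
    return root;
}

/* Counts the nodes in the tree. */
size_t tree_size(const struct node *root)
{
    return root ? 1 + tree_size(root->left) + tree_size(root->right) : 0;
}

/* Manual management: every node must be freed explicitly, exactly once. */
void tree_free(struct node *root)
{
    if (root == NULL)
        return;
    tree_free(root->left);
    tree_free(root->right);
    free(root);
}
```

Forgetting to call tree_free leaks the nodes, and calling it twice on the same tree is a double free; these are the errors that automatic management removes.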

Stack Allocation

Stack allocation means run-time allocation and deallocation of memory(1) in last-in/first-out order.

Typically, stack allocation is performed on top of the main stack, but one can have a separate data stack for this purpose as well, as in Forth, or even multiple ones, as in the PostScript language.

Allocation and deallocation are typically fast, since they can be done simply by adding or subtracting the size of the block from the stack pointer.
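The pointer arithmetic can be sketched in C as below. This is a simplified illustration rather than how any particular system implements its stack: the names stack_area, stack_top, stack_alloc, and stack_free are hypothetical, the stack grows upwards, and alignment is ignored for brevity.

```c
#include <stddef.h>

#define STACK_BYTES 1024           /* illustrative capacity */

static unsigned char stack_area[STACK_BYTES];
static size_t stack_top = 0;       /* offset of the first free byte: the "stack pointer" */

/* Allocates size bytes by adding size to the stack pointer.
   Returns NULL if the block does not fit. */
void *stack_alloc(size_t size)
{
    if (size > STACK_BYTES - stack_top)
        return NULL;
    void *block = &stack_area[stack_top];
    stack_top += size;
    return block;
}

/* Frees the most recently allocated block by subtracting its size.
   The caller must free blocks in last-in/first-out order. */
void stack_free(size_t size)
{
    stack_top -= size;
}
```

Both operations are a comparison and an addition or subtraction, with no searching of free lists.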

Using only stack allocation, without heap allocation, is somewhat restrictive, as only objects whose size is known at compile-time can be returned from a procedure.

Some programming languages (such as some versions of Lisp and C) provide program-controlled stack allocation and deallocation of dynamic extent objects for efficiency, even though this is unsafe.

Garbage Collection

Garbage collection (GC), also known as automatic memory management, is the automatic recycling of dynamically allocated memory. Garbage collection is performed by a garbage collector which recycles memory that it can prove will never be used again. Systems and languages which use garbage collection can be described as garbage-collected.

Garbage collection is a tried and tested memory management technique that has been in use since its invention in the 1950s. It avoids the need for the programmer to deallocate memory blocks explicitly, thus avoiding a number of problems: memory leaks, double frees, and premature frees. The burden on the programmer is reduced by not having to investigate such problems, thereby increasing productivity.

Garbage collection can also dramatically simplify programs, chiefly by allowing modules to present cleaner interfaces to each other: the management of object storage between modules is unnecessary.

It is not possible, in general, for a garbage collector to determine exactly which objects are still live. Even if liveness didn’t depend on future input, there can be no general algorithm to prove that an object is live (cf. the Halting Problem). All garbage collectors use some efficient approximation to liveness. In tracing garbage collection, the approximation is that an object can’t be live unless it is reachable. In reference counting, the approximation is that an object can’t be live unless it is referenced. Hybrid algorithms are also possible. Often the term garbage collection is used narrowly to mean only tracing garbage collection.
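The two approximations can be compared on a toy object graph. The C sketch below is illustrative only: struct object, set_ref, and mark are hypothetical names, each object holds at most two references, and the recursive marking is not how a production collector would traverse the heap.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 2

struct object {
    struct object *refs[MAX_REFS];  /* outgoing references, NULL if unused */
    int refcount;                   /* reference counting: incoming references */
    bool marked;                    /* tracing: set when found reachable */
};

/* Reference counting view: make from->refs[slot] point at to,
   adjusting the counts of the new and the old target. */
void set_ref(struct object *from, int slot, struct object *to)
{
    if (to != NULL)
        to->refcount++;
    if (from->refs[slot] != NULL)
        from->refs[slot]->refcount--;
    from->refs[slot] = to;
}

/* Tracing view: mark every object reachable from obj. */
void mark(struct object *obj)
{
    if (obj == NULL || obj->marked)
        return;
    obj->marked = true;
    for (int i = 0; i < MAX_REFS; i++)
        mark(obj->refs[i]);
}
```

If two objects refer only to each other and nothing else refers to them, mark called on the roots leaves both unmarked, so a tracing collector can recycle them. Their reference counts both stay at 1, so under simple reference counting they are still treated as possibly live.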

There is a large body of published work on particular and general garbage collection algorithms.

Garbage collection was invented by John McCarthy in 1958 as part of the implementation of Lisp.

Other significant languages offering garbage collection include Java, ML, Modula-3, Perl, Prolog, and Smalltalk. Major applications using garbage collection include Emacs and AutoCAD; usually, you can’t tell whether an application does or not, but these have extension languages that expose the fact.

Conservative Garbage Collection

In conservative garbage collection, the layout of objects and roots is not known; instead, the collector assumes that any field that looks like a pointer might be a reference.

Conservative collectors can work with programs where information about the memory layout is not available, because, for example, the language doesn’t support garbage collection.

A conservative collector doesn’t need to know the format of the objects; it just needs some idea of where the object boundaries are. It regards any field value that looks like a pointer to an object (or, sometimes, into the middle of one) as preventing the recycling of that object. It can’t move objects, because then the references to the moved objects would need to be updated, and such ambiguous references must not be modified, in case they weren’t pointers after all. Therefore, conservative collectors are usually mark-sweep collectors.

Because references are ambiguous, some objects may be retained despite being actually unreachable. In practice, this happens rarely, and refinements such as black-listing can further reduce the odds.
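The "looks like a pointer" test can be sketched in C as follows. This is a simplified illustration under stated assumptions: the heap is a single static array of fixed-size objects (so object boundaries are trivial to find), the names heap, retained, and conservative_scan are hypothetical, and only the roots are scanned. A real collector would go on to scan the contents of each retained object in the same way.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_OBJECTS 4
#define OBJECT_BYTES 32   /* fixed-size objects, so boundaries are easy to find */

static unsigned char heap[HEAP_OBJECTS * OBJECT_BYTES];
static bool retained[HEAP_OBJECTS];

/* Treats each of the count words as a possible pointer. Any word whose
   value falls inside the heap, even in the middle of an object, causes
   that object to be retained. Words are only read, never modified,
   because they might not be pointers at all. */
void conservative_scan(const uintptr_t *words, size_t count)
{
    uintptr_t base = (uintptr_t)heap;
    for (size_t i = 0; i < count; i++) {
        uintptr_t w = words[i];
        if (w >= base && w < base + sizeof heap)
            retained[(w - base) / OBJECT_BYTES] = true;
    }
}
```

An integer that happens to hold a value inside the heap's address range would retain an object just as a genuine pointer would, which is the source of the occasional unnecessary retention described above.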

Static Storage Duration

In C and C++, the static keyword applied to a file scope variable or function means it is local to the file; the static keyword applied to a variable at function or block scope means it is allocated and initialized once only.

Objects declared locally in blocks with the static keyword are allocated in static memory(2), and initialized once (usually by the compiler/linker) instead of each time the block is entered.

Static variables within functions retain their value between function invocations, and therefore must form part of the root set of any collector(1).
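A short C illustration of a block scope static variable (the function name count_calls is an arbitrary example):

```c
/* Returns how many times it has been called so far.
   calls lives in static storage and is initialized once,
   not each time the function is entered. */
int count_calls(void)
{
    static int calls = 0;
    calls++;
    return calls;
}
```

Successive calls return 1, 2, 3, and so on, because calls keeps its value between invocations. If calls were a pointer to a heap object instead of an integer, a collector would have to treat it as a root for the same reason.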