Garbage Collection

Garbage collection (GC), also known as automatic memory management, is the automatic recycling of dynamically allocated memory. Garbage collection is performed by a garbage collector which recycles memory that it can prove will never be used again. Systems and languages which use garbage collection can be described as garbage-collected.

Garbage collection is a tried and tested memory management technique that has been in use since its invention in the 1950s. It avoids the need for the programmer to deallocate memory blocks explicitly, thus avoiding a number of problems: memory leaks, double frees, and premature frees. The burden on the programmer is reduced by not having to investigate such problems, thereby increasing productivity.

Garbage collection can also dramatically simplify programs, chiefly by allowing modules to present cleaner interfaces to each other: the management of object storage between modules is unnecessary.

It is not possible, in general, for a garbage collector to determine exactly which objects are still live. Even if it didn’t depend on future input, there can be no general algorithm to prove that an object is live (the Halting Problem). All garbage collectors use some efficient approximation to liveness. In tracing garbage collection, the approximation is that an object can’t be live unless it is reachable. In reference counting, the approximation is that an object can’t be live unless it is referenced. Hybrid algorithms are also possible. Often the term garbage collection is used narrowly to mean only tracing garbage collection.

There is a large body of published work on particular and general garbage collection algorithms.

Garbage collection was first invented by John McCarthy in 1958 as part of the implementation of Lisp.

Other significant languages offering garbage collection include Java, ML, Modula-3, Perl, Prolog, and Smalltalk. Major applications using garbage collection include Emacs and AutoCAD; usually, you can’t tell whether an application does or not, but these have extension languages that expose the fact.

Conservative Garbage Collection

In conservative garbage collection, the layout of objects and roots is not known, instead the collector assumes that any field that looks like a pointer might be a reference.

Conservative collectors can work with programs where information about the memory layout is not available, because, for example, the language doesn’t support garbage collection.

A conservative collector doesn’t need to know the format of the objects, it just needs some idea of where the object boundaries are. It regards any field value that looks like a pointer to an object (or, sometimes, into the middle of one), as preventing the recycling of that object. It can’t move objects, because then the references to the moved objects would need to be updated, and such ambiguous references must not be modified, in case they weren’t pointers after all. Therefore, conservative collectors are usually mark-sweep collectors.

Because references are ambiguous, some objects may be retained despite being actually unreachable. In practice, this happens rarely, and refinements such as black-listing can further reduce the odds.