Garbage Collector Memory Management Interview Questions

MEMORY MANAGEMENT QUESTIONS

References:
Memory Management Glossary
Memory Pool System Manual

Q: Why do I need to test the return value from malloc? Surely it always succeeds?
For small programs and during light testing this is usually true. BUT there are all kinds of unpredictable reasons why malloc might fail one day:
Examples:
1. Someone uses your program for a far larger data set than you anticipated;
2. Your program is running on a machine with less memory than you expected;
3. The machine your program is running on is heavily loaded.
In this case malloc will return NULL, and your program will attempt to store data by resolving the null pointer. This might cause your program to exit immediately.
You must check all error or status codes returned by functions you call, locally or in other libraries, such as the C run-time library.
You can also wrap malloc in something like this:


#include 
#include 

void *my_malloc(size_t size)
{
    void *p = malloc(size);

    if (p == NULL) {
        fputs("Out of memory.\n", stderr);
        exit(EXIT_FAILURE);
    }

    return p;
}

Q: What’s the point of having a garbage collector? Why not use malloc and free?
Manual memory management (malloc and free) force you to keep track of which memory is still required, and who is responsible for freeing it. This works for small programs but can cause bugs in larger programs and a serious problem for interface abstraction. Automatic Memory Management will free you from these concerns.

Q: What’s wrong with ANSI malloc in the C Library?
Malloc is a very basic manual memory management service. A good memory manager provides the following:
– high performance for specified block sizes;
– tagged references;
– simultaneous frees;
– locality of reference hints;
– formatted objects;
– garbage collection;
– deallocation of partial blocks;
– multi-threading without synchronization;
– inlined allocation code;
– finalization.

Q: Can I use garbage collection in C++?
Yes, the C++ specification has always permitted garbage collection.

Q: Why is delete so slow?
Often DELETE must perform a more complex task than simply freeing the memory associated with an object, this is known as finalization. Finalization typically involves releasing any resources indirectly associated with the object, such as files that must be closed or ancillary objects that must be finalized themselves. This may involve going back over memory that has been unused and is paged out. Deallocating blocks in address order can result in poor performance.

Q: What happens if you use class libraries that leak memory?
In C++ it may be that class libraries expect you to call delete on objects they create, to invoke the destructor. Failing this, if there is truly a memory leak in a class library and you don’t have the source then the only option is to try a garbage collector.

Q: Can’t I get all the benefits of garbage collection using C++ constructors and destructors?
Garbage collection has the advantage, because it can determine when a given object is no longer of interest to anyone (or at least when there are no more references to it). This neatly avoids the problems of having multiple copies of the same data or complex conditional destruction. The program can construct objects and store references to them anywhere it finds convenient; the garbage collector will deal with all the problems of data sharing.

Q: What languages use garbage collection?
Java, C#, Python, Lisp, ML, the list goes on. Many implementations of BASIC use garbage collection to manage character strings efficiently. The idea of automatic memory management has stood the test of time and is becoming a standard part of modern programming environments. Some might say it isn’t ALWAYS necessary and opt out of automatic memory management in some cases. Most will say there is a place for garbage collection among tools of the modern programmer.

Q: What’s the advantage of garbage collection?
Garbage collection frees you from having to keep track of which part of your program is responsible for the deallocation of which memory. This freedom from tedious and error-prone bookkeeping allows you to concentrate on the problem you are trying to solve, without introducing additional problems of implementation.
This is particularly important in large-scale or highly modular programs, especially libraries, because the problems of manual memory management often dominate interface complexity. Additionally, garbage collection can reduce the amount of memory used because the interface problems of manual memory management are often solved by creating extra copies of data.
In terms of performance, garbage collection is often faster than manual memory management. It can also improve performance indirectly, by increasing locality of reference and hence reducing the size of the working set, and decreasing paging.

Q: Isn’t it much cheaper to use reference counts rather than garbage collection?
No, updating reference counts is quite expensive, and they have a couple of problems:
They can’t cope with cyclic data structures; that is, sets of objects that are referred to only by objects in that set, but that don’t have a zero reference count.
Reference counting gets more expensive if you have to allow for the count overflowing.
There are many systems that use reference counts, and avoid the problems described above by using a conventional garbage collector to complement it. This is usually done for real-time benefits. Unfortunately, experience shows that this is generally less efficient than implementing a proper real-time garbage collector, except in the case where most reference counts are one.

Q: Isn’t GC unreliable? I’ve heard that GCs often kill the program
Garbage collectors usually have to manipulate vulnerable data structures and must often use poorly-documented, low-level interfaces. Additionally, any garbage collection problems may not be detected until some time later. These factors combine to make most garbage collection bugs severe in effect, hard to reproduce, and difficult to work around.
On the other hand, commercial garbage collection code will generally be heavily tested and widely used, which implies it must be reliable. It will be hard to match that reliability in a manual memory manager written for one program, especially given that manual memory management doesn’t scale as well as the automatic variety.
In addition, bugs in the compiler or run-time (or application if the language is as low-level as C) can corrupt the heap in ways that only the garbage collector will detect later. The collector is blamed because it found the corruption. This is a classic case of shooting the messenger.

Q: I’ve heard that GC uses twice as much memory
This may be true of primitive collectors (like the two-space collector), but this is not generally true of garbage collection. The data structures used for garbage collection need be no larger than those for manual memory management.

Q: Doesn’t garbage collection make programs slow?
No. The CPU overhead of conservative garbage collection is comparable to that of explicit storage management techniques. Conservative garbage collection performs faster than some explicit algorithms and slower than others, the relative performance being largely dependent on the program.

Q: Manual memory management gives me control—it doesn’t pause.
It is possible for manual memory management to pause for considerable periods, either on allocation or deallocation. It certainly gives no guarantees about performance, in general.
With automatic memory management, such as garbage collection, modern techniques can give guarantees about interactive pause times, and so on.

Q: Why does my disk read so much?
When you are using a virtual memory system, the computer may have to fetch pages of memory from disk before they can be accessed. If the total working set of your active programs exceeds the physical memory(1) available, paging will happen continually, your disk will rattle, and performance will degrade significantly. The only solutions are to install more physical memory, run fewer programs at the same time, or tune the memory requirements of your programs.
The problem is aggravated because virtual memory systems approximate the theoretical working set with the set of pages on which the working set lies. If the actual working set is spread out onto a large number of pages, then the working page-set is large.
When objects that refer to each other are distant in memory, this is known as poor locality of reference. This happens either because the program’s designer did not worry about this, or the memory manager used in the program doesn’t permit the designer to do anything about it.
Note that copying garbage collection can dynamically organize your data according to the program’s reference patterns and thus mitigate this problem.

Q: Where can I get a garbage collector?
The Memory Pool System and the Boehm–Demers–Weiser collector are suitable for C or C++. The best way to get a garbage collector, however, is to program in a language that provides garbage collection natively.

Q: Why does my program use so much memory?
If you are using manual memory management (for example, malloc and free(2) in C), it is likely that your program is failing to free memory blocks after it stops using them. When your code allocates memory on the heap, there is an implied responsibility to free that memory. If a function uses heap memory for returning data, you must decide who takes on that responsibility. Pay special attention to the interfaces between functions and modules. Remember to check what happens to allocated memory in the event of an error or an exception.
If you are using automatic memory management (almost certainly garbage collection), it is probable that your code is remembering some blocks that it will never use in future. This is known as the difference between liveness and reachability. Consider clearing variables that refer to large blocks or networks of blocks, when the data structure is no longer required.

Q: I use a library, and my program grows every time I call it. Why?
If you are using manual memory management, it is likely that the library is allocating data structures on the heap every time it is used, but that they are not being freed. Check the interface documentation for the library; it may expect you to take some action when you have finished with returned data. It may be necessary to close down the library and re-initialize it to recover allocated memory.
Unfortunately, it is all too possible that the library has a memory management bug. In this case, unless you have the source code, there is little you can do except report the problem to the supplier.

It may be possible to add a garbage collector to your language, and this might solve your problems.
With a garbage collector, sometimes objects are retained because there is a reference to them from some global data structure. Although the library might not make any further use of the objects, the collector must retain the objects because they are still reachable.
If you know that a particular reference will never be used in future, it can be worthwhile to overwrite it. This means that the collector will not retain the referred object because of that reference. Other references to the same object will keep it alive, so your program doesn’t need to determine whether the object itself will ever be accessed in future. This should be done judiciously, using the garbage collector’s tools to find what objects are being retained and why.
If your garbage collector is generational, it is possible that you are suffering from premature tenuring, which can often be solved by tuning the collector or using a separate memory area for the library.

Q: Should I write my own memory allocator to make my program fast?
If you are sure that your program is spending a large proportion of its time in memory management, and you know what you’re doing, then it is certainly possible to improve performance by writing a suballocator. On the other hand, advances in memory management technology make it hard to keep up with software written by experts. In general, improvements to memory management don’t make as much difference to performance as improvements to the program algorithms.
In four of the programs investigated, the programmer felt compelled to avoid using the general-purpose storage allocator by writing type-specific allocation routines for the most common object types in the program. The general conclusion is that programmer optimizations in these programs were mostly unnecessary. Simply using a different algorithm appears to improve the performance even more. The conclusion, programmers, instead of spending time writing domain-specific storage allocators, should consider using other publicly-available implementations of storage management algorithms if the one they are using performs poorly.

Q: Why can’t I just use local data on the stack or in global variables?
Global, or static, data is fixed size; it cannot grow in response to the size or complexity of the data set received by a program. Stack-allocated data doesn’t exist once you leave the function (or program block) in which it was declared.
If your program’s memory requirements are entirely predictable and fixed at compile-time, or you can structure your program to rely on stack data only while it exists, then you can entirely avoid using heap allocation. Note that, with some compilers, use of large global memory blocks can bloat the object file size.
It may often seem simpler to allocate a global block that seems “probably large enough” for any plausible data set, but this simplification will almost certainly cause trouble sooner or later.

Q: Why should I worry about virtual memory? Can’t I just use as much memory as I want?
While virtual memory can greatly increase your capacity to store data, there are three problems typically experienced with it:
It does not provide an unlimited amount of memory. In particular, all memory that you actually allocate (as opposed to reserve) has to be stored somewhere. Usually you must have disk space available for all pages containing allocated memory. In a few systems, you can subtract the available physical memory from the disk space required. If the memory contains images of program or data files, then file mapping, or assigning existing files to regions of the virtual address space, can help considerably.
In most computers, there is a large difference in speed between main memory and disk; running a program with a working set that does not fit in physical memory almost always results in unacceptable performance.
An additional problem with using unnecessary quantities of memory is that poor locality of reference can result in heavy paging.