Next | Prev | Up

Simple Mark and Sweep Collection

In mark and sweep collection, each object has a bit or bits which are used to indicate the state of the object. Objects start off as unseen, and when they are visited they are set to be live. After the first or mark phase then all reachable objects will have been visited and these "mark bits" will be set to indicate they are live.

Consequently all the objects still with state unseen must be unreachable. A second or sweep phase is then embarked on - this has to scan all the objects in the heap, live or not, and convert unseen objects into free-blocks, and reset the mark-bits on live objects ready for the next garbage collection.

Mark phase implementation

There is more than one technique for visiting all the live objects in the mark-phase. The simplest is a depth-first traversal of all objects via their pointers, starting at the roots. This is essentially a recursive function of the form:
define visit (object)
  if (seen (object))
    return ;
    mark_object (object) ;
    for each_pointer_in (object)
      visit (pointer) ;
However there is a big problem with this - the visit function might recurse extremely deep, and require prohibitive amounts of stack memory to run on. Various approachs help combat this, but the most elegant is a technique called pointer-reversal in which the graph of pointers between objects is temporarily overwritten to encode the stack of objects currently being visited.

The basic idea is that when you visit an object you use one of its pointer fields to point back to where you came from. You then effectively do the same as before (recursing over the pointers in the object), and when that is over, you pick up the reversed pointer again, replace it by whatever was there before, and step back to the object you came from.

Having stepped back you have to find where you were in the processing of that object's fields, and if you find you have finished, step back again. In practice the pointer-reversal algorithm is complicated by the details of where you store the pointers, how you discover where you are in the scanning loop, but the principle (at least) is very neat, and no extra stack memory at all is required. The extra work in maintaining and re-reversing the pointers, in theory, exactly counterbalances the storing and retrieving of values that would have been pushed on the stack.


Sweeping is simpler than marking, but it does require that there be a way of identifying all objects in the heap without following pointers. Usually this means that objects need a header-word at their start which enables the sweeper to find out if it is an object or free-block, and find out how long it is.

Sweeping can then simply march along consecutive objects and free-blocks, freeing unseen objects and coallescing them with any adjacent free-blocks.

Typically there might be a more complex allocation-structure into which the free blocks are linked (often this structure includes a mapping from object size to a free list of thusly sized objects)

Last updated by Mon 16 January 1996