Garbage Collection In .NET
Garbage collection is a technique that Microsoft .NET uses to manage memory automatically. This article discusses the concepts of garbage collection and the strategies adopted by Microsoft .NET for handling managed memory efficiently. It also discusses the methods and properties of the System.GC class, the class used to control the garbage collector in the .NET environment.
What is Garbage Collection?
The Common Language Runtime (CLR) requires that you create objects in the managed heap, but you do not have to clean up the memory once an object goes out of scope or is no longer needed. This is unlike languages such as C and C++, where you have to release heap memory explicitly using the free function in C or the delete operator in C++. Garbage collection refers to the strategy adopted by Microsoft .NET to free unused objects automatically.
A "garbage" object is one that is no longer needed because it is unreachable from any root, for example after the last reference to it goes out of scope. Microsoft .NET uses the information in the metadata to trace the object graph and detect the objects that need to be garbage collected. Objects that are not reachable from a root are referred to as garbage objects and are marked for garbage collection. Note that there is a time gap between the moment an object becomes garbage and the moment it is actually collected. Note also that objects in the managed heap are stored in sequential memory locations. This is unlike C and C++ and makes allocation and de-allocation of objects faster.
About Garbage Collection:
Every program uses resources such as memory buffers, network connections, database resources and so on. To use one of these resources, memory must be allocated for the type that represents it.
Steps required to access the resource:
Allocate the memory for the type that represents the resource.
Initialize the memory to set the initial state of the resource and make it available for use.
Use the resource through the instance of the type.
Tear down the state of the resource for clean up.
Free the memory by some mechanism.
The .NET garbage collector takes care of freeing the memory automatically, so the developer does not need to write any code to decide when memory should be released. The CLR allocates all objects from the managed heap. You never free objects from the managed heap yourself; they are freed automatically once the application no longer needs them.
Memory is not infinite. The GC must free memory at some point, and the garbage collector's optimizing engine determines the best time for a collection based on the allocations being made.
When the GC performs a collection, it checks the heap for objects that the application no longer needs and performs the necessary actions to reclaim their memory.
For automatic memory management to work, however, the garbage collector has to know the location of the roots and the references held by each object, so that it can tell when an object is no longer in use by the application. This knowledge is made available to the GC in .NET through a concept known as metadata. Every data type used in .NET software includes metadata that describes it. With the help of metadata, the CLR knows the layout of each object in memory, which helps the garbage collector in the compaction phase of garbage collection. Without this knowledge the garbage collector would not know where one object instance ends and the next begins.
Garbage Collection Algorithm:
Application Roots:
Every application has a set of roots. Roots identify storage locations that either refer to objects on the managed heap or are set to null.
For example:
All the global and static object pointers in an application.
Any local variable/parameter object pointers on a thread's stack.
Any CPU registers containing pointers to objects in the managed heap.
Pointers to the objects in the freachable queue.
The list of active roots is maintained by the just-in-time (JIT) compiler and common language runtime, and is made accessible to the garbage collector's algorithm.
Implementation:
The most commonly used strategy involves the mark and compact algorithm. This occurs in two phases, Mark and Compact.
Mark:
When the garbage collector starts running, it makes the assumption that all objects in the heap are garbage. In other words, it assumes that none of the application's roots refer to any objects in the heap.
The following steps are included in Phase I:
The GC identifies live object references or application roots.
It starts walking the roots and building a graph of all objects reachable from the roots.
If the GC attempts to add an object already present in the graph, then it stops walking down that path. This serves two purposes. First, it helps performance significantly since it doesn't walk through a set of objects more than once. Second, it prevents infinite loops should you have any circular linked lists of objects. Thus cycles are handled properly.
Once all the roots have been checked, the garbage collector's graph contains the set of all objects that are somehow reachable from the application's roots and any objects that are not in the graph are not accessible by the application, and are therefore considered garbage.
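The mark phase described above can be sketched as a plain graph traversal. This is a toy model rather than the CLR's actual implementation: the Node class, the MarkSketch class and the explicit root list are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-in for a heap object: a name plus its outgoing references.
public sealed class Node
{
    public string Name { get; }
    public List<Node> References { get; } = new List<Node>();
    public Node(string name) { Name = name; }
}

public static class MarkSketch
{
    // Returns every node reachable from the given roots.
    public static HashSet<Node> Mark(IEnumerable<Node> roots)
    {
        var marked = new HashSet<Node>();
        var pending = new Stack<Node>(roots);
        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            // Add returns false if the node is already in the graph, so each
            // object is walked only once and cycles cannot loop forever.
            if (!marked.Add(current)) continue;
            foreach (Node next in current.References) pending.Push(next);
        }
        return marked;
    }

    public static void Main()
    {
        var a = new Node("a"); var b = new Node("b");
        var c = new Node("c"); var d = new Node("d");
        a.References.Add(b);
        b.References.Add(a);   // cycle between a and b
        c.References.Add(d);   // c and d are not reachable from the root

        HashSet<Node> live = Mark(new[] { a });
        Console.WriteLine(live.Count);        // 2
        Console.WriteLine(live.Contains(c));  // False
    }
}
```

Anything on the toy heap that is not in the returned set (here c and d) corresponds to garbage.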
Compact:
Move all the live objects to the bottom of the heap, leaving free space at the top.
Phase II includes the following steps:
- The garbage collector now walks through the heap linearly, looking for contiguous blocks of garbage objects (now considered free space).
- The garbage collector then shifts the non-garbage objects down in memory, removing all of the gaps in the heap.
- Moving the objects in memory invalidates all pointers to the objects. So the garbage collector modifies the application's roots so that the pointers point to the objects' new locations. In addition, if any object contains a pointer to another object, the garbage collector is responsible for correcting these pointers as well.
After all the garbage has been identified, all the non-garbage has been compacted, and all the non-garbage pointers have been fixed-up, a pointer is positioned just after the last non-garbage object to indicate the position where the next object can be added.
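To make the compaction steps concrete, here is a minimal sketch over a toy heap, assuming every object occupies exactly one array slot and a null slot stands for collected garbage. The CompactSketch class and its forwarding table are hypothetical names; the real collector works on raw memory and variable-sized objects.

```csharp
using System;
using System.Collections.Generic;

public static class CompactSketch
{
    // heap[i] holds an object id, or null where a garbage object used to be.
    // Slides live entries toward index 0 and returns a map from old index to
    // new index, which is what would be used to fix up roots and pointers.
    public static Dictionary<int, int> Compact(string[] heap, out int nextFree)
    {
        var forwarding = new Dictionary<int, int>();
        int free = 0;
        for (int i = 0; i < heap.Length; i++)
        {
            if (heap[i] == null) continue;   // gap left by garbage
            forwarding[i] = free;
            heap[free] = heap[i];
            if (free != i) heap[i] = null;
            free++;
        }
        nextFree = free;                     // the next allocation goes here
        return forwarding;
    }

    public static void Main()
    {
        string[] heap = { "A", null, "C", null, null, "F" };
        int root = 5;                        // a root that points at "F"

        Dictionary<int, int> forwarding = Compact(heap, out int nextFree);
        root = forwarding[root];             // pointer fix-up

        Console.WriteLine(string.Join(",", heap));  // A,C,F,,,
        Console.WriteLine(root);                    // 2
        Console.WriteLine(nextFree);                // 3
    }
}
```

The nextFree value plays the role of the pointer positioned just after the last non-garbage object.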
Finalization:
The .NET Framework's garbage collector implicitly keeps track of the lifetime of the objects that an application creates, but it cannot do the same for the unmanaged resources (e.g. a file, a window or a network connection) that those objects encapsulate.
Unmanaged resources must be released explicitly once the application has finished using them. The .NET Framework provides the Object.Finalize method: a method that the garbage collector runs on the object to clean up its unmanaged resources before reclaiming the memory used by the object. Since the Finalize method does nothing by default, it must be overridden if explicit cleanup is required (in C#, by declaring a finalizer with the ~ClassName syntax).
It is tempting to consider Finalize just another name for C++ destructors. Although both are responsible for freeing the resources used by an object, they have very different semantics. In C++, a destructor is executed immediately when the object goes out of scope, whereas a Finalize method is called only once the garbage collector gets around to cleaning up the object.
The potential existence of finalizers complicates the job of garbage collection in .NET by adding some extra steps before freeing an object.
Whenever a new object that has a Finalize method is allocated on the heap, a pointer to the object is placed in an internal data structure called the finalization queue. When an object is no longer reachable, the garbage collector considers it garbage. The garbage collector then scans the finalization queue looking for pointers to these objects. When a pointer is found, it is removed from the finalization queue and appended to another internal data structure called the freachable queue, which makes the object reachable again and therefore no longer part of the garbage. At this point, the garbage collector has finished identifying garbage. The garbage collector compacts the reclaimable memory, and a special runtime thread empties the freachable queue, executing each object's Finalize method.
The next time the garbage collector is invoked, it sees that the finalized objects are truly garbage, and the memory for those objects is simply freed.
Thus, when an object requires finalization, it dies, then lives again (is resurrected), and finally dies for good. It is recommended to avoid Finalize methods unless they are required. They increase memory pressure because the memory and resources used by the object are not released until the second garbage collection. Since you have no control over the order in which Finalize methods are executed, relying on them may also lead to unpredictable results.
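The two-collection lifetime of a finalizable object can be simulated with a small model. This is only a sketch: ToyHeap, ToyObject and their members are invented for illustration, and the real queues are internal runtime structures that cannot be inspected like this.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class ToyObject
{
    public string Name;
    public bool Finalized;
    public ToyObject(string name) { Name = name; }
}

public sealed class ToyHeap
{
    public readonly List<ToyObject> Objects = new List<ToyObject>();
    public readonly List<ToyObject> FinalizationQueue = new List<ToyObject>();
    public readonly Queue<ToyObject> FreachableQueue = new Queue<ToyObject>();

    public ToyObject Allocate(string name, bool hasFinalizer)
    {
        var obj = new ToyObject(name);
        Objects.Add(obj);
        if (hasFinalizer) FinalizationQueue.Add(obj);   // registered at allocation
        return obj;
    }

    // One collection: anything not in 'roots' is garbage, unless it still
    // awaits finalization, in which case it moves to the freachable queue.
    public void Collect(ICollection<ToyObject> roots)
    {
        foreach (ToyObject obj in Objects.Where(o => !roots.Contains(o)).ToList())
        {
            if (FinalizationQueue.Remove(obj))
                FreachableQueue.Enqueue(obj);           // resurrected for now
            else if (!FreachableQueue.Contains(obj))
                Objects.Remove(obj);                    // memory reclaimed
        }
    }

    // Stands in for the dedicated finalizer thread.
    public void RunFinalizers()
    {
        while (FreachableQueue.Count > 0) FreachableQueue.Dequeue().Finalized = true;
    }
}

public static class FinalizationSketch
{
    public static void Main()
    {
        var heap = new ToyHeap();
        heap.Allocate("plain", hasFinalizer: false);
        heap.Allocate("finalizable", hasFinalizer: true);
        var noRoots = new List<ToyObject>();

        heap.Collect(noRoots);
        Console.WriteLine(heap.Objects.Count);  // 1: "finalizable" survives
        heap.RunFinalizers();
        heap.Collect(noRoots);
        Console.WriteLine(heap.Objects.Count);  // 0: reclaimed on the second pass
    }
}
```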
Limits of the Garbage Collection:
Unused objects that are still referenced:
The biggest limitation of the garbage collector in .NET is a subtle one: while it is described as being able to detect and remove unused objects, it actually finds unreferenced objects. This is an important distinction: an object might never be used by a program again, but as long as there is some path of references leading to it from an object that is still in use, it will never be released from memory. This leads to memory leaks; in .NET these occur when an object that will not be used again remains referenced.
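A common way to end up with such a leak is a long-lived collection that keeps growing. The sketch below assumes a hypothetical RequestLog class whose static list is never trimmed; the names are illustrative only.

```csharp
using System;
using System.Collections.Generic;

public static class RequestLog
{
    // A static field is a root: everything in this list stays reachable
    // for as long as the application runs, unless it is removed.
    public static readonly List<byte[]> History = new List<byte[]>();

    public static void Handle(int payloadSize)
    {
        byte[] payload = new byte[payloadSize];
        History.Add(payload);   // never read again, but still referenced
    }
}

public static class LeakSketch
{
    public static void Main()
    {
        for (int i = 0; i < 1000; i++) RequestLog.Handle(1024);

        GC.Collect();
        Console.WriteLine(RequestLog.History.Count);  // 1000: all buffers are still referenced

        RequestLog.History.Clear();                   // drop the references
        GC.Collect();                                 // the buffers are now eligible for collection
        Console.WriteLine(RequestLog.History.Count);  // 0
    }
}
```

Forcing a collection does not help while the list still holds the buffers; only removing the references makes them collectable.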
Fragmentation of the heap:
A less widely known limitation in .NET is that of the large object heap. Objects that become part of this heap are not moved by the runtime by default, and this can lead to a program running out of memory prematurely. When some objects live longer than others, the heap develops holes where objects used to be; this is known as fragmentation. The problem occurs when the program asks for a large block of memory but the heap has become so fragmented that there is no single region of memory big enough to accommodate it. A memory profiler can estimate the largest object that a program can allocate: if this figure is declining, fragmentation is the likely cause. An OutOfMemoryException caused by fragmentation will typically happen when the program apparently has a lot of free memory. On a 32-bit system, processes should be able to use at least 1.5 GB, but failures due to fragmentation will often start to occur before the process is using that much memory.
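The difference between total free memory and the largest allocatable block can be shown with a toy model, assuming a heap of fixed-size slots in which used slots are never moved. FragmentationSketch and Measure are illustrative names, not a runtime API.

```csharp
using System;

public static class FragmentationSketch
{
    // Returns the total number of free slots and the longest contiguous free run.
    public static (int TotalFree, int LargestBlock) Measure(bool[] used)
    {
        int total = 0, largest = 0, run = 0;
        foreach (bool slotUsed in used)
        {
            run = slotUsed ? 0 : run + 1;
            if (!slotUsed) total++;
            largest = Math.Max(largest, run);
        }
        return (total, largest);
    }

    public static void Main()
    {
        // 12 slots; every third slot holds a long-lived object that stays put.
        var used = new bool[12];
        for (int i = 0; i < used.Length; i += 3) used[i] = true;

        var (totalFree, largestBlock) = Measure(used);
        Console.WriteLine(totalFree);     // 8
        Console.WriteLine(largestBlock);  // 2
    }
}
```

Eight slots are free in total, yet a request for three contiguous slots would fail, which mirrors an OutOfMemoryException raised while plenty of memory appears to be available.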
The System.GC class
The System.GC class represents the garbage collector. Its most commonly used methods and properties are described in this section.
GC.Collect Method
This method forces a garbage collection. Called without arguments, it collects all generations. Called with a generation number, it collects generation 0 through the specified generation. The signatures of these two overloads are:
public static void Collect();
public static void Collect(int generation);
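A short usage sketch follows. CollectSketch and ForceAndCount are illustrative names; GC.CollectionCount is a System.GC method that reports how many collections of a generation have occurred so far.

```csharp
using System;

public static class CollectSketch
{
    // Forces a full collection and reports how many generation 0
    // collections happened as a result.
    public static int ForceAndCount()
    {
        int before = GC.CollectionCount(0);
        GC.Collect();                 // all generations
        return GC.CollectionCount(0) - before;
    }

    public static void Main()
    {
        GC.Collect(0);                // generation 0 only
        GC.Collect(1);                // generations 0 and 1
        Console.WriteLine(ForceAndCount() >= 1);
    }
}
```

Forcing collections is rarely needed in production code, since the collector schedules itself based on allocation activity.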
GC.GetTotalMemory Method
This method returns the number of bytes currently thought to be allocated in managed memory. It accepts a boolean parameter, forceFullCollection. If the parameter is true, the method waits a short time for the garbage collector to run and for finalizers to complete before returning.
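As a rough illustration, the method can be used to observe the effect of a large allocation. The figures vary from run to run, so none are shown here; TotalMemorySketch is a hypothetical class name.

```csharp
using System;

public static class TotalMemorySketch
{
    public static void Main()
    {
        // true: let a collection run first to get a steadier baseline.
        long before = GC.GetTotalMemory(forceFullCollection: true);

        byte[] buffer = new byte[10 * 1024 * 1024];   // roughly 10 MB
        long during = GC.GetTotalMemory(forceFullCollection: false);

        Console.WriteLine($"Before: {before:N0} bytes");
        Console.WriteLine($"With buffer: {during:N0} bytes");

        GC.KeepAlive(buffer);   // keep the buffer reachable until after the measurement
    }
}
```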
GC.KeepAlive Method
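This method references the object passed to it, which makes the object ineligible for garbage collection from the start of the current routine up to the point where the method is called. The signature of this method is as follows: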
public static void KeepAlive(object objToKeepAlive);
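A typical case is an object that must stay alive although the code never touches it again, such as a System.Threading.Timer. The sketch below is one possible illustration; KeepAliveSketch and CountTicks are made-up names, and the interval values are arbitrary.

```csharp
using System;
using System.Threading;

public static class KeepAliveSketch
{
    // Runs a timer for the given time and returns how often it fired.
    public static int CountTicks(int runMilliseconds)
    {
        int ticks = 0;
        var timer = new Timer(_ => Interlocked.Increment(ref ticks), null, 0, 50);

        Thread.Sleep(runMilliseconds);

        // The timer is not used after its creation. Without the call below, an
        // optimizing JIT may treat it as dead during the sleep, which would let
        // the GC collect it and stop the callbacks early.
        GC.KeepAlive(timer);
        return Volatile.Read(ref ticks);
    }

    public static void Main()
    {
        Console.WriteLine(CountTicks(500));
    }
}
```

In real code, disposing the timer at the end of the method would serve the same purpose, because the Dispose call is itself a use of the object.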
GC.ReRegisterForFinalize Method
This method requests that the finalizer be called for an object for which SuppressFinalize was previously called, i.e., it makes the object eligible for finalization again. The method signature is as follows:
public static void ReRegisterForFinalize(object obj);
GC.SuppressFinalize Method
This method requests that the runtime not call the finalizer for the specified object. The prototype of this method is:
public static void SuppressFinalize(object obj);
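SuppressFinalize is most often seen together with the IDisposable interface, which this article does not otherwise cover. The sketch below assumes a hypothetical TempResource class: once cleanup has been done explicitly, the finalizer is no longer needed, and the object avoids the extra collection cycle described in the Finalization section.

```csharp
using System;

public sealed class TempResource : IDisposable
{
    public static int FinalizerRuns;
    private bool released;

    public bool IsReleased => released;

    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this);   // cleanup is done, so skip the finalizer
    }

    ~TempResource()
    {
        FinalizerRuns++;
        Release();                   // safety net if Dispose was never called
    }

    private void Release()
    {
        if (released) return;
        released = true;             // a real class would free its unmanaged handle here
    }
}

public static class SuppressSketch
{
    public static void Main()
    {
        using (var resource = new TempResource())
        {
            // work with the resource
        }

        GC.Collect();
        GC.WaitForPendingFinalizers();
        Console.WriteLine(TempResource.FinalizerRuns);   // 0: the finalizer was suppressed
    }
}
```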
GC.GetGeneration Method
This method returns the current generation number of an object, or of the target of a weak reference. The signatures of the two overloads are:
public static int GetGeneration(object obj);
public static int GetGeneration(WeakReference wo);
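The following sketch prints the generation of an object before and after forced collections. Exact values depend on the runtime and build settings, so no output is shown. A new small object usually starts in generation 0 and is promoted when it survives a collection. GenerationSketch is an illustrative name.

```csharp
using System;

public static class GenerationSketch
{
    public static void Main()
    {
        var data = new object();
        Console.WriteLine(GC.GetGeneration(data));   // generation right after allocation

        GC.Collect();                                // survivors are usually promoted
        Console.WriteLine(GC.GetGeneration(data));

        GC.Collect();
        Console.WriteLine(GC.GetGeneration(data));

        var weak = new WeakReference(data);
        Console.WriteLine(GC.GetGeneration(weak));   // generation of the target, i.e. data
        Console.WriteLine(GC.MaxGeneration);         // highest generation number

        GC.KeepAlive(data);                          // keep data reachable up to this point
    }
}
```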
GC.MaxGeneration Property
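This property returns the highest generation number that the system currently supports. Since generations are numbered from 0, the number of generations is MaxGeneration + 1.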
GC.WaitForPendingFinalizers Method
This method blocks the current thread until the finalizer thread has run all pending finalizers. The signature of this method is:
public static void WaitForPendingFinalizers();
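The method is usually called between two forced collections, matching the two-pass lifetime of finalizable objects described in the Finalization section. In this sketch, Tracked and WaitSketch are hypothetical names. The printed count is not guaranteed, because the runtime decides which objects it actually finalizes.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

public sealed class Tracked
{
    public static int Finalized;
    ~Tracked() { Interlocked.Increment(ref Finalized); }
}

public static class WaitSketch
{
    // Allocating in a separate, non-inlined method makes sure that no local
    // variable in the caller keeps the objects reachable.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void AllocateGarbage(int count)
    {
        for (int i = 0; i < count; i++) new Tracked();
    }

    public static void Main()
    {
        AllocateGarbage(100);

        GC.Collect();                    // dead objects move to the freachable queue
        GC.WaitForPendingFinalizers();   // block until the finalizer thread has drained it
        GC.Collect();                    // reclaim the memory of the finalized objects

        Console.WriteLine(Tracked.Finalized);
    }
}
```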