Wednesday, July 25, 2012

Comparative Memory Management, Part 1

When Apple first announced Automatic Reference Counting (ARC) at WWDC in 2011, the fanfare was considerable. Developers took it be the best thing since sliced bread. Some called it "innovative". The whole idea that iOS and now MacOS X developers (since Mountain Lion) could be freed from error-prone manual memory management without the runtime pauses of conventional garbage collection was certainly played up by the Apple machine. However, when I saw the words "reference counting" and "no overhead" in the same sentence, I cringed. Reference counting isn't a new concept at all. Automatic reference counting is used in a number of languages including Python and Delphi, though neither of those languages can claim to be the sole pioneer. More importantly, reference counting implies the presence of a reference count for each object (a considerable runtime memory overhead especially if lots of small objects are allocated) and the constant maintenance of the reference count (i.e., reference counters for each object must be updated when new owners lay claim (retain) or old owners release).

All around the web, so many have lauded ARC as some miraculous leap forward beyond garbage collection -- "magical" automatic memory management at zero runtime cost. It sounded too good to be sure. Blog posts and mail list posts were asking why their favorite alternative language wasn't ditching the garbage collector for the shiny ARC. Admittedly, ARC is certainly an improvement over manual memory management. It can potentially save developers a whole lot of error-prone accounting. However, where I was caught off-guard was the zero-cost part of the deal. It turns out to be all about the baseline. Objective-C actually already folds in the overhead of reference counts long before ARC came on the scene. Before ARC, the reference counting was manual, requiring developer inserted retain and releases. That was the memory model. Thus, it is fair to say that ARC didn't introduce any additional runtime overhead over Objective-C's pre-existing reference count overhead. The big gotcha is that other languages/runtimes such as C/C++ and your favorite garbage-collected managed language don't have this pre-existing reference count overhead. Compared to C/C++'s raw pointers, keeping reference counts for all objects takes space and updating them imposes a runtime overhead. Sure, it is a deterministic runtime overhead, but it remains a performance hit.

When the application in question is multithreaded, reference counts become even a bigger issue. NSObject.m implements reference count maintenance using a mutex allocationLock to keep the reference counters correct. Multiple owners retaining or releasing the same object at the same time? Overhead.

The reference count in Objective-C is an unsigned int called retained that is at the head of every NSObject. In the naive case, that is 4-bytes of memory overhead for 32-bit systems and 8-bytes for 64-bit ones. You might say to yourself that 4-bytes isn't much compared to object size. Well, Cocoa Foundation has NSValue, which consists of a value and a pointer to a string denoting the type of the value. An NSLock is just a pointer. In each of these cases and many like it, the reference count can introduce 50-100% overhead. However, all this operates under the assumption of naive reference counting. This overhead (both space and allocation time) can be improved by cutting the reference count representation to a few bits and handling the overflow by using an auxiliary accurate count data structure or sticky refcount that stops counting after reaching a max relying on a tracing GC to deal with maxed out refcounts. Recent research (Shahriyar et al 2012 [PDF]) has improved on this considerably, roughly reaching parity to basic mark-sweep garbage collection. The catch is that mark-sweep is hardly the gold standard of tracing garbage collection technology.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.