Wednesday, December 7, 2011

Substructural Type Systems and Performance, Part 1

Why is the name of this blog "substructural"? In turns out that in the realm of advanced type systems, there is a family of type systems derived from a family of logical systems called substructural logics. The chief distinction of substructural type systems is that one or more of the "structural properties" of type environments are eliminated. These rules are exchange, weakening, and contraction. Exchange allows arbitrarily reordering bindings in the type environment. Weakening enables adding new bindings. A type derivation is deemed "weakened" in the sense that if a program already typechecks using a certain type environment, adding new bindings to that environment won't negate the well-typing of the program.

There are four substructural type systems: linear, affine, relevant, and ordered. The "strictest" are ordered type systems where none of the structural properties hold.

In practice, most substructural type systems have not seen the light of day. At one point, many researchers were excited about the applicability of linear type systems to memory management. In particular, linear type systems enables a compiler to infer regions for region-based memory management, a particularly fast and powerful form of memory management. Explicit regions are known as arenas or sometimes memory pools (though one has to be careful because this is an overloaded term). Several large software systems have adopted explicit regions including Realtime Java, Apache, nginx, Prolog, and PostgreSQL. Some recent iterations have partially automated regions in Realtime Java with ownership types. The payoff here is fairly intuitive. With region-based memory management, contiguous blocks of memory (regions) are allocated and deallocated in bulk, as opposed to deallocating each object in a region individually. Each region may contain a variety of dynamically sized objects. Deallocating one region containing 15 objects at once should be faster than deallocating each of the 15 individually over time. The lifetime of a region may extend beyond that of a lexical block, which is the chief limitation of stack allocation which necessitated heap allocation. This is why regions fit very well with domains with realtime and high performance requirements. Manually identifying regions, however, can be error-prone and time-consuming. Linear type systems facilitate automatic compile-time checking of safe use of regions. Some type inference schemes attempt to further automate this by inferring where regions should begin and end for you. This is where things start breaking down. Such a type inference problem is in general undecidable. Consequently, researchers have used various approximations. The experience is that if written in a region-friendly way, region inference works pretty well. If not, then region inference may need to be coupled with garbage collection to ensure efficient use of memory.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.