Thursday, July 5, 2012

What is the ML Module System?, Part 1

Browsing through the heated Scala mailing list discussion from 2009 on OOP versus functional programming, I ran into a number of messages which indicate some confusion as to the role and mechanism of the ML module system. The natural question that was raised is how a module system differs from a "low-level" object system. I am not entirely sure what a "low-level" object system is, but in this post I would like to compare and contrast the ML module system and object systems, principally class-based object systems but also classless object systems. First, what roles do these language features serve? There is some overlap in purpose but also significant differences. The ML module system was originally designed to serve two roles:

  1. To support programming-in-the-large, i.e., the organization of large system architectures
  2. To facilitate the construction of a type-safe container library
In contrast, OOP is typically defined as supporting design goals of encapsulation, inheritance, and subtype polymorphism. On the surface, this suggests some overlap. Programming-in-the-large may sound like encapsulation, but as I will discuss, the reality is considerably more nuanced.

Apart from the differing roles, the mechanism of the module system is substantially different from that of any object system. Simply put, the ML module system is fundamentally a mechanism for associating types with values (which include functions). In ML, instead of achieving information hiding by changing the visibility of data, the language hides the definition of types. The idea is that once the definition of a type name goes out of scope, the static semantics will reject any operation on values of that type which relies on the identity of that type. Meanwhile, operations that are type agnostic (aka "parametric") can still be applied. For example, a symbol table may implement a symbol index as an int. This index can be incremented to produce a fresh symbol index, but this implementation detail does not need to be exposed to clients of the symbol table implementation. Thus the definition of the symbol index type can be hidden from clients. However, that isn't to say that symbol indices themselves are hidden. Clients need to have symbol indices around in order to look up entries in the symbol table. So symbol indices exist outside of the symbol table implementation, but they exist only as black boxes. Clients can freely pass around symbol indices and aggregate them into collections (which is usually a parametric operation). The language semantics guarantee that the symbol index abstraction will not be violated throughout this.
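The symbol table example can be sketched in code. The following is a minimal illustration written in OCaml, a dialect of ML whose syntax differs somewhat from Standard ML. The names SYMBOL_TABLE, SymbolTable, fresh, and lookup are hypothetical and chosen only for this sketch; they do not come from any particular library.

```ocaml
(* Hypothetical signature: [index] is declared without a definition,
   so clients cannot rely on it being an int. *)
module type SYMBOL_TABLE = sig
  type index
  type 'a t
  val empty : 'a t
  val fresh : 'a t -> 'a -> index * 'a t
  val lookup : 'a t -> index -> 'a option
end

(* Inside the structure the definition of [index] is in scope,
   so incrementing it is allowed here. *)
module SymbolTable : SYMBOL_TABLE = struct
  type index = int
  type 'a t = { next : index; entries : (index * 'a) list }
  let empty = { next = 0; entries = [] }
  let fresh tbl v =
    let i = tbl.next in
    (i, { next = i + 1; entries = (i, v) :: tbl.entries })
  let lookup tbl i = List.assoc_opt i tbl.entries
end

(* Client code: indices are black boxes that can still be passed
   around and collected in a list, a parametric operation. *)
let (ix, tbl1) = SymbolTable.fresh SymbolTable.empty "x"
let (iy, tbl2) = SymbolTable.fresh tbl1 "y"
let indices = [ix; iy]
let names = List.filter_map (SymbolTable.lookup tbl2) indices

(* let bad = ix + 1
   would be rejected by the type checker, because outside the
   structure SymbolTable.index is not known to be int. *)
```

In this sketch the sealing ascription `: SYMBOL_TABLE` is what takes the definition of the index type out of scope for clients.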

The programming-in-the-large role appears on the surface to be vague, potentially encompassing a wide variety of programming language features. However, the module system's notion of programming-in-the-large is fairly specific. The canonical example is partitioning a compiler into reusable parts such as the front end, the optimizer, and the backend code generator. The backend is a particularly interesting example. Because compilers can target multiple architectures, the backend should be parameterized on the architecture. In practice, this means an instruction set and perhaps some details of memory layout. Support for programming-in-the-large at this granularity takes various forms, ranging from the notion of compilation units (i.e., separate files that are compiled separately) to the ML module system. In a large system written in C, parameterization of a compiler backend would be largely handled by some sophisticated preprocessing system. In C++, the closest counterpart to a module system is the namespace feature. Namespaces can span multiple compilation units and manage the scopes of function names, variables, and type names (including classes). Similarly, ML modules manage the scopes of function names, variables, and type names. In fact, ML modules can also manage the scopes of modules themselves through hierarchical nesting. This is the first big difference between an object system and the module system. Object systems were meant primarily to encapsulate data (i.e., fields/variables) and operations (i.e., methods/member functions). Java supports inner classes, thus enabling a degree of management of types, but this capability is seldom cited as a core feature of OOP.
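To make the backend example concrete, here is a small sketch in OCaml (used here as a stand-in for ML in general). Everything in it is hypothetical: the ARCH signature, the two toy architectures, and the Backend functor are invented for illustration and are far simpler than what a real compiler would need.

```ocaml
(* Hypothetical description of an architecture: an instruction
   type plus one memory-layout detail. *)
module type ARCH = sig
  type instr
  val word_size : int                 (* bytes per machine word *)
  val load_const : int -> instr
  val show : instr -> string
end

(* The backend is written once, parameterized on the architecture. *)
module Backend (A : ARCH) = struct
  let emit_constants consts =
    List.map (fun c -> A.show (A.load_const c)) consts
  let frame_bytes n_slots = n_slots * A.word_size
end

(* Two toy targets, invented for this sketch. *)
module ToyRisc : ARCH = struct
  type instr = Li of int
  let word_size = 4
  let load_const n = Li n
  let show (Li n) = Printf.sprintf "li r1, %d" n
end

module ToyStack : ARCH = struct
  type instr = Push of int
  let word_size = 8
  let load_const n = Push n
  let show (Push n) = Printf.sprintf "push %d" n
end

(* Modules nest hierarchically, so the scope of the backends
   themselves is managed by the enclosing module. *)
module Compiler = struct
  module Risc = Backend (ToyRisc)
  module Stack = Backend (ToyStack)
end

let risc_code = Compiler.Risc.emit_constants [1; 2]
let stack_frame = Compiler.Stack.frame_bytes 3
```

The point of the sketch is that selecting a target is an ordinary, type-checked module application rather than a preprocessing step.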

The second purpose of the module system, facilitating a type-safe container library, is usually considered the role of the C++ template system rather than the object system. Like the ML module system, the C++ template system was also originally intended to support a type-safe container library. In C++'s case, the type-safe library was the Standard Template Library (STL). To achieve this, the template language enabled parameterization on types, both user-defined classes and primitive types such as int. The ML module system achieves type parameterization through functors, which map structures to structures. Because any structure may include type definitions as components, a functor that takes a structure as its argument is thereby parameterized on the types that structure contains.
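As a rough sketch of a functor-based container, the following OCaml code defines a small set implementation parameterized on its element type. The ORDERED signature and MakeSet functor are hypothetical names written for this post; the OCaml standard library's Set.Make follows the same general pattern but is a separate, more complete implementation. Int and String are standard library modules that happen to provide a type t and a compare function.

```ocaml
(* The argument structure carries a type component [t] together
   with the operation the container needs on it. *)
module type ORDERED = sig
  type t
  val compare : t -> t -> int
end

(* Hypothetical set functor: maps an ORDERED structure to a set
   structure whose representation is hidden. *)
module MakeSet (Elt : ORDERED) : sig
  type elt = Elt.t
  type t
  val empty : t
  val add : elt -> t -> t
  val mem : elt -> t -> bool
  val elements : t -> elt list
end = struct
  type elt = Elt.t
  type t = elt list                   (* kept sorted, no duplicates *)
  let empty = []
  let rec add x = function
    | [] -> [x]
    | y :: rest as s ->
      let c = Elt.compare x y in
      if c = 0 then s
      else if c < 0 then x :: s
      else y :: add x rest
  let mem x s = List.exists (fun y -> Elt.compare x y = 0) s
  let elements s = s
end

(* Instantiating the functor at two different element types. *)
module IntSet = MakeSet (Int)
module StringSet = MakeSet (String)

let ints = IntSet.(empty |> add 3 |> add 1 |> add 3)
let strs = StringSet.(empty |> add "b" |> add "a")

(* IntSet.add "a" ints  would be rejected: IntSet.elt is int. *)
```

Each application of the functor yields a distinct, statically checked container type, which is roughly the role a template instantiation such as a set of int plays in C++.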

1 comment:

  1. Looking forward to follow-up posts. I would love to have a better understanding of the ML module system. It seems to be little understood and often misunderstood.

