Programming languages, Cloud, and Financial Markets: Supervisor trees

Thursday, December 4, 2014

Supervisor trees

In my past few posts, I have focused on fault-tolerant distributed systems as implemented through cluster managers. Apache Mesos, Kubernetes, and many others all attempt to support fault tolerance through automatic restarts and other self-healing techniques at the cluster manager level. As such, they rightly claim to be the new operating systems of the cloud. It turns out, however, that cluster managers do not have a monopoly on fault tolerance features. Long before Mesos, Kubernetes, and possibly even the University of Wisconsin's Condor (a distributed processing system with considerably more pedigree), Erlang had supervisor trees and supervisor behaviors (a kind of language interface) built into its standard OTP libraries, supporting large, highly fault-tolerant distributed systems decades ago.

A supervisor tree consists of supervisor and worker processes, where supervisors may themselves have supervisors (i.e., a supervisor can oversee both subordinate supervisors and workers). Erlang supervisors support three process restart strategies:

one-for-one: when a child process terminates, only that process is restarted
one-for-all: when a child process terminates, all its sibling processes are terminated, and then all children, including the one that terminated, are restarted
rest-for-one: when a child process terminates, all younger siblings (i.e., sibling processes that were started after it) are terminated, and then the original child process and its younger siblings are restarted
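
To make the three strategies concrete, here is a minimal sketch in Python (not Erlang) of which children end up being restarted when one of them terminates. It is only a model of the rules listed above, not OTP's implementation: the function name children_to_restart and the child names are hypothetical, and the strategy names are spelled with underscores the way the Erlang atoms are (one_for_one, one_for_all, rest_for_one).

```python
from typing import List


def children_to_restart(strategy: str, children: List[str], failed: str) -> List[str]:
    """Return the children a supervisor would restart, oldest first.

    `children` must be listed in start order (oldest first).
    This is an illustrative model, not the OTP supervisor itself.
    """
    idx = children.index(failed)  # raises ValueError for an unknown child
    if strategy == "one_for_one":
        # Only the terminated child comes back.
        return [failed]
    if strategy == "one_for_all":
        # Every child is restarted, including the one that terminated.
        return list(children)
    if strategy == "rest_for_one":
        # The terminated child plus everything started after it.
        return children[idx:]
    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    started = ["db", "cache", "web"]  # hypothetical children, in start order
    for strategy in ("one_for_one", "one_for_all", "rest_for_one"):
        print(strategy, children_to_restart(strategy, started, "cache"))
```

With "cache" as the terminated child, rest_for_one leaves "db" alone because it was started earlier, which is the point of that strategy: later children are assumed to depend on earlier ones, not the other way around.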

Supervised processes (children) may also be specified as one of three kinds:

permanent: always restarted (suited to always-on services)
temporary: never restarted under any circumstance
transient: restarted only after abnormal termination
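
The three kinds boil down to a small decision rule, sketched below in Python as a model rather than as Erlang code. The function names are hypothetical. The one detail taken from OTP rather than from the list above is what counts as a normal exit for a transient child: the exit reasons normal, shutdown, and {shutdown, Term}, represented here as the strings "normal" and "shutdown" and a two-element tuple.

```python
def is_normal_exit(reason) -> bool:
    """Model of the exit reasons OTP treats as normal for transient children."""
    if reason in ("normal", "shutdown"):
        return True
    # Stand-in for Erlang's {shutdown, Term} tuple.
    return isinstance(reason, tuple) and len(reason) == 2 and reason[0] == "shutdown"


def should_restart(restart_type: str, reason) -> bool:
    """Decide whether a terminated child is restarted, given its restart type."""
    if restart_type == "permanent":
        return True  # always restarted, whatever the exit reason
    if restart_type == "temporary":
        return False  # never restarted
    if restart_type == "transient":
        return not is_normal_exit(reason)  # restarted only after a crash
    raise ValueError(f"unknown restart type: {restart_type!r}")


if __name__ == "__main__":
    for kind in ("permanent", "temporary", "transient"):
        print(kind, should_restart(kind, "normal"), should_restart(kind, "badarg"))
```

The restart type answers whether a child comes back at all; the restart strategy then determines which of its siblings are restarted along with it.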
