Monday, December 29, 2014

Understanding Cloud Host Pricing, Part 2

In the past few years, there has been a movement to standardize cloud compute resource measurements in order to make way for public trading of compute resources. The idea is simple, but the execution may be complicated: each company can run something like OpenStack and rent out underutilized compute resources, and these resources can be further traded on public exchanges to enable companies to hedge against price spikes. Along these lines, Amazon was quite early in introducing the Reserved Instance Marketplace. Public trading of standardized compute units would enable smaller organizations to monetize underutilized assets. This model is not without its challenges: compute resources have many aspects that distinguish them, and performance may vary dramatically. In this post, I investigate some of the smaller cloud hosts and their prices.

| Provider | Minimum Unit ($/hr) | Memory (GB) | Instance Storage (GB) | Persistent Block Storage ($/GB/mo) |
|---|---|---|---|---|
| HP Helion | 0.03 | 1 | 10 | 0.10 |
| IBM SoftLayer | 0.04 | 1 | 25 | 0.10 |
| Oracle Cloud | 1.8 | - | - | - |
| CloudSigma | 0.0319 | 1 | 1 | 0.14 (SSD) |
| GoGrid | 0.02-0.03 | 0.5 | 25 | - |
| DreamHost DreamCompute | 0.0264 | 2 | 25 | - |
| Internap | 0.04 | 1 | 20 (SSD) | 0.30 |
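As a rough way to compare these, here is a small Python sketch that estimates a monthly bill for one always-on instance plus a 100 GB block storage volume. The rates come from the table above (only the unambiguous rows); real bills depend on each provider's rounding, bandwidth, and support charges, so treat this as a sketch only:

    HOURS_PER_MONTH = 730   # average hours in a month

    # (instance $/hr, persistent block storage $/GB/mo), from the table above
    providers = {
        "HP Helion":     (0.03, 0.10),
        "IBM SoftLayer": (0.04, 0.10),
        "Internap":      (0.04, 0.30),
    }

    def monthly_cost(hourly_rate, storage_rate, storage_gb=100):
        # One always-on instance plus a block storage volume; ignores
        # bandwidth, support plans, and per-provider rounding rules.
        return hourly_rate * HOURS_PER_MONTH + storage_rate * storage_gb

    for name, (hourly, storage) in sorted(providers.items()):
        print("%-14s $%6.2f/mo" % (name, monthly_cost(hourly, storage)))

Note how Internap's cheap instances are offset by block storage that costs three times as much per GB: exactly the kind of offsetting cost structure that makes head-to-head comparisons tricky.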

Saturday, December 27, 2014

Top 5 Gotchas When Running Docker on a Mac

Running Docker on a Mac is meant to be a convenience, but the fact that Docker on the Mac is a second-class citizen shows up every now and then. Since Docker is based on Linux kernel features (cgroups and namespaces), it cannot and does not run natively on Mac OS X. Instead, Docker runs on Macs via boot2docker, a shim that boots up a whole VirtualBox VM in which Docker actually runs. Running Docker inside a VM on the Mac complicates things quite a bit.
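One consequence is that the Docker daemon is not on localhost; it lives inside the VM, so clients have to pick up the connection settings boot2docker exports. Here is a minimal sketch using the docker-py client library, assuming the environment variables have been set with $(boot2docker shellinit):

    from docker import Client
    from docker.utils import kwargs_from_env

    # boot2docker exports DOCKER_HOST, DOCKER_CERT_PATH, and DOCKER_TLS_VERIFY
    # into the shell; kwargs_from_env() turns those into connection arguments.
    # assert_hostname=False works around the VM's self-signed certificate not
    # matching the IP address that boot2docker assigns.
    client = Client(**kwargs_from_env(assert_hostname=False))
    print(client.version())

On Linux the same code would just talk to the local Unix socket; on the Mac, forgetting the environment setup is the classic first gotcha.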

Friday, December 26, 2014

Understanding Cloud Host Pricing, Part 1

The pricing schemes of the top cloud infrastructure-as-a-service (IaaS) providers are rather complicated. They cannot be compared directly since their performance characteristics vary. Moreover, the differences in the costs of instances, storage, and bandwidth may offset each other. For example, provider A's storage costs may be greater than provider B's, while B's instance costs are greater than A's. Some providers bill for instances by the minute (Azure and Google Compute Engine) whereas others bill by the hour (rounded up), such as AWS. Some providers charge for storage by the TB (Azure) whereas others charge by GB/month or sometimes GB/hour (Rackspace). Consequently, what constitutes the best deal in cloud hosting depends on your specific workloads and storage needs. In this series, we will investigate the various aspects of cloud host pricing from the major providers: Amazon AWS, Microsoft Azure, Google Compute Engine, DigitalOcean, and Rackspace.
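To make the billing-granularity point concrete, here is a small sketch (the $0.10/hr rate and the workload are made up for illustration) of how rounding up to the hour inflates the bill for short-lived instances:

    import math

    def per_hour_rounded_up(rate_per_hour, minutes_used):
        # AWS-style: every started hour is billed in full
        return rate_per_hour * math.ceil(minutes_used / 60.0)

    def per_minute(rate_per_hour, minutes_used):
        # Azure/GCE-style: billed for the actual minutes used
        return rate_per_hour * (minutes_used / 60.0)

    # A hypothetical $0.10/hr instance spun up for 90 one-minute jobs:
    rate, jobs, minutes_each = 0.10, 90, 1
    print("hourly billing: $%.2f" %
          sum(per_hour_rounded_up(rate, minutes_each) for _ in range(jobs)))
    print("minute billing: $%.2f" %
          sum(per_minute(rate, minutes_each) for _ in range(jobs)))

Ninety one-minute jobs cost $9.00 under hour-rounded billing but only $0.15 under per-minute billing, a 60x difference for the same work.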

Wednesday, December 17, 2014

Retry Pattern

A common design pattern in fault-tolerant distributed systems is the retry pattern. A given operation may experience a variety of failures:

  1. Rare transient failures (e.g., a corrupted packet) can be recovered from immediately, so the operation should retry right away.
  2. Common transient failures (e.g., a busy network) should be retried after waiting for a period of time, possibly with exponential backoff.
  3. Permanent failures should not be retried; bail out and clean up.
Of course, the final case is that the operation succeeds, and the function must do whatever work follows from that; a minimal sketch of the pattern appears below. This is an interesting design pattern not only for distributed systems but for fault-tolerant systems in general. For example, high-performance JavaScript engines have parsers and tokenizers that must be robust to various failures. In fact, this is one case where large systems have embraced multiple exit points and more complicated control flow, for which C-based programs may use gotos.
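Here is that sketch in Python. The exception classes are hypothetical stand-ins for whatever errors your operation actually raises, and the attempt limit and base delay are illustrative:

    import time

    class RareTransientError(Exception): pass     # e.g., corrupted packet
    class CommonTransientError(Exception): pass   # e.g., network busy
    class PermanentError(Exception): pass         # e.g., bad credentials

    def with_retries(operation, max_attempts=5, base_delay=0.1):
        delay = base_delay
        for attempt in range(max_attempts):
            try:
                return operation()            # success: hand the result back
            except RareTransientError:
                continue                      # retry immediately
            except CommonTransientError:
                time.sleep(delay)             # wait, then retry
                delay *= 2                    # exponential backoff
            except PermanentError:
                raise                         # do not retry; caller cleans up
        raise RuntimeError("gave up after %d attempts" % max_attempts)

The key structural point is that each failure class gets its own exit path, which is exactly the multiple-exit-point control flow described above.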

Wednesday, December 10, 2014

Docker Image for SML/NJ

The package repos of the various Linux distros carry very outdated versions of the SML/NJ compiler. This Docker image builds the latest official SML/NJ release.

Tuesday, December 9, 2014

Apache Mesos and Hadoop YARN Scheduling

Mesos and YARN are two powerful cluster managers that can play host to a variety of distributed programming frameworks (Hadoop MapReduce, Dryad, Spark, and Storm) as well as multiple instances of the same framework (e.g., different versions of Hadoop). Both are concerned with optimizing the utilization of cluster resources, especially with respect to the locality of data distributed around the cluster. Google's paper on Omega, their own cluster scheduling system, dubs Mesos a two-level scheduler, which provides some flexibility by having a single resource manager offer resources to multiple parallel, independent schedulers. YARN is considered a monolithic scheduler since its independent Application Masters are only responsible for job management, not scheduling. Scheduling is the essence of efficient Big Data processing. So where do these two systems differ?
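To make the two-level idea concrete, here is a toy Python sketch (not the actual Mesos API; the framework names, the offer shape, and the single-resource model are all simplifications). A central resource manager owns the cluster and makes offers, and each framework's scheduler decides how much of an offer to accept:

    # Toy model of two-level scheduling: the resource manager owns the
    # cluster and makes offers; each framework scheduler decides what to take.

    class FrameworkScheduler:
        def __init__(self, name, cpus_needed):
            self.name, self.cpus_needed = name, cpus_needed

        def on_offer(self, offer_cpus):
            # Second level: the framework, not the manager, picks what to run.
            accepted = min(offer_cpus, self.cpus_needed)
            self.cpus_needed -= accepted
            return accepted

    class ResourceManager:
        def __init__(self, total_cpus, frameworks):
            self.free_cpus, self.frameworks = total_cpus, frameworks

        def offer_round(self):
            # First level: offer the free resources to each framework in turn.
            for fw in self.frameworks:
                taken = fw.on_offer(self.free_cpus)
                self.free_cpus -= taken
                print("%s accepted %d CPUs" % (fw.name, taken))

    rm = ResourceManager(16, [FrameworkScheduler("spark", 10),
                              FrameworkScheduler("hadoop", 10)])
    rm.offer_round()   # spark accepted 10, hadoop accepted 6

Real Mesos offers carry multiple resource types (CPU, memory, ports) and frameworks can decline offers outright, but the division of labor is the same: the manager decides who gets offered resources, and the framework decides what to run on them.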

Monday, December 8, 2014

Alternatives to Docker: LXD and Rocket

Two recently announced alternatives to the Docker Linux container runtime, LXD and Rocket, aim to offer some interesting value propositions. Before I get to the details, let's first identify which use cases and aspects of the Docker container runtime are of interest here. Docker identifies a few major categories of use cases: continuous integration, continuous delivery, scaling distributed applications, and Platform-as-a-Service. The former two are DevOps use cases. At this point, Docker pretty much has a lock on the DevOps use cases. Moreover, neither of the would-be competitors truly targets DevOps. This becomes obvious when you consider that LXD is intended to run in OpenStack server environments and Rocket on CoreOS/fleet (though it is not necessarily tied to CoreOS). Docker runs on workstation environments and even on top of VirtualBox via boot2docker to support DevOps workflows on Mac OS X. The more competitive aspect is cloud infrastructure. Here, Docker is competing with a wider range of technologies to support scaling on the cloud and providing PaaS functionality. This is the market where LXD and Rocket would operate. This is also the area where hypervisors have reigned.

Sunday, December 7, 2014

How to search for great programmers, Part 2

Aline Lerner of TrialPay recently posted statistics from an experiment on resume review. The conclusion was that recruiters, engineers, and just about everyone else score resumes all over the place, and therefore resumes have weak signal value. The claim was that the strongest signal in a resume was the number of typos. Although the study seems extensive, I think there are a number of weaknesses in the experimental design. One weakness was acknowledged: the ground truth is the author's own subjective evaluation of the candidates. Another weakness was that the survey questions were somewhat misleading in the first place. The questionnaire asks "would you interview this candidate," and yet this was compared with the ground truth of "will the candidate perform well on the job or technical interview." As I alluded to in an earlier post in this series, the role of a resume is to help filter for red flags and to guide the formal interview, not to determine by itself whether a candidate is a star performer. The fact of the matter is, a resume is a self-reported synopsis of a candidate's track record. To evaluate a candidate, I would think track records are important, as is potential.

Saturday, December 6, 2014

What is really interesting about Quantitative Behavioral Finance

photo by Stuck in Customs via PhotoRee

Quantitative behavioral finance has not been with us for very long. As a relatively recent development and area of discourse, it has only begun to gain a following. One very interesting aspect of this field is the use of experimental asset markets. These studies are based on experiments conducted on a small group of people (but with real money, hence a real market) to examine where rational expectations and classical game theory fail to explain human behavior. This is basically a small-scale version of prediction markets such as the Iowa Electronic Markets, Intrade, and Betfair. However, unlike prediction markets, where the ultimate objective is to predict an external event, experimental asset markets are more interested in the mechanics and patterns of the market itself. Caginalp, Vernon Smith (the 2002 Nobel Memorial Prize in Economics recipient), and David Porter have a couple of papers on experiments in this mode. Both experiments examine how financial bubbles can happen. Some of the takeaways are that excess cash and information asymmetry due to the lack of an open book may exacerbate bubbles. I think the matter of information asymmetry is very salient. Despite all the effort and money invested by banks and hedge funds in improving information infrastructure, ultimately information is distributed non-uniformly to market participants. This is most obvious in the case of the retail investor, who has neither the time nor the resources to obtain and analyze all the market information.

Friday, December 5, 2014

Container Virtualization Options

photo by sioda via PhotoRee

Looks like the container virtualization space is becoming a little more interesting this week. Previously, Docker was the only more or less complete standard container implementation (with a definition of images, image creation, and container start/stop management). There was also Canonical's LXD, but it didn't seem to be garnering nearly as much attention and support, having been announced only a month ago. However, with the Docker and CoreOS organizations starting to encroach on each other's territory, the CoreOS community has released an early version of its own container runtime, Rocket. For its part, Docker has moved into the container cluster orchestration and management space with Docker Swarm and Docker Compose, the latter still being in the design stage.

Containers versus Virtual Machines

Container-based systems (e.g., Docker, LXC, cgroups) and virtual machines (VMware, Xen) both seek to bring the benefits of virtualization to the data center and the developer workflow, and they have considerable overlap in those benefits. Although both do some kind of virtualization to enable better utilization of physical hardware, there are also some key differences. Containers do virtualization at the OS kernel level, so isolation is limited to what the kernel can enforce. Containers also share layers of file systems, courtesy of AuFS, which potentially makes better use of disk and image space than virtual machines, which commit the entire contents of a VM's disk to the disk image.

Thursday, December 4, 2014

Supervisor trees

In my past few posts, I have focused on fault-tolerant distributed systems as implemented through cluster managers. Apache Mesos, Kubernetes, and many others all attempt to support fault tolerance through auto-restarting and other self-healing techniques at the cluster manager level. As such, they rightly claim to be the new operating systems of the cloud. It turns out, however, that cluster managers do not have a monopoly on fault tolerance features. Long before Mesos, Kubernetes, and possibly even the University of Wisconsin's Condor (a distributed processing system with considerably more pedigree), Erlang had supervisor trees and supervisor behaviors (a kind of language interface) in its runtime, supporting large, highly fault-tolerant distributed systems decades ago.
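The core idea carries over to other languages too. Here is a minimal Python sketch of a one_for_one-style supervisor (a loose analogy to Erlang's supervisor behavior, not a faithful port) that restarts a crashed child process up to a limit:

    import time
    from multiprocessing import Process

    def worker():
        # A child that does a bit of work and then crashes.
        time.sleep(1)
        raise RuntimeError("worker crashed")

    def supervise(target, max_restarts=3):
        # one_for_one strategy: when a child dies, restart just that child,
        # up to a restart limit, after which the supervisor itself gives up.
        for restart in range(max_restarts + 1):
            child = Process(target=target)
            child.start()
            child.join()                 # block until the child exits
            if child.exitcode == 0:
                return                   # normal exit, nothing to restart
            print("child died (exit %s); restart #%d" % (child.exitcode, restart))
        raise RuntimeError("restart limit reached")

    if __name__ == "__main__":
        supervise(worker)

In a real supervisor tree, supervisors are themselves supervised, so exhausting the restart limit escalates the failure up the tree rather than crashing the whole system.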

Wednesday, December 3, 2014

Security in Containerization Technology

One lingering worry with containerization is security. Previously, with conventional type 0 and type 1 (native, bare-metal) hypervisor technology, we greatly limited our trusted base to small hypervisors (e.g., Xen is < 150 kloc). Some were so small (the seL4 core was 7.5 kloc) that they were amenable to mechanized formal verification. OSes supporting containers, in contrast, are much larger. Even CoreOS, intended as a slimmed-down version of the Chrome OS Linux kernel that just supports modern bare-metal architectures for containers, is fundamentally more challenging to vet, not to mention verify, than a simple hypervisor. Etcd and fleet alone add up to 44k sloc of Go. So for all the great inroads we were making in verification, the move toward containerization in the data center brings new challenges and potentially resets some of the progress the community has made in mechanically verifying the security and functional correctness of the lowest layers of software systems and infrastructure.

Tuesday, December 2, 2014

ClusterHQ Flocker

Flocker does multi-host orchestration for Docker containers. It is intended mainly as a means of containerizing and orchestrating distributed data stores and databases, although in principle it can deploy any app. Unlike some of the other solutions out there, Flocker aims to support checkpointing of stateful and stateless containers to enable migration of (running) containers across nodes. This seems like a great feature if one wanted to do work-stealing-style rescheduling of containers as the execution profile changes and other nodes become available. Flocker provides its own NAT layer for mediating communication between containers across nodes. It also supports ZFS persistent volumes to maintain state. Flocker itself does not aim to do any particularly sophisticated scheduling (cf. Kubernetes) but instead relies on the user to supply the scheduling.

Monday, December 1, 2014

Kubernetes and Decking Container Cluster Managers

Fig. 1: Kubernetes Architecture

Kubernetes manages user-defined collections of containers called pods. Note that "pod" refers to the running containers and not a static image (in Docker terminology). Besides containers, a pod can also have persistent storage attached as a volume and can define custom container health checks. Pods themselves can be organized into "groups", a kind of "API object", which in turn can be referenced by label. There are two other main forms of API objects: replication controllers and services. The former maintains a fixed number of replicas of a pod template. The latter defines internal and external ports for establishing connectivity across pods.
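A replication controller is essentially a reconciliation loop. The toy Python sketch below (not the Kubernetes API, just the control-loop idea it implements) compares the observed pods against the desired count and creates or deletes pods to close the gap:

    # Toy reconciliation loop in the spirit of a replication controller:
    # observe how many replicas exist, compare with the desired count,
    # and create or delete pods to close the gap.

    def reconcile(desired, running_pods, create_pod, delete_pod):
        if len(running_pods) < desired:
            for _ in range(desired - len(running_pods)):
                create_pod()
        elif len(running_pods) > desired:
            for pod in running_pods[desired:]:
                delete_pod(pod)

    pods = []
    reconcile(3, pods,
              create_pod=lambda: pods.append("pod-%d" % len(pods)),
              delete_pod=pods.remove)
    print(pods)   # ['pod-0', 'pod-1', 'pod-2']

Running such a loop continuously is what lets the controller heal the system after node failures: dead pods simply show up as a gap to be closed on the next pass.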

Sunday, November 30, 2014

Top 5 Container Cluster Managers: Containers and Cloud Management

photo by OneEighteen via PhotoRee

The rapid ascent of containers is nothing less than breathtaking. In the space of a year and a half, Docker has gone from initial release to being adopted by every major IaaS cloud host (Google, Amazon, Microsoft, Rackspace's OnMetal CoreOS), VM software vendors, and a few toolchain developers. As a point of reference, VMware ESX took over 5 years to get anywhere near such market penetration. It is interesting to consider at this point what the management tool options are for containers and clusters/networks of containers. There are two main categories of use cases for Docker: DevOps (e.g., continuous integration) and virtualization infrastructure (e.g., improving utilization of server resources by using lightweight containers in place of hypervisor-based virtualization).

This is a series of evaluations of container cluster managers: Part 2 (Kubernetes and Decking) and Part 3 (Flocker).

Saturday, November 29, 2014

Parallel Data Types

True parallelism can be achieved via a wide variety of avenues. Instruction-level parallelism arguably offered huge value at one point, when processor architectures bulked up to execute machine instructions in parallel whenever possible, including when instructions could be reordered or delay slots filled. All this could be done without an application programmer's involvement or knowledge, hence it promised the benefits of parallelism for most programs out of the box, without modifications. The onus, instead, was on the processor architects and compiler developers to take advantage of such parallelism, which sometimes turned out to be a tall order. Task parallelism, in contrast, typically demands a lot more from application programmers, since exposing and taking advantage of parallelism becomes very domain-specific. For task parallelism, the focus is on balancing the load across execution threads. Runtime support can help fill in the gaps here by applying advanced scheduling techniques such as work stealing, but ultimately what constitutes a thread (i.e., how to partition the program) and how to mediate communication and sharing between threads is often up to the programmer. Another way to achieve parallelism at a fine-grained level is to implement and expose a parallel data type in a language, runtime, or distributed computing framework. Apache Spark achieves this with its RDD (Resilient Distributed Dataset) abstraction, which can be built from a Scala Seq, a Java Collection, or a Python iterable. Hadoop uses Java Iterable for MapReduce. For modern distributed computing environments, the pipeline itself might be any DAG (more general than map-reduce), but ultimately the parallelism stems from the data representation, hence data parallelism. Note that of all the forms of parallelism, it is data parallelism that has ultimately demonstrated straightforward and efficient scaling to truly large problems.
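As a concrete example, here is a small PySpark sketch (assuming a local Spark installation; "local[4]" requests four worker threads). The map function itself is ordinary sequential code; the parallelism comes entirely from how the RDD partitions the data:

    from pyspark import SparkContext

    sc = SparkContext("local[4]", "data-parallel-example")

    # parallelize() partitions an ordinary Python iterable into an RDD;
    # each partition can then be processed by a separate worker.
    numbers = sc.parallelize(range(1, 1001), numSlices=8)

    # Sum of squares of 1..1000; map and reduce run per-partition in parallel.
    total = numbers.map(lambda x: x * x).reduce(lambda a, b: a + b)
    print(total)

    sc.stop()

Swapping "local[4]" for a cluster master URL scales the same program out to many machines without changing the application logic, which is precisely the appeal of data parallelism.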

Friday, November 28, 2014

How to search for great programmers, Part 1

photo by super-structure via PhotoRee

A running theme these days is that GitHub is the new resume and a great source for learning. I am skeptical of these claims. If GitHub is the new resume, then it is one that is potentially even more difficult to evaluate than before, and it will greatly limit the candidate pool, not necessarily to a higher-quality one. The bigger question is what role GitHub should play in the hiring process. After all, many if not most of the famous hero programmers out there (e.g., Brian Kernighan, Herb Sutter, Alex Stepanov) do not have GitHub profiles. Linus Torvalds does, but he seems to be an exception. A GitHub profile is also difficult to evaluate. Do we consider only superficial metrics such as commit/contribution frequency, or dive deep into a random project? Neither of those options seems particularly attractive. At most, they provide an imperfect measure of enthusiasm. It seems to me that if code samples are what we are after, then we should ask for code samples. A real resume provides a highly condensed means to evaluate relevant experience and education, one that can be read in seconds.

Wednesday, November 26, 2014

Who really drives high quality open source?

The current Cambrian explosion of software we are witnessing today is due in no small part to a number of great open source projects. Open source software has the lion's share of web browsers (WebKit, Firefox, and Chrome), web servers (Apache httpd and nginx), mobile and server OSes (Android and Linux), web frameworks (too many to summarize), distributed computing and deployment frameworks (Hadoop, Spark, Mesos), and databases/distributed data stores (PostgreSQL, MariaDB, Cassandra, Riak, Mongo). By "great" here, I mean a combination of widespread use and high quality, as reflected (imperfectly) in defect density. But what really drove those projects to the success and influence we are witnessing now? Just as there is the rags-to-riches mythos, so there is the mythos of Joe Everybody contributing to open source. Consider an analogy with replication in research. In the research community, there is the ideal that research results can be validated by others replicating the experiments. In practice, however, the incentives do not align to promote such replication activities, since an academic cannot stake tenure on replication. I wonder if there is a similar gap between ideal and practice in open source organization and contribution. Another related question I would like to answer is what high quality software engineering practices one can learn from top open source projects, and how.

Monday, November 10, 2014

What is New in Android Lollipop

These are exciting times, with the new Android Lollipop (Android 5.0) release hitting a late-model Android handset near you. It is apparently already available on the Motorola Moto X and Moto G and slated for Nexus phones on November 12th. There are a lot of interesting features in this release, from the switching on of the Android Runtime (ART), which enables ahead-of-time compilation of Android apps (instead of the traditional Dalvik just-in-time compilation), to a new garbage collector.

ART was introduced in KitKat (Android 4.4) but it is now the default in Lollipop.

Wednesday, February 19, 2014

The Real Power Behind the Curtain (WhatsApp Edition)?

By now, you have probably heard all about Facebook's $16B acquisition of WhatsApp. To put that in perspective, that is more than the total GDP of over a hundred countries, the likes of Iceland, and roughly 30% of the market cap of GM, or roughly 10% of that of Facebook itself. Consider how it was just a few years ago that Oracle bought the former behemoth Sun Microsystems (whose former campus is now Facebook's) for $7.4B in 2010. An interesting tidbit: behind the curtain is a modern, low-latency messaging backend powered by Erlang, a functional language optimized for concurrency and availability, which originally gained a following in the telecom community, especially from the likes of Ericsson. This is what ultimately powers the slick, reliable user experience of the messaging system that serves hundreds of millions around the world. Rick Reed, one of the engineers at WhatsApp, gave a talk on the use of Erlang at WhatsApp. One specialty of Erlang, which is also used by Facebook itself, is hot code swapping, a kind of on-the-fly code module loading that enables updating a system and pushing new code without ever taking the system down.