Monday, April 9, 2012

High Frequency

Wissner-Gross and Freer have a 2010 paper in Physical Review E on relativistic statistical arbitrage [official version, pdf]. It makes the observation that as the propagation of prices and trades approaches relativistic speeds, the relativistic limit on propagation itself singles out intermediate locations between exchanges (the paper maps midpoints between each pair of 52 world exchanges, weighted by turnover velocity) from which arbitrage remains profitable even as information propagation slows or stops. The study models cointegrating instruments with a Vasicek model based on an Ornstein-Uhlenbeck process, together with an Ehrenfest urn model of diffusion. One hypothesized strategy involves offsetting positions in geographically separated exchanges: because proving one holds the correct offsetting position in a distant exchange takes too long, a low-latency system placed at the intermediate point between two exchanges can certify offsetting positions for both.
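
As a rough sketch of the geometric intuition, here is how one might compute the great-circle midpoint between two exchanges. The coordinates below are assumed approximations for New York and London; the paper's actual optimal node locations are weighted by turnover, so they generally won't be simple geometric midpoints like this.

```python
import math

def great_circle_midpoint(lat1, lon1, lat2, lon2):
    """Midpoint of the great-circle arc between two points given in degrees."""
    def to_vec(lat, lon):
        # convert a latitude/longitude pair to a 3D unit vector
        lat, lon = math.radians(lat), math.radians(lon)
        return (math.cos(lat) * math.cos(lon),
                math.cos(lat) * math.sin(lon),
                math.sin(lat))

    x1, y1, z1 = to_vec(lat1, lon1)
    x2, y2, z2 = to_vec(lat2, lon2)
    # average the two vectors and project back onto the sphere
    x, y, z = (x1 + x2) / 2, (y1 + y2) / 2, (z1 + z2) / 2
    r = math.sqrt(x * x + y * y + z * z)
    return math.degrees(math.asin(z / r)), math.degrees(math.atan2(y, x))

# assumed approximate coordinates for New York and London;
# the midpoint lands in the North Atlantic
print(great_circle_midpoint(40.7, -74.0, 51.5, -0.1))
```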

This reminds me of another article on the proposed London-Tokyo fiber link to be run under the Arctic Circle. Fiber infrastructure isn't always laid along the great-circle path, and many pairs of exchanges probably aren't directly connected to each other at all. Moreover, turnover varies over time, so one might need a mobile system, or at least a chain of load-balanced low-latency systems spanning the distance between each pair of exchanges, to adapt to changing conditions. Perhaps some dynamic hedging strategies would be needed as well. Regardless, the paper is an interesting look into the unintended consequences of the race to zero.

Friday, April 6, 2012

The Scala Ecosystem

Scala certainly has a lot going for it these days. It has the enthusiasm of at least a couple of the hottest tech companies out there, Twitter and Foursquare, and even Sony is using Scala in some of its systems. There are at least two fairly usable web frameworks, Lift and Play. The Akka middleware provides a scalable, lock-free concurrency abstraction. XML support is built into the language. Interoperability with Java is standard, giving Scala access to important systems and APIs such as Hadoop, JSoup, Mahout, Pig, and the numerous Apache Java projects. There is even a linear algebra package, Scalala. What is going to take it to the next level?

I've read posts recommending Python over Ruby because of the comprehensiveness of its platform, given SciPy/NumPy/matplotlib and Django. Ruby, of course, has Rails to really turbocharge its audience. Objective-C remains the lingua franca of iOS by fiat. The interesting observation is that the software development world has become very diverse. The proliferation of web services certainly helped: instead of having to interface with binary libraries through structured foreign function interfaces, new languages and platforms only have to speak HTTP, XML, and JSON to achieve harmony with the Web 2.0 ecosystem. This lowers the legacy-compatibility bar for new programming languages considerably. Considering how difficult it was to develop good foreign function interfaces and runtime support for mediating between garbage-collected (aka managed) languages and the previous lingua franca, C, this must be quite a relief to language designers. However, harmony with Web 2.0 alone isn't enough to achieve widespread acceptance.
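
To illustrate how low that bar is, consuming a JSON payload in Python takes nothing beyond the standard library; the payload below is a made-up example, not any real service's response.

```python
import json

# A hypothetical Web 2.0 API response. Any language with a JSON parser
# can consume this without worrying about ABIs, calling conventions,
# or marshalling across a garbage collector boundary.
payload = '{"user": "someone", "checkins": [{"venue": "cafe", "ts": 1333584000}]}'

data = json.loads(payload)
# the result is plain native data structures: dicts, lists, strings, ints
print(data["checkins"][0]["venue"])
```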

Thursday, April 5, 2012

Probabilistic Counter

The High Scalability blog has a recent post on probabilistic algorithms for approximate counting. The author unfortunately coined the term "linear probabilistic counter", which returns no results in a Google search. The more usual terminology is linear counting (a paper from KAIST gives the original definition from 1990) and, closely related, approximate counting. Linear counting is a hashing-based algorithm that approximates column cardinality. The name comes from the fact that the count estimator should be approximately proportional (linear) to the true count. The study of probabilistic algorithms is indeed a growing field. It is also tied to that oft-forgotten technique of sampling in this modern age of big data. There are a lot of related concepts here. The common theme among these kinds of problems is a massive number of entries where each entry is constrained to come from some finite set.
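
A minimal sketch of linear counting, following the bitmap formulation from the 1990 paper: hash each entry into an m-bit bitmap, then estimate the distinct count as -m ln(V), where V is the fraction of bits still zero. The hash function and parameters below are illustrative choices, not anything prescribed by the paper.

```python
import hashlib
import math

def linear_count(items, m=4096):
    """Estimate the number of distinct items using an m-bit bitmap."""
    bitmap = [0] * m
    for item in items:
        # hash each item to one of m bucket positions
        h = int(hashlib.md5(str(item).encode()).hexdigest(), 16) % m
        bitmap[h] = 1
    v = bitmap.count(0) / m  # fraction of bits still zero
    if v == 0:
        raise ValueError("bitmap saturated; increase m")
    # the -m ln(V) estimator corrects for hash collisions
    return -m * math.log(v)

# 10,000 entries drawn from only 500 distinct values;
# the estimate should land close to 500
entries = [i % 500 for i in range(10_000)]
print(round(linear_count(entries)))
```

Note the memory cost is one bit per bucket rather than storage for every distinct value, which is the whole appeal when the entries number in the billions.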