Friday, November 11, 2011

Genetic Programming and Trading Strategies

From an anecdotal sampling of Linkedin and web discussions, people have applied various machine learning techniques to the creation of trading strategies. The most popular techniques for trading strategies mirror that of the general applied machine learning community, meaning SVM and neural networks are frequently mentioned. One relatively lesser known technique is genetic programming (GP), a variant of genetic algorithms for tree data structures, normally abstract syntax trees. Curiously, the academic community has studied genetic programming quite a lot. Indeed, there are many papers on genetic programming for trade strategies in GECCO (a conference on genetic and evolutionary computation) and other conferences/workshops. Some people from State Street have even contributed their study. The Quant Stack Exchange has a discussion on genetic algorithms where the problem of data snooping is frequently mentioned as a shortcoming, a problem addressed by regularization in conventional learning techniques. I informally survey some of the studies of the effectiveness of GP in a follow-up post.

My cursory interest in this area is the potential impact of programming language design and especially static type systems on the performance of genetic programming for trading strategies. The nice thing about genetic programming is that the results produced can be theoretically unrestricted unlike basic machine learning techniques that make assumptions such as linearity about the underlying model. Practically, however, the results produced by genetic programming are usually small and possibly uninteresting. This is where modern programming languages theory comes in. In the past decade or so, a number of researchers have studied the problem of the generation of random test cases (of which QuickCheck is a prime example). One of the main engineering challenges there is to ensure that the test cases generated do not all fall into a trivial category. The test cases here can range from instances of simple primitive types (e.g., ints) to nontrivial data structures (e.g., syntax trees or data packet queues). Usually it falls to the developer to write functions that generate a good distribution of test cases to ensure, for example, that the probability of getting a list of length greater than 3 is not negligible. In practice, QuickCheck supplies an assortment of combinators that help produce a good distribution. The developer's domain knowledge is still necessary, but the test generation problem reduces down to a choice of combinators.

1 comment:

Note: Only a member of this blog may post a comment.