Programming languages, Cloud, and Financial Markets: 2012

Thursday, November 8, 2012

Railroads and Homes: What Railroad Stock is Most Correlated with Homebuilders

With homebuilders slowly, slowly on the mend, I was curious just what the satellite beneficiaries of any lasting recovery in homebuilding would be. There are the obvious ones such as appliance makers (Whirlpool, etc.) and home improvement stores (Lowe's and Home Depot). It was a shot in the dark, but I wanted to see if there was anyone in transportation that was correlated with homebuilders. As a starting point, I considered railroads, which haul commodities and raw materials across the country. I calculated the correlations between leading homebuilders (TOL, DHI, PHM, LEN) and railroads (CSX, KSU, NSC, UNP) over the past 10 years. It turns out that if there is a connection, it is not at all clear. For TOL, the luxury homebuilder operating on both coasts, the correlation coefficient is very, very low and may be either positive or negative depending on the railroad. Lennar, Pulte, and D.R. Horton are negatively correlated with all four railroads, from about -0.17 (DHI and KSU) to about -0.60 (PHM and CSX).

The highest railroad-homebuilder correlation is between KSU and TOL, at only about 0.05, which is effectively no correlation at all. For NSC and TOL, note that about 1/10th of Norfolk Southern's revenues come from hauling metals and construction materials including brick and cement.

Correlation matrix (rows and columns in the same order, rounded to three decimals):

	CSX	KSU	NSC	UNP	LEN	PHM	DHI	TOL
CSX	1.000	0.923	0.963	0.957	-0.515	-0.604	-0.361	-0.104
KSU	0.923	1.000	0.931	0.963	-0.289	-0.449	-0.166	0.053
NSC	0.963	0.931	1.000	0.934	-0.386	-0.452	-0.207	0.048
UNP	0.957	0.963	0.934	1.000	-0.450	-0.600	-0.320	-0.082
LEN	-0.515	-0.289	-0.386	-0.450	1.000	0.927	0.939	0.735
PHM	-0.604	-0.449	-0.452	-0.600	0.927	1.000	0.931	0.778
DHI	-0.361	-0.166	-0.207	-0.320	0.939	0.931	1.000	0.885
TOL	-0.104	0.053	0.048	-0.082	0.735	0.778	0.885	1.000
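
The matrix above is a plain Pearson correlation matrix. The post does not say whether it was computed on prices or on returns, or with which tool, so the following is only a minimal sketch with made-up numbers. The names pearson and correlationMatrix and the RAIL/HOME series are hypothetical and exist only for illustration.

```javascript
// Pearson correlation coefficient of two equal-length numeric series.
// Returns NaN if either series is constant (zero variance).
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("series must have equal length >= 2");
  }
  var n = xs.length;
  var meanX = xs.reduce(function (a, b) { return a + b; }, 0) / n;
  var meanY = ys.reduce(function (a, b) { return a + b; }, 0) / n;
  var cov = 0, varX = 0, varY = 0;
  for (var i = 0; i < n; i++) {
    var dx = xs[i] - meanX;
    var dy = ys[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  return cov / Math.sqrt(varX * varY);
}

// Builds the full symmetric matrix for a map of symbol -> series.
function correlationMatrix(seriesBySymbol) {
  var symbols = Object.keys(seriesBySymbol);
  return symbols.map(function (a) {
    return symbols.map(function (b) {
      return pearson(seriesBySymbol[a], seriesBySymbol[b]);
    });
  });
}

// Made-up series purely for illustration; these are not real quotes.
var toy = {
  RAIL: [10, 11, 12, 13, 14],
  HOME: [20, 19, 18, 17, 16]
};
console.log(correlationMatrix(toy)); // [ [ 1, -1 ], [ -1, 1 ] ]
```

One caveat worth keeping in mind: correlating raw price levels over a decade mostly measures shared long-term trends, so running the same function on daily or weekly returns could give a rather different picture.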

Wednesday, November 7, 2012

Financials of a University System

University Bonds

Universities are interesting creatures, financially speaking. They are relevant because they often float general obligation and revenue bonds to finance themselves. These bonds are typically tax exempt and are part of the greater municipal bond universe. As assets, the bonds are infrequently traded. As businesses, universities derive a considerable share of their revenues from medical center operations, government research contracts, government appropriations, and finally tuition. Consider the University of California system, the largest of its kind, with 302k students and 136k faculty spread across 10 campuses.

University Revenues

The lion's share of the revenues comes from the medical centers and government research grants/contracts. State appropriations and tuition together contribute only a quarter of the total revenue.

Over the past 5 years, the tuition/fees category grew by about 61%. Medical center and government contracts revenue grew by 38% and 26% respectively. State appropriations and DOE lab contributions actually declined by 6% and 55% respectively.

University Expenditures

The chief risk to the business is the defined-benefit pension program (amounting to a $2B/year payout). The number of retirees receiving payments has increased by only 13% over the past 5 years, but the amount of payments has increased by 25.6% over the same period. Two years (2008 and 2009) of massive depreciation of the plan's assets certainly did not help. Over the same period, the income part of the plan investments (interest and dividends) declined by 41%, at least partially due to the low interest rate environment of recent years.

The California state government budget crisis led to some deferred payments to the university system in 2010-2012, amounting to around $500MM each year.

Tuesday, November 6, 2012

Election Day Yields

What does the fixed income market look like after a horrible superstorm and on an election day which may or may not be decided quickly? Since the announcement of QE Unlimited, mortgage-backed securities prices have come down quite a bit, as have those of their commercial MBS cousins. In fact, MBS have been pulling back pretty much all day. Emerging market debt has pulled back as well. The following table shows the 1-week max drawdown, max drawdown as a % of principal, and standard deviation of prices for several fixed income ETFs. I bond (savings bond) rates have been reset to a mere 1.76% due to persistently low CPI numbers.

Sym	MDD ($)	MDD (%)	STD ($)	Yield (%)
AGG	0.39	0.35%	0.15	3.54
SHY	0.03	0.04%	0.01	1.13
LQD	0.78	0.63%	0.31	4.81
CSJ	0.17	0.16%	0.06	2.79
CIU	0.34	0.30%	0.12	4.20
JNK	0.13	0.32%	0.05	10.16
HYG	0.33	0.36%	0.13	8.68
MUB	0.28	0.25%	0.11	3.53
MBB	0.18	0.17%	0.07	3.58
CMBS	0.09	0.17%	0.07
TIP	0.5	0.41%	0.19	2.48
EMB	0.75	0.62%	0.27	4.93
ELD	0.16	0.30%	0.06
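
For anyone wanting to reproduce columns like the ones above, here is a minimal sketch of max drawdown and standard deviation over a window of closing prices. It makes two assumptions the table does not confirm: the percentage drawdown is measured against the peak it fell from, and the standard deviation is the sample (n - 1) version. The function names and the toy price series are hypothetical.

```javascript
// Maximum drawdown: the largest peak-to-trough decline within the window.
// abs is in price units; pct is relative to the running peak (an assumption).
function maxDrawdown(prices) {
  var peak = prices[0];
  var abs = 0;
  var pct = 0;
  for (var i = 1; i < prices.length; i++) {
    if (prices[i] > peak) peak = prices[i];
    var drop = peak - prices[i];
    if (drop > abs) abs = drop;
    if (drop / peak > pct) pct = drop / peak;
  }
  return { abs: abs, pct: pct };
}

// Sample standard deviation (divides by n - 1).
function sampleStdDev(xs) {
  var n = xs.length;
  var mean = xs.reduce(function (a, b) { return a + b; }, 0) / n;
  var ss = xs.reduce(function (acc, x) {
    return acc + (x - mean) * (x - mean);
  }, 0);
  return Math.sqrt(ss / (n - 1));
}

// Made-up week of closes, not real ETF data.
var closes = [100, 101, 99, 100, 98];
var dd = maxDrawdown(closes);
console.log(dd.abs, (dd.pct * 100).toFixed(2)); // 3 2.97
console.log(sampleStdDev(closes).toFixed(3));
```

The absolute and percentage drawdowns are tracked separately because, over longer windows, the largest dollar drop and the largest percentage drop need not come from the same peak.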

Thursday, September 27, 2012

QE3 Fixed Income Aftermath

So what happens to fixed income after QE3? I thought it would be interesting to look at the max drawdowns over the past 52 weeks versus the current 30-day SEC yields for a select cross-section of the fixed income ETF space. All yields are quite compressed, as expected. Yields are so compressed that only MBB (US agency-based mortgage bonds) has a 30-day SEC yield exceeding its max drawdown. TIPS have a negative yield, as has been true for quite a while now. It seems the contagion of return-free risk has spread to most of the fixed income ETF universe at this point.

Seeking Yield

Prolonged ZIRP has driven risk-averse, conservative investors to a lot of unlikely places. Mike Ashton, the Inflation Trader, who seldom recommends specific investments, has given the lowdown on Series I savings bonds as a last bastion of "risk-free" inflation-matching yield. Since TIPS yields are negative and nominal Treasuries have horrendously negative real yields, there aren't a lot of safe havens that make sense anymore. Investors have piled into corporates, emerging market debt, preferred shares, and even dividend growers to bump up the scant yields they are seeing everywhere. Today, dividend payers and growers (Schwab's dividend ETF, based on the DJ US Dividend 100 Index, has a 30-day SEC yield of 2.99%) are looking like a relatively good deal when compared to Treasuries. But is all this risk worth it, and what about the opportunity cost of sitting on short-term instruments? I bonds are interesting in that they have a relatively high current yield (2.2%, the same as CPI-U) and are exempt from state income taxes (2.48% taxable equivalent yield for high income tax states). In fact, when the proceeds are used for educational purposes, they are also exempt from federal income taxes (meaning 3.8% taxable equivalent at maximum income tax rates). The term structure of these instruments is standardized: 30 years, but redeemable penalty-free after 5 years. These instruments are not transferable, so there is no secondary market. The downside is that the excess fixed rate of return (set by Treasury) is currently zero (which is still better than TIPS right now) and each person can buy at most $10k worth of these.
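
The taxable-equivalent figures above follow from dividing the exempt yield by one minus the tax rate avoided. Here is a rough sketch of that arithmetic. The 11.3% state rate is my assumption, back-solved from the 2.48% figure, and the 35% federal rate is the top 2012 bracket. Stacking the two multiplicatively is a simplification that ignores interactions such as deducting state tax on the federal return.

```javascript
// Taxable-equivalent yield: the taxable yield that leaves the same
// after-tax income as a tax-exempt yield, given the tax rate avoided.
function taxableEquivalentYield(exemptYield, taxRate) {
  return exemptYield / (1 - taxRate);
}

var iBondYield = 2.2;    // percent, from the post
var stateRate = 0.113;   // ASSUMPTION: back-solved from the 2.48% figure
var federalRate = 0.35;  // ASSUMPTION: top 2012 federal bracket

var stateOnly = taxableEquivalentYield(iBondYield, stateRate);
var stateAndFederal = taxableEquivalentYield(stateOnly, federalRate);

console.log(stateOnly.toFixed(2));       // 2.48
console.log(stateAndFederal.toFixed(2)); // 3.82
```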

Tuesday, September 25, 2012

Compiler Intermediate Representations

Recently I have been studying up on various intermediate representations for compilers. Head and shoulders above the rest in popularity is, of course, LLVM. Interestingly, the more I read about and work with LLVM, the more I see the parallels with the FLINT intermediate representation used in Standard ML of New Jersey (SML/NJ). One of LLVM's key features is its so-called language-independent type system. This type system enables overloading of LLVM instruction opcodes to keep the instruction set small. The type system is also supposed to help debug optimizers. In that respect, FLINT's typed intermediate representation was likewise intended to help debug optimizer phases.

Apart from garbage collection, the other common facility intermediate representations and virtual machines must support is exception handling. LLVM supports exception handling and other forms of non-standard control flow through something evocative of delimited continuations in the form of two instructions invoke and unwind.

Wednesday, September 12, 2012

Political Apathy

It is very interesting that although citizens of many other countries still look to America as a model of civil political discourse, we ourselves are growing increasingly disenchanted and cynical. At some level, I am not sure whether it is public discourse as a whole that has taken the wrong road rather than just political discourse. Moreover, the apathy is even more startling. As we as a society become more polarized, we are simultaneously becoming more apathetic. The Pollyanna in me tells me not to give up on democracy. Though we are constantly bombarded by talk shows and sound bites, we have to recognize oratorical hyperbole when we see it. As an oratorical device, there is nothing inherently moral or immoral about hyperbole. One just has to be more discerning and careful about interpreting such claims and statements. Truly, the device is as old as public speaking itself. Although modern living has afforded us a multitude of means to connect with our representatives, government, and each other, far more so than 100 years ago, can we really say that we are that much more engaged in public life? Take a step back. This country, although relatively young, has been through a whole lot of everything: civil war, world wars, economic panics, and political scandals. Are politicians truly so much more untrustworthy than those of the Gilded Age as to warrant our disengaging from public life? In the days leading up to the Civil War, the country was truly divided. A President who was despised by whole swathes of the country was elected. Instead of disengaging from public life, whole states seceded from the Union. Truly, it is in these trying times that democracy must prove itself.

The Federal Reserve in Pop Culture and Mainstream Politics?


It is really quite interesting how the Federal Reserve has ended up so squarely in the limelight in recent years. Sure, US Presidents, candidates, and other politicians have argued for and against national banks since the days of Alexander Hamilton and Andrew Jackson, but monetary policy has rarely been as close to the mainstream of political discourse as it is today.

Central Bank Intervention Risk: ECB Outright Monetary Transactions and Federal Reserve Quantitative Easing

It looks like all sorts of risk markets have taken off for the sky, with all such markets sitting at brand new multi-year highs or at least looking to reach them. Even today's underwhelming nonfarm payrolls number could not dent the continued enthusiasm for central bank action past, present, and future. How long this will last is the question. With regard to the ECB, it is no longer a hope but a known quantity: the newly announced and oft-leaked Outright Monetary Transactions (OMT) program for "unlimited" bond buying on the short end of Euro sovereign debt. The bond buying program is said to be sterilized. Despite that claim, gold prices have shot up dramatically these past few weeks and continue to outperform risk markets. Next on tap is the FOMC decision. Some anticipate QE3 sooner rather than later, especially since Bernanke has mostly focused on the employment part of the Fed mandate as of late.

Central Bankers at Jackson Hole

Well, the much anticipated Jackson Hole Bernanke speech has come and gone. Everyone from bloggers to the big fund managers has taken a drastically different interpretation of the speech. Some argue that the speech was even more bullish than announcing a definite QE3 right then and there. Others interpret it as definitely indicating that there will be no QE3 soon and definitely not before the election. On Twitter, PIMCO's Bill Gross claims:

"#Bernanke to go out with his guns blazing. #QE3 a near certainty. It will be open-ended but increasingly impotent."

Most of the I-bank analysts interpret the speech as calling for more easing, and in fairly short order.

Thursday, August 30, 2012

The Race to Zero: The Real Niche for the Realtime Web

A recent Hacker News discussion centered on whether realtime is detrimental or at least unnecessary. Many of the commenters were quite disparaging of the realtime web, claiming that the slow web is the way to go. They go on to make the arguable claim that HFT is an example of where "realtime" leads to more trouble than it's worth. The arguments against the realtime web aren't all that new. The same arguments have been recycled from the contention that the Internet itself, with its constant barrage of communication through emails, IMs, and Google queries, has negatively affected our lives and compromised our concentration. In fact, at least two celebrity authors have come down on opposite sides of a related issue: Tim Ferriss in The 4-Hour Workweek: Escape 9-5, Live Anywhere, and Join the New Rich (Expanded and Updated) argues that elements of technology such as email detract from life and really ought to be outsourced if possible, whereas Doug Merrill in Getting Organized in the Google Era: How to Get Stuff out of Your Head, Find It When You Need It, and Get It Done Right argues that tech becomes an extended brain, enhancing our abilities and increasing our capacity. Another, more nuanced take comes from Nicholas Carr, a Pulitzer Prize finalist, in Is Google Making Us Stupid?

High-Level Programming Languages for Embedded Systems: Garbage Collection and FPGAs

Just when it looked like the big guys (well, basically Apple, that big chunk of the NASDAQ) were moving away from tracing garbage collectors, researchers from IBM Research have taken the leap to bring garbage collection to FPGAs [PDF]. Though hardware-assisted garbage collection isn't new, this degree of implementation of a complete concurrent garbage collector on an FPGA is. The authors consider this a first step in bringing high-level languages to the FPGA and embedded computing realm. It should be interesting to see where this goes.

Wednesday, July 25, 2012

Comparative Memory Management, Part 1

When Apple first announced Automatic Reference Counting (ARC) at WWDC in 2011, the fanfare was considerable. Developers took it to be the best thing since sliced bread. Some called it "innovative". The whole idea that iOS and now MacOS X developers (since Mountain Lion) could be freed from error-prone manual memory management without the runtime pauses of conventional garbage collection was certainly played up by the Apple machine. However, when I saw the words "reference counting" and "no overhead" in the same sentence, I cringed. Reference counting isn't a new concept at all. Automatic reference counting is used in a number of languages including Python and Delphi, though neither of those languages can claim to be the sole pioneer. More importantly, reference counting implies the presence of a reference count for each object (a considerable runtime memory overhead, especially if lots of small objects are allocated) and the constant maintenance of that count (i.e., each object's counter must be updated whenever a new owner lays claim to it (retain) or an old owner lets go (release)).
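
To make the bookkeeping concrete, here is a toy sketch of a reference-counted handle. This is not how ARC is implemented (ARC has the compiler insert the retain and release calls for you). It only illustrates the per-object counter and the updates on every change of ownership. RefCounted and its callback are hypothetical names.

```javascript
// Toy reference-counted handle; every object carries its own counter.
function RefCounted(name, onFree) {
  this.name = name;
  this.count = 1;        // the creator holds the first reference
  this.onFree = onFree;  // called once when the count reaches zero
}

// A new owner lays claim: one counter update.
RefCounted.prototype.retain = function () {
  if (this.count === 0) throw new Error("retain after free: " + this.name);
  this.count += 1;
  return this;
};

// An owner lets go: another counter update, plus a zero check.
RefCounted.prototype.release = function () {
  if (this.count === 0) throw new Error("over-release: " + this.name);
  this.count -= 1;
  if (this.count === 0) this.onFree(this.name);
};

var freed = [];
var buf = new RefCounted("buffer", function (n) { freed.push(n); });
buf.retain();   // second owner: count is 2
buf.release();  // first owner done: count is 1
buf.release();  // last owner done: count is 0, onFree runs
console.log(freed); // [ 'buffer' ]
```

Even in this tiny example, three ownership changes cost three counter writes, which is the kind of overhead that never fully disappears no matter who inserts the calls.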

Scala Gotcha?

There is a weird behavior on MacOS X where the Scala compilers scalac and fsc accept classpath arguments that contain shell expansions (such as ~/Documents), but the Scala REPL scala does not. The REPL will still run, but any classpath entries with ~ in them will be omitted.

Wednesday, July 18, 2012

How Top Hedge Fund Managers Started Out: What environments and opportunities gave rise to these financial business titans

The rarefied world of hedge funds normally does not like to stay in the public limelight for too long. When hedge funds do end up in the limelight, it is usually more trouble than it is worth. So how did these titans of finance get started? In particular, how many years of experience did they have and how much initial capital did they start out with?

David Einhorn of Greenlight Capital started in 1996 with $900K ($500K from his parents) after 2 years at SC Fundamental Value Fund. Cite. He graduated from Cornell in 1991 (Wikipedia).

Ray Dalio graduated from Long Island University and Harvard (1973, source), and founded Bridgewater Associates in 1975 after a few years working as a futures and equities trader. In the beginning, Bridgewater was for the most part an advisory outfit. It wasn't until 1987 that it started managing $5MM from the World Bank (source).

Glenn Dubin and Henry Swieca founded Highbridge Capital Management in 1992 with $35MM. Dubin graduated from Stony Brook University in 1978 with a degree in Economics, upon which he started as a retail stock broker at E.F. Hutton in the same year. Swieca graduated from Stony Brook University and Columbia. He started at Merrill Lynch and moved to E.F. Hutton eventually.

Daniel Och founded Och-Ziff Capital Management in 1994 with $100MM from the Ziff publishing family. He is a graduate of UPenn and an 11-year alumnus of Goldman.

Seth Klarman founded Baupost Group in 1982. He is a graduate of Cornell and Harvard.

David Tepper founded Appaloosa Management in 1993 with $57MM in initial capital after spending 8 years at Goldman and earning degrees from Univ. of Pittsburgh and CMU.

Steven Cohen founded SAC Capital in 1992 with $20MM of his own money. He started as an options arbitrage trader at Gruntal & Co. in 1978, giving him 14 years of experience before starting his own fund. He is a graduate of UPenn.

Friday, July 13, 2012

JavaScript Anti-Patterns

An unfortunate side-effect of JavaScript becoming pretty much the most widely used programming language of all time is that the community has expanded in such a way that there is a lot of bad advice going around. The distribution of skills in JavaScript programmers likely varies widely (anyone have any hard data on this?). The more I study JavaScript JITs and the community, the more I realize that JavaScript is truly the assembler of our time. The big catch is that back in the days of assembler, you didn't have millions of inexperienced people putting out code.

The problem is this. So-called JavaScript patterns nearly all have to do with micro-optimizations. So many micro-optimizations have turned into folklore in the JavaScript community. Due to the inefficiency of early JavaScript interpreters, things such as hoisting array range/bounds checks out of loops have become quite widespread. If all these micro-optimizations actually made things better, no one would question the whole endeavor. However, I dare say that most beginning JavaScript programmers don't have a good handle on all the caveats and provisos which come along with each of the micro-optimizations. Micro-optimizations are being marketed as something that will always make your code faster and, implicitly, never break it. This is how beginners will understand micro-optimizations. This is unfortunately quite a bit removed from the truth. Optimizations have provisos. Modern JITs are actually quite good at squeezing every little bit of performance out of JavaScript code. In those cases where a construct cannot be optimized in general, it may be due to a lack of information, especially domain knowledge, on the part of the JIT compiler. But when the compiler doesn't optimize, the programmer cannot blindly optimize either. In the range check example, the programmer must be certain that the body of the for loop does not modify the size of the array. If it does, then checking the array length just once won't cut it.

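var arr = [1, 2, 3, 4, 5, 6];  // sample data so the snippet runs on its own
// The length is read once into len, but the body shrinks the array each pass.
for (var i=0, len=arr.length; i<len; i++) {
  console.log(arr[i]);  // undefined once i runs past the shrinking end
  arr.pop();
}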

The above code will print out a lot of undefineds. Leaving the range check optimization to the JIT compiler would avoid this error. Fortunately, JavaScript has no threads. If your typical JavaScript program were multi-threaded, this so-called optimization could cause even more deleterious behavior, since the value of the property arr.length would no longer be clear from looking only at the body of the loop.
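
For contrast, here is a small sketch that runs the same loop both ways and collects what each version would have printed. The function names are hypothetical. The only difference between them is whether arr.length is re-read on every iteration or cached up front.

```javascript
// Re-reads arr.length on every iteration, so the loop stops
// as soon as the index catches up with the shrinking array.
function drainRecheckingLength(arr) {
  var seen = [];
  for (var i = 0; i < arr.length; i++) {
    seen.push(arr[i]);
    arr.pop();
  }
  return seen;
}

// Caches the length once, as in the hoisted "optimization",
// and so keeps indexing past the end of the shrinking array.
function drainWithCachedLength(arr) {
  var seen = [];
  for (var i = 0, len = arr.length; i < len; i++) {
    seen.push(arr[i]);
    arr.pop();
  }
  return seen;
}

console.log(drainRecheckingLength([1, 2, 3, 4, 5, 6]));
// [ 1, 2, 3 ]
console.log(drainWithCachedLength([1, 2, 3, 4, 5, 6]));
// [ 1, 2, 3, undefined, undefined, undefined ]
```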

Monday, July 9, 2012

Fixed Income Update

While equities have certainly entered the summer doldrums, bonds have recently mounted an impressive rally. It's no surprise at this point that high-grade corporates are very popular for fixed income allocations, since they are one of the few remaining bastions where one can obtain an inflation-beating yield. Since the May 22 near-term trough, high-grade corporates (as represented by LQD), high-yield corporates (JNK), and intermediate-term Treasuries (IEF) have rallied 3.49%, 3.76%, and 2.03% respectively. I am choosing to compare LQD to IEF because they have similar durations, 7.69 and 7.55 respectively. Though Treasuries have yet to reclaim their highs for the year (achieved on Jun. 1), high-grade corporates are looking at a multi-year high (the highest at least as far back as 2002). The correlation coefficient between LQD and IEF from May 22 to July 9 stands at 0.737. Historically, for the nearly 10-year period from July 31, 2002 to July 6, 2012, the correlation coefficient between LQD and IEF stood at 0.952. The whole Treasury complex is quite richly valued at this point. Have we reached the point where high-grade corporates are even more richly valued?

Thursday, July 5, 2012

What is the ML Module System?, Part 1

Browsing through the heated Scala mailing list discussion from 2009 on OOP versus functional programming, I ran into a number of messages which indicate some confusion as to the role and mechanism of the ML module system. The natural question that was raised is how a module system is different from a "low-level" object system. I am not entirely sure what a "low-level" object system is, but in this post I would like to compare and contrast the ML module system and object systems, principally class-based object systems but also classless object systems. First, what role do these language features serve? There is some overlap in purpose but also significant differences. The ML module system was originally designed to serve two roles:

To support programming-in-the-large organization of large system architecture
To facilitate the construction of a type-safe container library

In contrast, OOP is typically defined as supporting design goals of encapsulation, inheritance, and subtype polymorphism. On the surface, this suggests some overlap. Programming-in-the-large may sound like encapsulation, but as I will discuss, the reality is considerably more nuanced.

Formal Semantics for Top 5 Programming Languages

A recent blog post on undefined behavior in C got me thinking. Being from the ML community, I have a certain appreciation for rigorous formal semantics that can be machine checked. Though practical machine checking is largely a new development, rigorous formal semantics has been with us for decades. Standard ML is the epitome of this approach to language design. The 48-page (128-pp total when including appendices and TOC/index) The Definition of Standard ML - Revised formally specifies the entirety of the language. Don't get me wrong. The Standard isn't perfect. Indeed, it has some bugs of its own [PDF]. Nevertheless, for the most part, the Standard has raised the level of discourse and made succinct yet precise descriptions of a powerful higher-order typed language possible. This approach to modeling, defining, and evolving programming languages has in recent years taken on a life of its own, having been applied to new experimental languages as well as time-tested existing languages.

Divergence

Recently, I was looking around for a nice, quick, and light nonfiction read. NPR is usually a great source for hearing about new nonfiction. There are many NPR programs where the host invites an author to peddle his or her wares. Alas, this time I was looking for a more concentrated list of potential light reading (where my definition of light reading may differ from yours). Furiously flipping through the NY Times bestseller list yielded a couple of candidates, but none quite matched my craving for something similar to Alan Abelson's weekly Up and Down Wall Street column in Barron's, always dangerously witty yet informative. It was then that I stumbled on Timothy Noah's The Great Divergence: America's Growing Inequality Crisis and What We Can Do about It. The Great Divergence is a terrific read for our times. It is actually quite well grounded, citing numerous research papers from economists such as Paul Krugman, Emmanuel Saez, and many others. The book itself is an expansion of Noah's 10-part Slate column on "The United States of Inequality" from 2010. Noah looks at all the usual suspects (demographics, immigration, automation/computers, government policy, globalization, and education) and grills each against the long history of research findings and data.

Garbage Collection in JavaScript, Part 2

Of the Chrome V8 JavaScript engine's 198kloc of source (excluding comments), about 19kloc make up the garbage collector. It is a generational collector that combines Cheney copying with mark-sweep. In October 2011, the V8 team added an incremental garbage collector to the mix. Incremental garbage collection contrasts with traditional stop-the-world garbage collection in that it is more amenable to low-latency applications that need to minimize garbage collection pauses, at the cost of reduced total throughput. The V8 garbage collector actually has a whole assortment of configurable flags (see shell --help). If these options can be tweaked, one can provide a customized JavaScript experience (especially garbage collector experience) tailored to the particular performance needs of the app.

The Scientific Method and Epistemology


I was listening to a past Intelligence Squared discussion on whether "Science will have all the answers", or as interpreted by one of the panelists, "science is the only route to knowledge." This question lies in the purview of epistemology, the study of the nature of knowledge and the limits of that knowledge. Curiously enough, this whole question is ultimately wrapped up in mathematical logic. Taking a step back, consider for a moment when you were first introduced to the scientific method. Hypothesize, predict, and test. It all sounds nice and intuitively rigorous, but where does it come from? Why is it supposed to intrinsically make sense? It turns out that there is nothing really fundamental about this formulation of investigation. My sense of the matter is that people still treat science very superstitiously, as if it were some kind of magic or dogma. From grade school on, we were taught the one true answer as to how the natural world works. We revere Galileo and Darwin and their respective scientific discoveries as if they received the discoveries from the heavens and did not require experimentation, hotly debated peer review, and eventual validation by other work. Moreover, we were taught, at least implicitly, that these great scientists were never wrong. Their theories never had to be revised or be subject to independent validation. Moreover, it seems as if we think that their theories are and were always the only reasonable answer. There is a kind of survivorship bias in science education. We laugh off Lamarck and others as being so obviously wrong and even absurd, forgetting that they were real scientists too despite not having directly produced the prevailing theory of our time. In fact, I dare say that it would be difficult to say what our understanding of the world would look like today if not for these other scientists, whose work still contributed to the development of our current understanding of the world.
Even the scientific method itself is treated as received knowledge and not as an amalgam of multiple reasonable routes of scientific investigation. Science itself is presented as a winner-takes-all activity.

Skills Shortage

Time magazine has a recent article on the skills shortage written by a Wharton professor. The key takeaway is that businesses are leaving vacancies open due to a lack of interest in training employees or in matching pay with market demands, and for other reasons unrelated to candidates' actual lack of knowledge. I've considered this phenomenon at length. One example of lack of interest in training from the article was an opening for a cotton candy machine operator which demanded considerable experience in successful cotton candy machine operation. My personal favorite, from my own observations, was a Craigslist job ad from 2007 for an iPhone app developer with 5+ years of iPhone dev experience. Considering that the iPhone had been out for barely a few weeks, I presume that 5 years would be hard to come by even from the original iPhone engineers themselves. This brings me to the main subject: job training is a real problem these days. Businesses don't want to invest in training because the days of stable, long-term employment are long gone. Young employees move from job to job. Employers don't feel the responsibility to employees that they used to now that mass layoffs and restructuring are the norm.

Near and close (Facebook IPO)

I read some interesting posts on HN that debate the advertising merits of social media. With the Facebook IPO just past us, there has been a bevy of posts and articles on how social media advertising did not work for some particular organization. It isn't only Facebook this time. At least one poster reports disappointing results from Twitter's paid tweets. IIRC, Twitter's advertising platform is still technically in "beta" at this point. In any case, despite the frenzy over social media as the magic lamp for advertising, there are at least some who realize that it isn't a turnkey solution. Companies hire social media experts to get out the tweets, retweets, and likes. Newcomers come with a budget and a lot of faith in the seemingly magical benefits of viral channels. Alas, the viral coefficient has to be maintained. Moreover, as demonstrated in some of those posts, viral promotion does not guarantee sales.

Notes from the MacQueenFest

I had the excellent opportunity to attend the MacQueenFest a couple of weekends ago in honor of David MacQueen. The event provided interesting insight into what the alumni of the ML community are up to these days. Many of the slides are now up on the website. The talks were scheduled roughly chronologically based on Dave's contributions. Most looked forward as much as they considered the historical significance of the contributions.

Artists and Scientists

Recently, I finished reading the acclaimed tome by Eric Ries, The Lean Startup: How Today's Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses. The ending was the most impressive part to me. Ries is willing to submit his own ideas of innovation accounting and the like to rigorous testing in startup research labs in universities. A footnote mentions that Nathan Furr of BYU and Thomas Eisenmann of Harvard Business School are already studying lean startup management practices. In a startup world full of dogma, this is thoroughly refreshing. This has also helped me see that there is a continuum of startup ideation and management practices. On one end, there is the thoroughly scientific crowd, focused on meaningful metrics and rigid adherence to the rule of data, however interpreted. On the other, there are those who rely on epiphanies, intuition, and instinct. I certainly don't claim that these are mutually exclusive nor that one is necessarily superior to the other. Context matters.

Flash Crash Research, Part 2

A couple of papers I spotted a while ago:
Easley et al. study a measure of order flow toxicity called Volume-Synchronized Probability of Informed Trading in The Microstructure of the ‘Flash Crash’: Flow Toxicity, Liquidity Crashes and the Probability of Informed Trading.
Johnson et al. consider a large number of mini-flash crashes from 2006 to 2011 in Financial black swans driven by ultrafast machine ecology [PDF].

Tuesday, May 1, 2012

Mobile App Builders

In this day and age of mobile ubiquity and ever shorter attention spans, it seems that everyone and his uncle is jumping on the mobile app bandwagon. I did a little investigating to see how much it typically costs to build one of these apps. The anecdotal range floating about appears to be in the low tens of thousands of dollars. Unsurprisingly, a number of entrepreneurial people have sought to capitalize on this market.

High Frequency

Wissner-Gross and Freer have a 2010 paper in Physical Review E on Relativistic Statistical Arbitrage [official version, pdf]. It observes that as the propagation of prices and trades approaches relativistic velocities, that very limit on propagation speed singles out intermediate locations between exchanges (midpoints between each pair of the 52 world exchanges, weighted by turnover velocity) from which profitable arbitrage can slow or even stop information propagation. The subject of the study is a Vasicek model of cointegrating instruments based on an Ornstein-Uhlenbeck process and an Ehrenfest urn model of diffusion. One hypothesized strategy has to do with offsetting positions in geographically separated exchanges. Because proving that one has the correct offsetting position in a distant exchange takes too much time, a low-latency system placed at the intermediate point between two exchanges can certify offsetting positions for both exchanges.

This does remind me of another article on the proposed London-Tokyo fiber link to run under the Arctic Circle. Fiber infrastructure isn't always laid according to great circle distance. Many pairs of exchanges probably are not directly connected with each other at all. Moreover, turnover varies. Perhaps this means one would need a mobile system, or at least a chain of load-balanced low-latency systems spanning the distance between each pair of exchanges, to adapt to changing conditions. Perhaps there will have to be some dynamic hedging strategies too. Regardless, the paper is an interesting look into unintended consequences of the race to zero.
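To get a feel for the distances involved, here is a minimal sketch that computes the great circle distance between two cities and the one-way light travel time along it. The city coordinates are approximate, the two-thirds-of-c figure for light in fiber is a common rule of thumb that I am assuming here, and the function names are my own; nothing below comes from the paper.

```python
import math

C_VACUUM_KM_S = 299_792.458   # speed of light in vacuum, km/s
FIBER_FRACTION = 2.0 / 3.0    # assumed: light in optical fiber travels at roughly 2/3 c
EARTH_RADIUS_KM = 6371.0      # mean Earth radius

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def one_way_latency_ms(distance_km, fraction_of_c=1.0):
    """Time for a signal to cover distance_km at the given fraction of c."""
    return distance_km / (C_VACUUM_KM_S * fraction_of_c) * 1000.0

# Approximate city-center coordinates (assumed; not actual data-center locations)
LONDON = (51.51, -0.13)
TOKYO = (35.68, 139.69)

distance = great_circle_km(*LONDON, *TOKYO)
vacuum_ms = one_way_latency_ms(distance)
fiber_ms = one_way_latency_ms(distance, FIBER_FRACTION)

print(f"distance: {distance:.0f} km")
print(f"one-way at c: {vacuum_ms:.1f} ms, in fiber: {fiber_ms:.1f} ms")
print(f"from the unweighted midpoint: {vacuum_ms / 2:.1f} ms to either city at c")
```

Even along an ideal great circle route, tens of milliseconds separate the two cities, which is why a node sitting between them sees both markets sooner than either market sees the other. The midpoint here is unweighted; the paper weights it by turnover velocity.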

Friday, April 6, 2012

The Scala Ecosystem

Scala certainly has a lot going for it these days. It has the enthusiasm of at least a couple of the hottest tech companies out there in Twitter and Foursquare. Even Sony is using Scala in some of its systems. There are at least two fairly usable web frameworks, Lift and Play. Akka middleware provides a scalable lock-free concurrency abstraction. XML support is built in. Interoperability with Java is standard, thus giving Scala access to important systems and APIs such as Hadoop, JSoup, Mahout, Pig, and the numerous Apache Java projects. There is even a linear algebra package, Scalala. What is going to take it to the next level?

I've read posts about people recommending Python over Ruby due to the comprehensiveness of the platform given Scipy/Numpy/matplotlib and Django. Ruby, of course, has Rails to really turbocharge its audience. Objective-C remains the lingua franca of iOS due to fiat. The interesting observation is that the software development world has become very diverse. The proliferation of web services certainly helped because instead of having to interface with binary libraries and structured foreign function interfaces, new languages and platforms only have to interface with HTTP, XML, and JSON to achieve harmony with the Web 2.0 ecosystem. This lowers the legacy compatibility bar for new programming languages considerably. Considering how difficult developing good foreign function interfaces and runtime support for mediating between garbage collected (aka managed) languages and the previous lingua franca C was, this must be quite a relief to language designers. However, harmony with Web 2.0 isn't enough to achieve widespread acceptance.

Thursday, April 5, 2012

Probabilistic Counter

The High Scalability blog has a recent post on probabilistic algorithms for approximate counting. The author unfortunately coined the term "linear probabilistic counter", which returns no results if you do a Google search. The more usual terminology is linear-time probabilistic counting (a paper from KAIST gives the original definition from 1990) and, closely related, approximate counting. The former paper describes linear counting as a hashing-based algorithm that approximates column cardinality. The name comes from the fact that the count estimator should be approximately proportional to (linear in) the true count. The study of probabilistic algorithms is indeed a growing field. It is also tied to sampling, that oft-forgotten technique in this modern age of big data. There are a lot of related concepts here. The common theme of these kinds of problems is a massive number of entries where each entry is constrained to come from some finite set.
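A minimal sketch of linear counting, as I understand it from the 1990 paper: hash every entry into a fixed bitmap of m slots, then estimate the number of distinct entries from the fraction of slots left empty, using n ≈ -m ln(empty/m). The function name, the bitmap size, and the choice of SHA-1 as the hash are my own assumptions for illustration.

```python
import hashlib
import math

def linear_count(items, m=4096):
    """Estimate the number of distinct items using an m-slot bitmap."""
    bitmap = bytearray(m)  # one byte per slot for clarity; a real bitmap would use bits
    for item in items:
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        slot = int.from_bytes(digest[:8], "big") % m
        bitmap[slot] = 1
    empty = m - sum(bitmap)
    if empty == 0:
        raise ValueError("bitmap is saturated; use a larger m")
    # Estimator: n is approximately -m * ln(fraction of empty slots)
    return -m * math.log(empty / m)

# 10,000 entries drawn from only 1,000 distinct values
entries = [i % 1000 for i in range(10_000)]
estimate = linear_count(entries)
print(f"estimated distinct values: {estimate:.0f}")
```

The bitmap uses a fixed amount of memory no matter how many entries stream through, which is the whole appeal. The estimate degrades as the bitmap fills up, so m has to be sized with the expected cardinality in mind.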

Garbage Collection in JavaScript, Part 1

A recent post on Scirra claimed that reusing long-lived objects was an ingredient of good JavaScript garbage collection behavior. That made me curious. This claim is generally true when the garbage collector in question is a generational one. Generational garbage collectors split the heap (the memory from which everything that is not stack-allocated is allocated) into several "generations". The "generational assumption" is that most objects die young, so the young generation yields most of the garbage and is worth collecting more frequently than the old one. Upon each garbage collection, any objects which survive can be promoted to the long-lived generation. The idea is that by running garbage collection less frequently on longer-lived generations, garbage collection can feel more responsive.

The problem is that generational garbage collection isn't universal. It isn't the most responsive form of garbage collection or memory management either (concurrent and incremental garbage collectors are more advanced and optimized for realtime/low-latency applications). Furthermore, although generational garbage collectors for the major browsers are in the works [Firefox, WebKit/Safari], it appears that currently only Chrome deploys one in the release browser. In fact, Chrome V8 cites generational collection as one of the main features of V8 enabling high-performance JavaScript. Around 7/20/2012, Mozilla added an Incremental Garbage Collector to Firefox 16, which reduces GC pause times by incrementalizing collection rounds, but this says nothing of and does not rely on the generational assumption. In Lisp (the first implementations of generational garbage collectors by Lieberman and Hewitt [PDF] and Moon [PDF]), Smalltalk (one of the first languages with generational collectors [David Ungar's paper Generation Scavenging: A Non-disruptive High Performance Storage Reclamation Algorithm, PDF]) and the higher-order typed language work (Haskell, OCaml, Standard ML, etc.), we have been using generational collectors for quite a while.

The Scirra post suggests attempting to avoid garbage collection, but that is a very tricky matter. I think the realtime performance of such an approach would be very sensitive to the design of the garbage collector and any related heuristics. Moreover, if JavaScript interpreters move beyond generational collectors at some point, programming styles exploiting the generational assumption won't have the same payoff as before.

Continue to Part 2 in this series where I introduce profiling and benchmarking features in the Chrome V8 garbage collector.

Tuesday, March 13, 2012

Preferred Stocks

Preferred stocks have risen considerably from their lows last year. In fact, PFF (US Preferreds) outperformed the S&P 500 by 130 basis points excluding dividends. Given that the lion's share of the preferreds universe is financial, this isn't too surprising. The flip side of this rise is that preferreds are getting close to, if not already exceeding, their call prices. Of the top 5 holdings in the PFF portfolio, I could find call information for only three of them. The top holding, GM series B preferreds, trades at 42.23 but has no call feature. It is, however, convertible. The 2nd top holding, HCS series B (HSBC Bank), trades at 27.33 and also has no call feature. It is not convertible. The 4th largest holding is WFC series J (Wells Fargo), which is callable at 25 but currently trades at 29.63. That amounts to a potential 15.6% loss right off the bat from a call. WFC.PFJ yields 6.78%, so you are nowhere near compensated for the call risk. That said, I was surprised that 2 out of the top 5 holdings did not have the call feature, so PFF, though still quite risky, may be more resilient than meets the eye. At the very least, we cannot say that its entire portfolio is callable at a moment's notice.
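The call-risk arithmetic for WFC series J can be sketched as follows. The helper name is my own, and the "years to recoup" figure is a rough simplification that assumes the quoted 6.78% is a current yield on the market price and ignores taxes, compounding, and accrued dividends.

```python
def call_loss_pct(market_price, call_price):
    """Percentage loss if a preferred bought at market_price is called at call_price."""
    return max(0.0, (market_price - call_price) / market_price * 100.0)

loss = call_loss_pct(29.63, 25.00)   # WFC series J figures from the post
years_to_recoup = loss / 6.78        # years of dividends needed to offset the loss

print(f"loss on call: {loss:.1f}%")
print(f"years of yield to recoup: {years_to_recoup:.1f}")
```

In other words, it takes more than two years of dividends just to get back to even if the issue is called shortly after purchase.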

Monday, March 12, 2012

Asset Classes

Given the past few eventful weeks, I thought it would be interesting to review the state of the various asset classes. Below is a table with the 3-month returns for some ETF proxies for various asset classes.

Security	3-month return (%)
GLD	0
OIL	7.4
JJC	7.7
GSC	9.1
LQD	4.4
JNK	3.7
EMB	4.4
LAG	-0.2
DIA	6.1
SPY	9.1
VWO	12
MDY	10.7
PFF	10.4

Notably, gold and bonds (LAG) are underperforming the other classes after tremendous outperformance last year (2011). With the good job numbers coming out in the past couple of months, the prospect for real growth is looking a little better, thus denting the appeal of gold despite the massive central bank easing throughout the world. Bonds, which have been looking at negative real yields for a while now, are still holding up pretty well, all things considered. Of course, Operation Twist is probably still helping quite a bit.

Permanent Portfolio

Amidst the past few years of turmoil, Harry Browne's Permanent Portfolio is an interesting case. Browne was an investment manager, author, and two-time Libertarian presidential candidate (for some reason a whole lot of investment managers are Libertarian). The Permanent Portfolio is composed of four equally weighted components: stocks, bonds (long-dated Treasuries), cash (short-term Treasuries), and gold (bullion). The Permanent Portfolio is currently available to the public in two forms: Michael J. Cuggino's Permanent Portfolio mutual fund (PRPFX) and Global X's Permanent ETF (PERM). Of course, one can certainly build one's own permanent portfolio using individual securities and ETFs. At least from PRPFX's perspective, the lost decade was hardly a lost decade. It returned a respectable 164% for the period 2002-2011. That isn't to say that there wasn't some volatility along the way. During the depths of the crisis from June 2008 to February 2009, the portfolio experienced its maximum drawdown of 19.6% (though the maximum quarterly loss was only 6.86%). Cuggino's execution emphasizes Swiss sovereigns (in Swiss francs, so there is some currency exposure) alongside US Treasuries. For the growth component, he overweights energy, miners, and REITs.
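To put the PRPFX figures in annual terms, here is a small sketch. It assumes the 164% is a cumulative total return over ten full years; the function names are my own.

```python
def annualized_return_pct(total_return_pct, years):
    """Compound annual growth rate implied by a cumulative return."""
    growth = 1.0 + total_return_pct / 100.0
    return (growth ** (1.0 / years) - 1.0) * 100.0

def gain_to_recover_pct(drawdown_pct):
    """Gain required to get back to the prior peak after a drawdown."""
    return (1.0 / (1.0 - drawdown_pct / 100.0) - 1.0) * 100.0

cagr = annualized_return_pct(164.0, 10)   # 2002-2011, assumed to span 10 years
recovery = gain_to_recover_pct(19.6)      # the maximum drawdown quoted above

print(f"annualized return: {cagr:.1f}%")
print(f"gain needed after a 19.6% drawdown: {recovery:.1f}%")
```

That works out to a bit over 10% a year, and a roughly 24% gain to climb out of the worst drawdown.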

Bid-Ask Spread

One reason for preferring highly liquid products over illiquid ones is the bid-ask spread. This is a cost built into a trade when the trade is a market order. The spread also happens to be the market maker's profit for providing liquidity, whether that is a traditional specialist or a high-frequency trader taking the other side of your trade. Though I usually trade using limit orders to cut down on the spread I pay, for illiquid issues it is really difficult to get the spread down by much if I want any timely execution.

One case I stumbled on is that of specialized ETFs. With so many specialized ETFs on the market, slicing, dicing, and grafting all different kinds of securities, some of them are bound to be too exotic to gain the trading volume of a typical S&P 500 stock. Moreover, ETF share creation and redemption mechanisms usually work to minimize costs, but this is not always the case. A case in point is the ETFs issued by brokers specifically for their own customers. Whereas SPDR, Vanguard, and iShares ETFs trade millions of shares a day, some of these broker-tied ETFs only trade tens of thousands. These ETFs can be traded by anybody via any broker, but the customers of the issuing broker receive incentives such as commission-free trading. The price, however, is greatly reduced liquidity and large bid-ask spreads. I frequently see spreads of 0.5%+ for some of these products. Over a large number of trades of a decent size (> $2000), paying the spread quickly becomes more expensive than paying a hefty commission. Large bid-ask spreads certainly aren't restricted to low-volume equity-like products. Spreads in forex are quite lucrative for forex brokers, some of whom do not charge commission at all in favor of profiting from spreads. For currency pairs such as those involving the New Zealand Dollar (NZD), large spreads are quite common. In contrast, EUR-USD and JPY-USD usually maintain 1-2 pip spreads. However, during major events and announcements, even these high-volume instruments may see spreads widen considerably.
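A quick sketch of the spread-versus-commission tradeoff. It assumes a round trip (buy at the ask, later sell at the bid) costs roughly the full quoted spread and that the spread stays constant; the $10 round-trip commission is a hypothetical figure for illustration, not any broker's actual rate.

```python
def round_trip_spread_cost(trade_value, spread_pct):
    """Approximate cost of buying at the ask and later selling at the bid."""
    return trade_value * spread_pct / 100.0

def breakeven_trade_value(round_trip_commission, spread_pct):
    """Trade size above which the spread costs more than the commission."""
    return round_trip_commission / (spread_pct / 100.0)

cost = round_trip_spread_cost(2000.0, 0.5)     # $2000 position, 0.5% spread
breakeven = breakeven_trade_value(10.0, 0.5)   # hypothetical $10 round-trip commission

print(f"spread cost on $2000: ${cost:.2f}")
print(f"breakeven trade size: ${breakeven:.0f}")
```

Under these assumptions, a "commission-free" ETF with a 0.5% spread costs as much as the commissioned alternative at $2000 per trade and more beyond that, before counting whatever (much smaller) spread the liquid alternative carries.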

Tuesday, February 28, 2012

Market Manipulation, Part 1

If the investment advice and forums are any indication, a lot of people consider the markets rigged and manipulated. But what is a good working definition of manipulation? There is price manipulation. That entails defining what an "artificial" price is. Price manipulation is probably one of those things with a long, long history. One of the most famous cases is that of the Hunt brothers' attempted cornering of the silver market in the late 1970s and early 1980s. Market bubbles may also contribute to "artificial" prices, but bubbles are more of a natural psychological reaction of market participants, not some insidious conspiracy. This difficulty works both ways. Government and industry regulatory organizations cannot be too specific about what signals of fraudulent market activity they are scanning for lest the perpetrators simply work around those signals to escape detection. Besides straight price manipulation, regulatory agencies also consider order flow and spread manipulation.

Johan and Cumming did a study on Market Surveillance regimes around the world in a 2008 paper Global Market Surveillance.

Rosa Abrantes-Metz has written a number of papers on the subject including a 2007 paper Is the Market Being Fooled? An Error-Based Screen for Manipulation.

Wednesday, February 15, 2012

Flash Crash Research, Part 1

Unsurprisingly, given the interest of regulators and trading firms, there is a growing literature on the May 6, 2010 Flash Crash and "mini-flash crashes" throughout recent history. Unofficial high-speed liquidity providers, free from any contractual obligations, can become high-speed liquidity takers. Filimonov and Sornette investigate prediction of endogenous-feedback loops in the research paper Quantifying reflexivity in financial markets: towards a prediction of flash crashes.

Tuesday, February 14, 2012

Behavioral Finance and Sentiment/Tone Mining

There are two prominent claims about the stock market: one is that machines are driving the entire market these days. The other is that behavioral finance works and even sentiment-based trading can produce excess profits.

Dividends and Capital Gains Taxes

It probably won't garner much fanfare this time through, but the Bush-era (Jobs and Growth Tax Relief Reconciliation Act of 2003) qualified dividend and long-term capital gains tax cuts are set to expire by the end of 2012. This would potentially more than double the tax on qualified dividends. For non-tax-sheltered accounts, this could dramatically affect the net returns of dividend income-oriented portfolios, especially if there ends up being a discrepancy between qualified dividend and capital gains taxes. It will be interesting to see how this affects investment strategies. Tax-free muni bonds have appreciated considerably over the last year. The relative tax efficiency of different investment instruments may change with the decline of special treatment of qualified dividends and long-term capital gains.
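A sketch of what "more than double" means for net yield. The rates are my assumptions for illustration: a 15% rate on qualified dividends today versus a top ordinary income rate of 39.6% if dividends lose their special treatment. The 3% gross yield is hypothetical.

```python
def after_tax_yield_pct(gross_yield_pct, tax_rate_pct):
    """Dividend yield left over after tax at a flat rate."""
    return gross_yield_pct * (1.0 - tax_rate_pct / 100.0)

GROSS_YIELD = 3.0        # hypothetical portfolio dividend yield, in percent
QUALIFIED_RATE = 15.0    # assumed current qualified dividend rate
ORDINARY_RATE = 39.6     # assumed top ordinary rate after expiration

before = after_tax_yield_pct(GROSS_YIELD, QUALIFIED_RATE)
after = after_tax_yield_pct(GROSS_YIELD, ORDINARY_RATE)

print(f"net yield now: {before:.2f}%, after expiration: {after:.2f}%")
print(f"net income falls by {(1 - after / before) * 100:.0f}%")
```

For a top-bracket investor under these assumptions, the tax bill on dividends rises by a factor of about 2.6 and net dividend income falls by a little under a third.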

Richard Shaw studied this issue a few years ago before Obama and Congress extended the tax cuts.

Thursday, February 9, 2012

Functional Languages and Call-By-Value, Part 1

A whole lot of work goes into avoiding unnecessary copies in C++. Much of this responsibility lies with the programmer, through manual optimizations such as passing large objects by pointer or by reference rather than by value. This concern is pervasive in the language, from the choice of increment/decrement operators to the implementation of constructors. Many compiler optimizations and language features also target copying, such as the return value optimization (RVO) and, more recently, move semantics in C++11. The reasoning behind the emphasis on copying seems intuitive enough: pass-by-value of any data larger than a primitive type (i.e., larger than a pointer) gets too expensive, especially if the data in question is in a temporary which must be destroyed anyway. C++'s machinery for working around this inefficiency isn't representative of imperative and object-oriented languages. For example, Java makes references the default way of manipulating objects even though the primitive calling semantics is pass-by-value. However, in functional programming, copying is seldom the overriding emphasis. Why isn't copying as big of a deal in functional languages? For the sake of argument, I will take ML as the canonical functional language. Lazy evaluation in languages such as Haskell adds another layer of complexity which I won't address here.

Indicators and Timeframes

MarketSci has a short post on what they consider long-, intermediate-, and short-term indicators and strategies. Most indicators can vary their sampling periods in order to smooth out the signal and avoid false alarms. As indicated by MarketSci, the relevance of different indicators does depend on the amount of data as well as the timeframe. However, with the advent of practical high-frequency data, it is no longer the case that smaller time frames necessarily imply sparser data. One could certainly be sampling weekly for a multi-year indicator and end up with less data than sampling ticks for a few minutes in high-frequency data. The guidelines seem to be that long-term is ruled by trend-following (e.g., moving averages), intermediate-term by mean reversion, and short-term by "noise" (their example is RSI).

Monday, February 6, 2012

Quantitative Behavioral Finance

Though Fama's Efficient Market Hypothesis (EMH) and variants thereof reigned unchallenged for several decades, behavioral finance gained significant credibility during the 1980s and 1990s. Behavioral finance has since evolved and given rise to subfields such as Quantitative Behavioral Finance. The hypothesis remains the same: markets are not completely efficient because the human participants are not completely rational. Cognitive biases are the main culprit. Quantitative Behavioral Finance takes this idea a little further by rigorously modeling market prices in the presence of such biases (which manifest themselves as underreaction and overreaction), combining differential equations and statistical time series analysis.

Richard Thaler's article, "The End of Behavioral Finance", is an excellent survey of the first decade.

Friday, February 3, 2012

Saving Up for College Tuition and Hedging, Part 4

Are prepaid tuition programs a great investment? One is right to be skeptical. The programs vary considerably from state to state. Bankrate.com has an article about some of the programs. It turns out that one can be paying anything from a 41% premium to a slight discount relative to current tuition. Two states, Pennsylvania and Texas, offer programs which do not ask for a premium as long as you use the tuition vouchers for state schools. In the Texas case, you receive fund performance if you elect to go to a non-state school. Virginia's program appears to offer tomorrow's tuition at a slight discount even compared to today's tuition rates if one goes to the most expensive state school. Otherwise, you would be paying a premium. For most of the state programs, tuition inflation will have to accelerate considerably for the programs to be worthwhile. Still, though purchasers of prepaid tuition vouchers pay a premium, the states are on the hook if tuition inflation does get out of hand.

Saving Up for College Tuition and Hedging, Part 3

Ever since the government permitted it, many private colleges have been hopping onto the tuition prepayment plan bandwagon. Unlike the College Board's IC 500 index, tuition prepayment often does not include room and board increases. Colleges including some of the Ivies (e.g., Dartmouth, Penn, Princeton), MIT, Stanford, UChicago, and USC tout the private college prepaid plan. For a complete list of the 270+ private schools using this plan, see the consortium's website (managed by OppenheimerFunds, which also happens to manage many of the state 529 plans). States sponsor their own, but some are portable and can be used to fund tuition at out-of-state private schools or even select foreign ones.

Saving Up for College Tuition and Hedging, Part 2

To answer the question of college tuition hedging, we need to determine the amount of tuition increases and the variability in that change. Generally, higher education revenues come from federal and state aid, alumni giving, endowment returns, and tuition. For research universities, a big chunk comes from research grants. Thus, changes in funding levels for each of these components must be compensated by the others. How have these factors evolved in the past few decades? Can we explain tuition inflation in terms of these other factors?

This is the second in my ongoing series of posts on college tuition and investment. See the first post.

C++11 Compiler Support

Compiler support for C++11 is quite uneven at this point. Both Visual C++ and the Intel compiler only support 31 of the features. GCC has significantly better support at this point (48 features).

Monday, January 30, 2012

Trade Frequencies, Part 2

One of the main challenges in high-frequency finance is the irregular arrival of prices. Once you get down to tick-by-tick prices, you can't count on regular-interval time series. There are autoregressive models which support irregular arrival of prices. Engle and Russell's autoregressive conditional duration (ACD) model comes to mind; Econometric Modeling of Multivariate Irregularly-Spaced High-Frequency Data extends it to the multivariate case.
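For intuition, here is a minimal simulation of a univariate ACD(1,1) process as I understand the basic model: the duration between ticks is x_i = psi_i * eps_i, where the conditional expected duration follows psi_i = omega + alpha * x_{i-1} + beta * psi_{i-1} and eps_i is a unit-mean exponential shock. The parameter values are arbitrary choices of mine, not estimates from any data.

```python
import random

def simulate_acd(n, omega=0.1, alpha=0.1, beta=0.8, seed=42):
    """Simulate n trade durations from an exponential ACD(1,1) model."""
    if alpha + beta >= 1.0:
        raise ValueError("need alpha + beta < 1 for a finite mean duration")
    rng = random.Random(seed)
    mean = omega / (1.0 - alpha - beta)   # unconditional mean duration
    psi = mean                            # start the recursion at the mean
    x = mean
    durations = []
    for _ in range(n):
        psi = omega + alpha * x + beta * psi   # conditional expected duration
        x = psi * rng.expovariate(1.0)         # unit-mean exponential shock
        durations.append(x)
    return durations

durations = simulate_acd(20_000)
sample_mean = sum(durations) / len(durations)
print(f"sample mean duration: {sample_mean:.3f} (theory: 1.000)")
```

Short durations tend to be followed by short durations, so trades cluster in time much as volatility clusters in a GARCH model. That clustering is exactly what a fixed-interval time series throws away.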

Stock Price Limits and ADRs

Some foreign stock markets have daily price limits. For example, in Taiwan, the daily price limit is 7%. How often do we hit these limits? The US ADR for Taiwan Semiconductor (TSM) exceeded the daily price limit ceiling 114 times and dropped below the daily price floor 47 times during a 3597 trading day period from 1997 to yesterday (using Yahoo's adjusted prices). There turn out to be a few papers on price limits and international markets.

Finance Research Papers

Here are some papers of interest I ran into recently:

Academic Programming Languages in Mainstream Use

It is said that once a programming language is born, it never really dies. That may be so, but popularity can still be a fickle thing. I think it is safe to say that most academic programming languages never make it out of the lab in a serious way. However, there are a few prominent exceptions to this observation, some of which were and are wildly successful:

Finance Research Papers

Here are a few hot-off-the-press research abstracts which caught my eye:

Need for Speed: An Empirical Analysis of Hard and Soft Information in a High Frequency World studies news sentiment-based high-frequency trading using an extension of Hasbrouck's market microstructure model (and Chaboud and Tookes) and NASDAQ high-frequency data from 2008 to 2009. The paper seeks to answer whether we can effectively do machine-driven trading based on news sentiment and whether speed matters. The author distinguishes between "hard" and "soft" information events, where hard events are price data and soft events are news ticker items, blog posts, and Twitter messages (though this paper focuses on the news ticker alone). This study only uses Reuters News Sentiment software. The conclusion is that at this point, non-high-frequency traders have an advantage in processing the "soft" information events. I think this area is gaining quite a bit of attention, as both bigger firms (Thomson Reuters, Dow Jones) and smaller ones (a startup named Stockr.com (invite link), for example) are investing in news sentiment and social media-based trading, respectively.
Market Making Under the Proposed Volcker Rule
No Strings Attached: When Giving it Away Versus Making Them Pay Leads to Negative Net Benefit Perceptions in Online Exchanges

New C++ Features and C++ Texts

Even without pretty fundamental changes such as concepts, it will only be a matter of time before we get a smattering of books on C++11. Some existing C++ books have been updated with a chapter or two on C++11 features. They will of course vary considerably in quality. David Vandevoorde is behind at least a couple of the new features. He has a book, C++ Templates: The Complete Guide, on template metaprogramming.

Writer's Market

A couple of days ago, I encountered the 2012 Writer's Market Deluxe Edition (Writer's Market Online) at a neighborhood bookstore. The book contains a catalog of freelancing opportunities in a wide assortment of magazines and journals. What I found interesting was that financial and technology freelancers did not receive an appreciably higher rate than those in other areas. There were certainly exceptions, but for the most part, there was no real discrepancy between rates in different fields. The main trading-oriented publication the book listed was www.traders.com (Technical Analysis, Inc., publisher of Stocks & Commodities, Working Money, and Traders.com Advantage magazines), where freelancers apparently contribute a good deal of the articles.

According to the Editorial Guidelines, the top 5 hot topics are the relative strength index (RSI), average directional movement rating (ADXR), Fibonacci ratios, MACD and moving averages, and volume trends. I am a bit surprised to see RSI so prominent. In contrast, stochastics and chart reading are much further down the list.

Monday, January 23, 2012

Saving Up for College Tuition and Hedging, Part 1

One of the biggest expenses for many American families is college tuition. In fact, it is a component of the Consumer Price Index (CPI), though not a very big component. The other overwhelming expenses are transportation (typically cars) and housing. For transportation and housing costs, you can at least partially hedge against further price increases (however imperfectly) by investing in appropriate securities (crude or RBOB futures and Case-Shiller housing futures). According to the December 2011 CPI report, tuition has increased almost 7-fold since 1984 (the baseline of the CPI). For comparison, tuition increases have dwarfed even growth in hospital services expenses (only 6.5x). The only component of the CPI that grew more was tobacco (8.4x). Thus, not only is tuition a big expense, it is also one of the fastest growing. According to the Bureau of Labor Statistics (BLS) analysis, college tuition inflation averaged about 6.7% annually over the past 10 years (with a low of 4% and a high of 9.8%), even amidst recession. Recession exacerbated the increases as governments cut funding. Now what kind of investment can give an 8% annual return even in the midst of a massive downturn? A tax-sheltered education savings account such as a Coverdell or 529 Plan helps, but even then a steady 8% pre-tax return from index or mutual fund investing is quite challenging. MyMoneyBlog puts everything in perspective, showing that tuition increases dwarf those of the housing bubble.
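The compounding behind these figures can be sketched in a few lines. I am assuming 27 years between the 1984 baseline and the December 2011 report, and the function names are my own.

```python
def implied_annual_rate(multiple, years):
    """Constant annual growth rate that produces the given multiple."""
    return multiple ** (1.0 / years) - 1.0

def future_cost(current_cost, annual_rate, years):
    """Cost after compounding at annual_rate for the given number of years."""
    return current_cost * (1.0 + annual_rate) ** years

long_run_rate = implied_annual_rate(7.0, 27)    # 7-fold over an assumed 27 years
ten_year_multiple = future_cost(1.0, 0.067, 10) # growth of $1 of tuition at 6.7%

print(f"implied long-run tuition inflation: {long_run_rate * 100:.1f}% per year")
print(f"$1 of tuition after 10 years at 6.7%: ${ten_year_multiple:.2f}")
```

A 7-fold increase works out to roughly 7.5% a year, and at the recent 6.7% pace tuition still nearly doubles every decade.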

In this series of posts, I will be looking into the cause of tuition inflation and the different possibilities for dealing with the phenomenon in investment terms.

C++ and GCC

Now that C++11 has been live for a little while, I was curious about the state of support from the major compilers. GCC/G++ is the major one. As of GCC 4.7, it appears to support a good deal of the C++11 features which actually made it into the standard (concepts being the big omission). Variadic templates, auto type declarations, initializer lists (vector<int> v {1,2,3}; works), unions supporting all non-reference member types, strongly-typed enums, and many others made it into GCC. There is even a range-based for loop (for (auto i : collection)), something Java incorporated a while ago and the scripting languages had before then. The omissions (stuff in the standard but not in GCC 4.7) include garbage collection and most of the concurrency features.

Thursday, January 19, 2012

Algorithmic Finance

There is a brand new journal combining Computer Science and Finance, Journal of Algorithmic Finance. Both the advisory board and editorial team are ensemble casts with some of the biggest names in computer science and finance. They just published their inaugural issue in December 2011. The top paper (as indicated by number of SSRN downloads) is one on Markets are Efficient if and Only if P = NP, which perhaps is very reassuring to all the traders out there, at least at a theoretical level. For those unfamiliar with the terminology, P means solvable in polynomial time (i.e., we have reasonably smart, efficient algorithms to solve them) and NP means verifiable in polynomial time (i.e., given a solution, we can verify that it works efficiently). NP-complete is a class of problems such that all problems in NP can be (efficiently) reframed in terms of any NP-complete problem. For problems in NP but not in P, we generally do not have efficient algorithms for exact solutions. Instead, we pretty much have to fall back on exhaustive search (i.e., try all the possible answers and see what works). If one can solve an NP-complete problem efficiently in the general case, then one can solve any problem in NP efficiently, which potentially saves some astounding sum of money and enables a lot of previously impractical technologies. The more interesting implication of the paper is that the market can serve as an oracle to solve NP-complete problems. Recently DARPA has decided to invest in crowd-sourcing and gamification to solve difficult formal verification problems. Instead of potentially awkward game embeddings of difficult computational problems, I wonder if market strategies may be more effective and direct.
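To make the verify-versus-solve distinction concrete, here is a small sketch using subset sum, a classic NP-complete problem: given some numbers, is there a subset adding up to a target? Checking a proposed answer takes one pass over it, while the straightforward way to find one tries every subset. The example numbers are arbitrary and the function names are mine.

```python
from itertools import combinations

def verify(numbers, target, indices):
    """Check a proposed solution in time linear in its size."""
    valid = len(set(indices)) == len(indices) and all(
        0 <= i < len(numbers) for i in indices
    )
    return valid and sum(numbers[i] for i in indices) == target

def brute_force(numbers, target):
    """Exhaustive search: up to 2^n - 1 non-empty subsets for n numbers."""
    n = len(numbers)
    for size in range(1, n + 1):
        for indices in combinations(range(n), size):
            if sum(numbers[i] for i in indices) == target:
                return indices
    return None

numbers = [3, 34, 4, 12, 5, 2]
solution = brute_force(numbers, 9)
print("solution indices:", solution)
print("verified:", verify(numbers, 9, solution))
print("unreachable target:", brute_force(numbers, 1000))
```

With 6 numbers there are only 63 non-empty subsets, but with 60 numbers there are about 10^18, while verification stays trivial. The paper's claim is essentially that an efficient market would have to be doing the search side of this efficiently.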

Earnings versus Dividend Yields

Given the good Bank of America earnings announcement today, I thought it fitting to look at the earnings yield versus the dividend yield of the S&P over a 50-year stretch of time. The ratio has been growing for a long, long while. Back in the 1960s, companies paid out a lot more of their earnings in dividends. Even given the rising trend, the ratio seems over-extended at this point. We have a pretty high earnings yield at 7.7% but only a 2.1% dividend yield. The ratio seems to fall whenever we have some recession or slowdown, indicating that dividend cuts may lag earnings declines. Assuming that the earnings yield at least holds the line (a big IF), the long-term trend suggests an overall dividend yield of 2.57%.
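The ratios implied by these yields are easy to check. The helper names are my own; the inputs are the figures quoted above.

```python
def yield_ratio(earnings_yield_pct, dividend_yield_pct):
    """Earnings yield divided by dividend yield."""
    return earnings_yield_pct / dividend_yield_pct

def payout_ratio_pct(earnings_yield_pct, dividend_yield_pct):
    """Share of earnings paid out as dividends, in percent."""
    return dividend_yield_pct / earnings_yield_pct * 100.0

current_ratio = yield_ratio(7.7, 2.1)
trend_ratio = yield_ratio(7.7, 2.57)
current_payout = payout_ratio_pct(7.7, 2.1)
trend_payout = payout_ratio_pct(7.7, 2.57)

print(f"current ratio: {current_ratio:.2f}, trend-implied ratio: {trend_ratio:.2f}")
print(f"current payout: {current_payout:.1f}%, trend-implied payout: {trend_payout:.1f}%")
```

So the current ratio is about 3.7 against a trend value of about 3.0, which corresponds to companies paying out about 27% of earnings versus the roughly 33% the trend would suggest.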

Net Worth, Income, and Media

In New York, everywhere you look is CNBC and Bloomberg. Both tout how their viewership has high net worth and income. With all the captive audiences on trading floors around the world, that isn't a far-fetched claim. For the most part, though, all I ever see on those two channels are GEICO commercials. The interesting part is that as a percentage of viewership, seekingalpha.com does better than both CNBC and Bloomberg at attracting people with household incomes of 100k+ (44% versus 40% and 38% respectively). You can do better than that if you can convince kitcometals.com, Neiman-Marcus, or ThinkOrSwim.com to let you advertise on their sites, which attract 56%, 41%, and 52% respectively of 100k+ household income viewership as a percentage of total viewership.

I would have included MoneyScience.com in this comparison but the data source I used was quite US-centric.

Microstructure Modeling

Jonathan Kinlay has an interesting list of papers on market microstructure, including his synopsis of each. Many of them apply a vector autoregression model to quote and trade data. The focus, however, is always the impact of the limit order book and the strategies for generating new bids and asks on the limit order book. All of the studies focus on the stock markets. Perhaps the most interesting of the papers is a recent one on Price Dynamics in a Markovian Limit Order Market from Rama Cont of Columbia, which provides a mostly analytical model for the high-frequency dynamics of prices and order flow with an endogenous relationship between durations and price changes. Most of the data in this study came from 2008. According to that data, for certain DJ stocks, ~1.2% of observed bid-ask spreads were more than 1 tick and are thus not as pertinent to the model. The average lifetime of such a spread appears to be only a couple of milliseconds. The model does consider order flow in the presence of market orders and cancellations, including the case when they dominate limit orders.

I also found a number of books on the subject (more after the jump).

Memory Management and Concurrency in C++

Despite the flurry of activity from Boost and the C++11 standards committee, two crucial things remain quite challenging: efficient memory management and concurrency. What started me down this train of thought was some good old code reading. I encountered plain old vectors of pointers. There are apparently a handful of alternatives to this. Boost implements ptr_vector, which keeps track of the lifetimes of all its elements as a unit. vector<shared_ptr<T>> is another possibility. The core of the problem is that a vector's underlying representation is a pointer to its element storage, and one cannot form a pointer to a reference, so a vector cannot hold references directly (the assumption being that one wanted the efficiency of reference-passing but with the convenience of STL containers).

Airport Reading

When I am stuck in the airport I often visit the bookstores there. Airports are apparently one of the last bastions where bookstores still exist, whereas they have completely disappeared in some towns. Most of the time, airport bookstores do not carry anything of interest to me except the latest Barron's and a few finance magazines. Consequently, I was pleasantly surprised to see at least two of Michael Lewis' books at one airport bookstore: Boomerang: Travels in the New Third World (about the roots of financial crises in Greece, Ireland, Germany, and the US) and Moneyball (about the business and economics of baseball), both NY Times bestsellers. Lewis is the author of Liar's Poker, a sardonic expose of the unreal, larger-than-life world of Wall Street in the 1980s.

Tuesday, January 10, 2012

Earnings season is upon us

It's earnings season again. Alcoa started us on a sour note with a multi-cent miss even after excluding one-time items. Tiffany's warned. But benign retail numbers were out this morning (with Ross and Costco among others doing fairly well). This has given the market a lift.

I revisit a trade I was watching but did not make after the jump.

Trade Frequencies, Part 1

Typically what people refer to as high-frequency trading occurs on the stock markets. However, firms have also branched out to a whole assortment of other markets in an attempt to reap the alpha from high-frequency strategies. In addition, data such as futures prices sometimes serve as a signal for equity trading. Different kinds of markets, however, have considerably different characteristics, and volume and trade frequency vary from one security to the next. On one side, you have the SPY ETF, which is easily among the most actively traded securities in the stock market, with daily volumes in the hundreds of millions of shares. During August 2011, SPY's daily volume even spiked to as much as 717 million, though 100-200 million is more typical. To put that into perspective, the total daily volume of the NYSE has been about a couple billion shares in recent times. In contrast, some securities are barely traded at all.

Garbage Collection and C++

C/C++ has had the potential for automatic garbage collection for decades (at least from 1988 onwards) in the form of the conservative mark-sweep Boehm-Demers-Weiser garbage collector. Although an early version of the garbage collector is even included in the GCC distribution, only a few groups actively use it. Some have used it as a leak detector. Boehm himself attributes this to a lack of standardization. With C++11, the hope is that this will change with the official sanctioning of garbage-collected implementations. The linked slide presentation summarizes the design. The argument for garbage collection in C++ is the move to multicore, where the claim is that GC parallelizes better than manual management and that some parallelization techniques are simplified in the presence of GC.

POPL 2012 Papers

It's just a few weeks until POPL 2012 (Symposium on Principles of Programming Languages, one of the top programming language theory research conferences) in Philadelphia. The program has been out for a while. This time, there are three papers on C/C++ semantics. Apparently we have gotten to the point where researchers have (mechanically) formalized interesting chunks of that large, complicated edifice. The one coming out of INRIA has a mechanized (Coq) semantics for C++ construction and destruction, including C++11 features. The authors actually give a formal proof of the RAII (resource acquisition is initialization) pattern, specifically that in a terminating program every construction of a subobject is indeed matched by a destruction. The process of developing a formal semantics also helped identify issues in the C++ standard, in both C++03 and C++11. On the type systems side, there is also a paper from Adobe Research on gradual type systems, especially a type inference scheme with support for dynamic types. Another paper describes a type system and language design supporting probability density functions for custom probability distributions and continuous probability. Apparently probability distributions are monads too!
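The monad remark can be illustrated with a toy example. The sketch below covers only finite discrete distributions, represented as a dict from outcome to probability; it is my own illustration and not the construction in the paper, which deals with continuous distributions and densities. The names `unit` and `bind` follow the usual monad vocabulary.

```python
from collections import defaultdict
from fractions import Fraction

def unit(value):
    """The distribution that yields `value` with probability 1."""
    return {value: Fraction(1)}

def bind(dist, f):
    """Draw x from `dist`, then draw from f(x); return the combined distribution."""
    out = defaultdict(Fraction)
    for x, p in dist.items():
        for y, q in f(x).items():
            out[y] += p * q
    return dict(out)

# A fair six-sided die, and the sum of two independent rolls built via bind.
die = {face: Fraction(1, 6) for face in range(1, 7)}
two_dice = bind(die, lambda a: bind(die, lambda b: unit(a + b)))

print(two_dice[7])             # probability that two dice sum to 7
print(sum(two_dice.values()))  # total probability mass
```

Here `unit` plays the role of a point mass and `bind` sequences a random draw with a computation that depends on it, which is the structure the monad laws describe.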

Wednesday, January 4, 2012

Market Indexing

Fundamental indexing, an idea promulgated by Rob Arnott and his firm, caused quite a stir when it was introduced in 2004, with many questioning the whole idea. It has since given rise to a number of ETFs. The original is PRF, an ETF launched in December 2005. According to PowerShares' site, they are benchmarking PRF against the Russell 1000. PowerShares followed up this product with 5 more, covering the developed markets ex-US (PXF), Asia Pacific ex-Japan (PAF), emerging markets (PXH), developed ex-US small cap (PDN), and US small-mid cap (PRFZ).

Forward Looking

Now that the holidays are over, what is going on in the markets? A couple of European debt auctions went fairly well last week. This does not change the overall long-term picture, but it looks like nothing collapsed over the holidays. Moreover, the Iran-Strait of Hormuz situation could have spun out of control early on, but it has not (yet).

I was curious what news items were affecting the companies on my equity watch lists (both long and short). On the growth side of things, the basic materials and energy names are leading the market (APA, EOG, ROSE, APC, HAL, NE, MOS, FCX). FCX is up ~6%. The rest are up more than 2% for the most part. HAL is suffering from some headline shock due to its spat with BP (BP is asking HAL to foot the cleanup bill plus lost profits), but the damage is mild since nothing is decided yet.
