It's not often one hears about vendor wins at publicity-shy Goldman Sachs, so it was intriguing to learn how they're making use of hardware acceleration. Low-Latency.com caught up with Mark Skalabrin, CEO of Redline Trading Solutions, to find out more about the deployment, and why Cell processors rather than FPGAs are in vogue at Redline.
Q: Redline recently announced a new customer in Goldman Sachs. What is it that you are doing for them?
A: Goldman Sachs is using our InRush Ticker Plant as the source of ultra-low latency NBBO (National Best Bid/Offer) quotes for order matching on their Sigma X dark pool.
InRush calculates the NBBO from direct exchange connections in a single server using hardware acceleration. Many industry solutions use consolidated feeds or the SIPs (Securities Information Processors) as NBBO quote sources. Quotes from these sources can lag those provided by InRush by between one and 100 milliseconds. By leveraging InRush, Goldman gets more accurate order matching while protecting their customers from latency arbitrage.
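Conceptually, the NBBO is just the best bid (highest) and best offer (lowest) across all exchanges' quotes. As an illustrative sketch only, with hypothetical names and not Redline's implementation, the core calculation looks like this:

```python
# Illustrative NBBO calculation from per-exchange top-of-book quotes.
# Each quote is (bid, bid_size, ask, ask_size); names are hypothetical.

def calculate_nbbo(quotes):
    """quotes: dict mapping exchange name -> (bid, bid_size, ask, ask_size)."""
    # National best bid: the highest bid across all exchanges.
    bid_exch, (bid, bid_sz, _, _) = max(quotes.items(), key=lambda kv: kv[1][0])
    # National best offer: the lowest ask across all exchanges.
    ask_exch, (_, _, ask, ask_sz) = min(quotes.items(), key=lambda kv: kv[1][2])
    return {"bid": bid, "bid_size": bid_sz, "bid_exchange": bid_exch,
            "ask": ask, "ask_size": ask_sz, "ask_exchange": ask_exch}

quotes = {
    "NYSE":   (10.01, 300, 10.03, 200),
    "NASDAQ": (10.02, 100, 10.04, 500),
    "ARCA":   (10.00, 400, 10.03, 100),
}
print(calculate_nbbo(quotes))  # best bid from NASDAQ, best ask from NYSE
```

The hard part, as the answer above notes, is not this arithmetic but keeping every exchange's quote current at microsecond latency under bursty market-data rates.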
The InRush solution also reduces operational costs by significantly reducing the number of servers required to support Sigma X.
Q: With regard to that project, can you expand a bit on the challenges/complexities of calculating the NBBO, and why InRush is well suited to it?
A: InRush calculates the NBBO using full depth direct exchange feeds. These feeds provide the lowest latency source of quotes from the exchanges. The challenge is bringing all of the information from these feeds together at the same place without adding distribution overhead or falling behind when market data rates spike. InRush achieves this with consistent latency of five microseconds from when a network packet arrives until the application has the updated NBBO.
Because of the throughput and latency characteristics of our hardware accelerated solution, InRush is able to calculate the NBBO in a single server while leaving the majority of the x86 cores on the server available for applications.
Q: InRush is a hardware accelerated ticker plant. What elements/functions make use of hardware acceleration, and which elements run on the traditional CPU?
A: We accelerate the vast majority of the problem, including message normalisation and full depth book processing for every symbol on the wire. We use a single hardware thread on the server for managing the interface to the client application.
Q: You've adopted Cell Processors instead of the more common FPGAs for acceleration. Why was that choice made? Why are Cell Processors better than FPGAs?
A: We are very focused on choosing the right technology for the job. In the case of managing market data we chose a specialised stream processor over FPGAs because it allows us to accelerate more of the problem and because it is a better match for the operational requirements of the market.
FPGAs are good at normalising market data messages but are not a good match for managing full depth books. Normalisation is a small part of delivering a composite book and accelerating this alone is insufficient to get good results. Our solution allows us to accelerate both the normalisation and book building while freeing the server resources for the client application.
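To make the distinction concrete: book building means maintaining, per symbol, a sorted set of price levels that is updated on every add, modify, or delete message. A minimal sketch of that state machine, purely hypothetical and far simpler than a production full depth book, might look like:

```python
# Hypothetical price-level book: maps price -> aggregate size per side.
# A real full-depth engine handles order-by-order feeds, sequence gaps,
# and thousands of symbols in parallel; this shows only the core updates.

class Book:
    def __init__(self):
        self.bids = {}  # price -> total size at that level
        self.asks = {}

    def update(self, side, price, size):
        levels = self.bids if side == "B" else self.asks
        if size == 0:
            levels.pop(price, None)  # size 0 deletes the level
        else:
            levels[price] = size     # add or replace the level

    def top(self):
        # Best bid is the highest price; best ask the lowest.
        bid = max(self.bids) if self.bids else None
        ask = min(self.asks) if self.asks else None
        return bid, ask

book = Book()
book.update("B", 10.01, 300)
book.update("B", 10.00, 400)
book.update("S", 10.03, 200)
print(book.top())  # (10.01, 10.03)
```

Every update can change the top of book, so the book state must be consulted on each message, which is why accelerating normalisation alone, as FPGA-only designs tend to, leaves most of the work on the CPU.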
In addition, with FPGAs it is difficult to quickly identify and fix reported issues. The market demands updates within hours of a reported issue. Our solution is fully software programmable, which allows us to more quickly identify, correct, qualify, and deliver a fix once an issue is reported.
Q: Does Redline's Execution Gateway also leverage hardware acceleration? If not, why not? Might it in the future?
A: The Execution Gateway is a great example of our looking for the best technology and deciding that hardware acceleration is not required.
While low latency in market data is largely about processing many messages in parallel, low latency in execution is about generating a single message fast. Since most execution requests originate as tasks running on an x86 core, it is faster to form the order before the request leaves the processor and push it directly to the wire, rather than adding additional pipeline stages in specialised hardware.
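One common way to "form the order before the request leaves the processor" is to pre-build the outgoing message at setup time and patch only the variable fields on the hot path, avoiding per-order formatting and allocation. The sketch below is an assumption about the technique in general, not Redline's gateway; the loosely FIX-shaped layout and offsets are invented for illustration:

```python
# Pre-built order template with fixed-width slots for the hot fields.
# On the send path we only overwrite symbol, side, quantity, and price,
# then hand the buffer to the NIC; nothing is parsed or allocated.

TEMPLATE = bytearray(b"35=D|55=______|54=_|38=______|44=_______|")
SYM_OFF, SIDE_OFF, QTY_OFF, PX_OFF = 8, 18, 23, 33  # byte offsets of the slots

def fill_order(symbol, side, qty, price):
    msg = TEMPLATE  # reuse the same buffer; every slot is fully overwritten
    msg[SYM_OFF:SYM_OFF + 6] = symbol.ljust(6).encode()
    msg[SIDE_OFF:SIDE_OFF + 1] = side.encode()
    msg[QTY_OFF:QTY_OFF + 6] = str(qty).rjust(6, "0").encode()
    msg[PX_OFF:PX_OFF + 7] = f"{price:07.2f}".encode()
    return bytes(msg)

print(fill_order("IBM", "1", 500, 10.03))
```

In a production C implementation the template would typically live in a buffer already registered with the network card, so patching the fields is the only work between the trading decision and the packet leaving the box.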
This does not mean "acceleration" is not required; in this case, the acceleration is extracting every ounce of performance from the x86 architecture.
Q: Can you say something about how your other customers are using your products, and for what kind of applications? What are the hot use cases? And is low latency always the focus?
A: Our customers include leading high frequency trading firms that are seeking the lowest possible latency, as well as leading banks that are using us for algorithmic trading, crossing engines, and smart order routers, where latency is critical but so are bulletproof data and system reliability.
While the majority of the use cases highly value our ultra-low latency, the operational cost savings from reducing server count have become an important selling point. As an example, this has led to our deployment to feed thousands of screen traders, where latency is not the driving need.
Q: How do you expect to be enhancing or expanding your products in the next few months? Functional enhancements? Performance improvements? New market focuses and/or data feed support?
A: We are aggressively improving performance and adding functionality. We have just released support for all Canadian equity exchanges and will be releasing our first support for Europe and Asia over the next few months.
In addition to geographic expansion we are expanding support across more asset classes. We are expanding our options support by filling out support for all of the direct options exchanges and will be adding support for new asset classes by the first of next year.