It is widely agreed that most of the effort to date to reduce the latency of trading systems has focused on the 'easy' areas - lowering propagation latency, speeding up network stacks, using faster processors. The 'hard' part - tackling application design - is yet to come, and parallelisation of business logic - aka concurrency - is the key to it. Technologists at SunGard have been working on making it easier to build low-latency and high-throughput concurrent applications as part of the Stealers initiative. Low-Latency.com spoke to SunGard's Aditya Yadav to find out more.
Q: First off, can you tell us what you - and your group - do at SunGard?
A: I head the India R&D unit of the Advanced Technology Services (ATS) business for SunGard’s consulting services. The purpose of our group is to incubate emerging technology, applied research and consulting. What makes us different from other companies’ R&D groups is that we extensively consult with our product teams and consulting customers to apply technologies to solve their business problems. We have more than 400 individual technologies in our current portfolio grouped into major areas like Low Latency, Big Data, Analytics, Data Science, Statistics, Cloud Computing, Hardware Acceleration, Natural User Interfaces and Mobile, as well as some other practices like Agile and Infrastructure Management.
The group was created to leverage technology in order to solve some of the toughest problems the industry faces. We work across financial sectors and create horizontal technology tools, frameworks and platforms around these technologies, while also creating vertical technology solutions around media, energy and finance, for example.
Q: You have created something called Stealers. What is it?
A: Moore’s Law is popularly taken to mean that the power of computing chips will double every two years. And that is the way most of our applications were designed until now: to use more powerful processors and do mostly one thing at a time. But because of power consumption limits, individual processors can no longer be made much faster, so the only way to create better software now is to make use of multiple processors on a computer. This means we have to make software that can run multiple things in parallel. The conventional way to do that is to share data across parallel threads and rely on mutual exclusion algorithms. This is generally termed the shared memory design of concurrent applications. It is very difficult to develop, test, debug, extend and evolve. The other approach is a message passing style, where there is no global shared memory and all processing inside the system happens through messages that are passed around. These systems are easier to design and debug. This is the approach that “Stealers” takes.
Stealers is a general purpose concurrent programming framework that lets us create a high throughput or low latency system without having to deal with the shared memory design problems of concurrent applications. Currently implemented in Java, the Stealers framework is proven to offer high throughput and the lowest latency possible without using any specialised hardware. Its advantage is that it separates designing a concurrent system from developing the real application: Stealers handles the high throughput, low latency concurrency, and you focus on building an application around it. While there are other frameworks that do this, Stealers outperforms everything we have come across so far, yet doesn’t use any hardware specific optimisations or features. This means Stealers will work on physical hardware, virtualised hardware and clouds. It also doesn’t rely on bounded memory data structures to deliver low latency, so you’re not limited in the number of processors or the amount of RAM you throw at it, and it requires no configuration, parameters or tuning.
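To make the message passing style described in this answer concrete, here is a minimal Java sketch. It is not the Stealers API, which the interview does not show; it only uses standard java.util.concurrent classes, and the class name, method name and STOP sentinel are assumptions made for illustration. One thread owns the running total and other code talks to it only through a queue, so the application code contains no locks.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative only: this is NOT the Stealers API, just the general
// message passing style built from standard library classes.
final class MessagePassingSketch {

    // Sentinel message that tells the consumer to stop (an assumption of this sketch).
    private static final long STOP = -1L;

    // Sends the numbers 0..count-1 as messages and returns the sum the consumer computed.
    static long sumViaMessages(int count) {
        BlockingQueue<Long> mailbox = new LinkedBlockingQueue<>();
        long[] result = new long[1]; // written only by the consumer thread

        Thread consumer = new Thread(() -> {
            long sum = 0; // state owned by exactly one thread, so no lock is needed
            try {
                for (long msg = mailbox.take(); msg != STOP; msg = mailbox.take()) {
                    sum += msg;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            result[0] = sum;
        });
        consumer.start();

        try {
            for (long i = 0; i < count; i++) {
                mailbox.put(i); // the only interaction between threads is a message
            }
            mailbox.put(STOP);
            consumer.join(); // join() makes the consumer's write to result visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sending messages", e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(sumViaMessages(1000)); // prints 499500, the sum of 0..999
    }
}
```

The queue still synchronises internally, but that complexity sits inside the library rather than in the business logic, which is the separation of concerns the answer describes.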
Q: What kind of applications can benefit from Stealers?
A: Stealers can be used to create both low latency applications and high throughput applications. For example, on the low latency side it can be used to create algorithmic trading applications that take feeds for tens of thousands of symbols and run them through thousands of trading strategies on a distributed cluster of a hundred or even five hundred computers. Or one can build faster trading exchanges that process millions of transactions per second (TPS), compared to current ones that do about 200,000 TPS.
An example of a high throughput application would be processing 10 events per second from 200 million smart meters across the U.S. - in effect, two billion events per second. Or you may be monitoring 400,000 racks across 20 data centers in real time and raising alerts so that someone can take pre-emptive measures against failure. If each of those is sending 500 metrics per second, then you would have 200 million events per second, which is a great use case for Stealers.
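The two aggregate rates quoted here follow from multiplying the number of sources by the rate per source. A small Java check, assuming the 500 metrics in the second example are sent every second (the class and method names are made up for this sketch):

```java
// Illustrative arithmetic only; names are hypothetical.
final class EventRateSketch {

    // Aggregate rate = number of sources x events each source emits per second.
    static long eventsPerSecond(long sources, long perSourcePerSecond) {
        return Math.multiplyExact(sources, perSourcePerSecond); // throws on overflow
    }

    public static void main(String[] args) {
        // 200 million smart meters, 10 events per second each
        System.out.println(eventsPerSecond(200_000_000L, 10L)); // prints 2000000000
        // 400,000 monitored units, assumed 500 metrics per second each
        System.out.println(eventsPerSecond(400_000L, 500L));    // prints 200000000
    }
}
```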
Q: Is Stealers a product in itself, or is it being used in SunGard's products?
A: Stealers is a production-ready technology framework. We hope to use Stealers to help solve large scale challenges our customers have, and also support the next generation of SunGard’s financial and energy products and platforms.
Q: How does Stealers compare with other similar developments - such as the open-source Disruptor (pioneered by brokerage LMAX), BackType's Storm or Yahoo S4?
A: Both Yahoo S4 and Storm are distributed systems, while Stealers and Disruptor are just the core: we would need to build a distributed system around them and then build an application on top of that. But considering the value that Disruptor and Stealers bring, the effort of building a distributed system around them is well worth it.
Stealers’ throughput surpasses that of Disruptor. Disruptor delivers a throughput of about 40 to 80 million events per second at latencies of 1200 nanoseconds, while Stealers does 2.7 billion events per second at latencies of 4200 nanoseconds. Disruptor relies on mechanical sympathy and thread affinity, while Stealers doesn’t use any hardware specific optimisations and performs better solely because of a superior algorithm. Disruptor also uses a bounded cyclic buffer, while Stealers uses unbounded memory, limited only by the physical memory installed on the machine. Also, Stealers is a configuration-less system, so to build an application you just add the Stealers library and build around it without worrying about affecting anything else.
Storm and Yahoo S4 can be categorised as stream processing systems, while Stealers is a general purpose concurrent programming framework. However, Stealers is not an actor framework like Akka, and is not just a concurrent programming core implemented in Java. It is a high performance algorithm that doesn’t use anything specific to the Java runtime and can be easily implemented in any language.
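The contrast drawn in this answer between a bounded buffer and unbounded memory can be seen with two standard Java queues. This is only an analogy: ArrayBlockingQueue is not the Disruptor ring buffer and ConcurrentLinkedQueue is not Stealers, and the class and method names are assumptions for the sketch. With no consumer running, the bounded queue starts rejecting items once its fixed capacity is reached, while the unbounded one keeps accepting them until memory runs out.

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative analogy only: neither queue is Disruptor or Stealers.
final class BoundedVsUnboundedSketch {

    // Offers `attempts` items with no consumer draining the queue
    // and returns how many of them the queue accepted.
    static int accepted(Queue<Integer> queue, int attempts) {
        int ok = 0;
        for (int i = 0; i < attempts; i++) {
            if (queue.offer(i)) { // offer() returns false instead of blocking when full
                ok++;
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        Queue<Integer> bounded = new ArrayBlockingQueue<>(8);     // fixed capacity of 8
        Queue<Integer> unbounded = new ConcurrentLinkedQueue<>(); // grows with the heap

        System.out.println(accepted(bounded, 100));   // prints 8
        System.out.println(accepted(unbounded, 100)); // prints 100
    }
}
```

A bounded structure needs its capacity chosen up front and a policy for a full buffer; an unbounded one needs neither, at the cost of memory growth when producers outpace consumers.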
Q: What's next for Stealers? And for your group?
A: Stealers has been divided into two API compatible frameworks: they share the same API but have different back-end implementations. The first, a high throughput implementation, does 2.7 billion events per second at a 99.99th percentile latency of 20-100 milliseconds. The second, a low latency implementation, does 100 million events per second at a 99.99th percentile latency of 4300 nanoseconds.
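The latency figures above are quoted at the 99.99% level, which is usually read as a percentile: at most one event in 10,000 takes longer than the stated value. The following Java sketch shows one common way to compute such a figure, the nearest-rank method. It assumes that reading, uses synthetic data rather than any Stealers measurement, and takes the percentile in basis points (9999 = 99.99%) to keep the arithmetic in integers; all names are hypothetical.

```java
import java.util.Arrays;

// Illustrative only: synthetic data, not a Stealers benchmark.
final class LatencyPercentileSketch {

    // Nearest-rank percentile. basisPoints is the percentile in hundredths of a
    // percent (9999 means 99.99%). Returns the smallest sample such that at least
    // that share of all samples is less than or equal to it.
    static long percentile(long[] latenciesNanos, int basisPoints) {
        if (latenciesNanos.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (basisPoints < 1 || basisPoints > 10_000) {
            throw new IllegalArgumentException("basisPoints must be in 1..10000");
        }
        long[] sorted = latenciesNanos.clone();
        Arrays.sort(sorted);
        // rank = ceil(basisPoints / 10000 * n), computed without floating point
        long rank = ((long) basisPoints * sorted.length + 9_999L) / 10_000L;
        return sorted[(int) rank - 1];
    }

    public static void main(String[] args) {
        // 9,999 events at 1,000 ns and a single 5 ms outlier
        long[] samples = new long[10_000];
        Arrays.fill(samples, 1_000L);
        samples[1234] = 5_000_000L;

        System.out.println(percentile(samples, 9_999));  // prints 1000
        System.out.println(percentile(samples, 10_000)); // prints 5000000
    }
}
```

The single outlier does not move the 99.99th percentile but dominates the maximum, which is why the percentile a latency figure refers to matters when comparing systems.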
Stealers as a core framework and algorithm is fully tested and stable. The next steps are to build a distributed system around it, along the lines of Storm, Dempsy, S4 and HStreaming, that engineers can then quickly use to build applications. In addition, we would implement the core Stealers framework in other languages like C/C++, C#, Ruby, Python and others, depending on what our product teams and their customers require.
Stealers provides a low latency and high throughput general purpose concurrency framework. It supplies the building blocks for concurrent, low latency and high throughput applications. The next step is to build a distributed, reliable and elastic system around it, with failover and dynamic configuration.