Kolab Enterprises is doing some really cool and unique things with Riak KV. We sat down with Aaron Seigo, Senior Technologist and Evangelist at Kolab Systems, to discuss their Riak implementation.
Kolab, with its focus on security and scalability, is used by Fortune 50 companies, governments and SMEs alike around the world for email, calendaring, file cloud access and a host of collaboration-focused applications. Kolab Systems is the company behind the open source Kolab Enterprise collaboration suite. They are also the primary sponsor for the Kolab community edition and other open source software such as the Roundcube webmail package.
Kolab Enterprises are doing some really unique things with Riak KV. The primary use case for Riak KV is in their data loss prevention system, where it stores groupware object histories in real time for later auditing and rollback. We asked Seigo to tell us a bit about the details of Kolab’s Riak KV deployment.
Kolab is using Riak KV as the NoSQL database for the new data loss prevention (DLP) system we rolled out this year for Kolab Enterprise. Unlike most data loss prevention systems tacked onto groupware servers, Kolab’s DLP system is designed for security, auditing and business intelligence. This translates into requirements that the data layer be robust, scalable and deployable in a cluster with relative ease. Riak KV fit that bill very well.
In a standard deployment, Kolab’s DLP uses a Riak KV cluster with a minimum set of five nodes spread across physical servers. This gives us a high degree of resilience against individual systems failing as well as the scalability we require. Unlike traditional databases where clustering can present significant development hurdles, Riak KV delivers this out of the box with amazingly little tweaking necessary for a production deployment.
As a bonus, our developers are able to run the same basic setup of Riak KV nodes in just minutes on their development machines, allowing development to mirror deployment in a very natural fashion. This has been very useful for ensuring quality during the development cycle.
We understand that it was a bit of a journey to Riak KV. What technical problems/challenges did you need to solve? Had you been trying to solve this with another product or solution and if so why wasn’t it working?
A variety of options were looked at when we started spec’ing the DLP system. Traditional SQL databases were dropped from consideration due to scalability concerns and the various “gotchas” around effective clustering.
Keeping in mind that Kolab is designed as a scalable set of microservices, and supports advanced features such as mirroring across geographically disparate data centers, we needed a database that could keep up with that.
Then there was the twin requirement of security. This is achieved (in part) by separating the storage of groupware object histories from the rest of the Kolab system, and controlling access to it through two carefully managed entry points behind which the key/value store itself sits. Those endpoints consist of a service that translates events in real-time into object histories in the key/value store (an internally visible write-only system), and a RESTful web service that provides access to those timelines and provides ways to restore from those timelines without losing any of that history.
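To make the two entry points concrete, here is a minimal sketch in Python of the idea described above. It is not Kolab’s implementation: the class names (`HistoryStore`, `EventRecorder`, `HistoryService`), the in-memory dictionary standing in for the key/value store, and the event labels are all assumptions made for illustration. It shows one way a write-only recorder and a read/restore service can share an append-only timeline, so that a restore adds a new revision rather than discarding later ones.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Revision:
    """One immutable entry in an object's timeline."""
    seq: int               # position in the timeline, starting at 0
    event: str             # e.g. "created", "modified", "restored-from-0"
    state: Dict[str, Any]  # snapshot of the object after the event


class HistoryStore:
    """Hypothetical stand-in for the key/value store: object id -> timeline."""

    def __init__(self) -> None:
        self._timelines: Dict[str, List[Revision]] = {}

    def append(self, object_id: str, event: str, state: Dict[str, Any]) -> Revision:
        # Revisions are only ever appended, never edited or removed.
        timeline = self._timelines.setdefault(object_id, [])
        revision = Revision(seq=len(timeline), event=event, state=dict(state))
        timeline.append(revision)
        return revision

    def timeline(self, object_id: str) -> List[Revision]:
        # Return a copy so callers cannot mutate the stored timeline.
        return list(self._timelines.get(object_id, []))


class EventRecorder:
    """Write-only entry point: turns groupware events into history entries."""

    def __init__(self, store: HistoryStore) -> None:
        self._store = store

    def record(self, object_id: str, event: str, state: Dict[str, Any]) -> None:
        self._store.append(object_id, event, state)


class HistoryService:
    """Read/restore entry point (what a RESTful service would wrap)."""

    def __init__(self, store: HistoryStore) -> None:
        self._store = store

    def history(self, object_id: str) -> List[Revision]:
        return self._store.timeline(object_id)

    def restore(self, object_id: str, seq: int) -> Revision:
        timeline = self._store.timeline(object_id)
        if not 0 <= seq < len(timeline):
            raise IndexError(f"no revision {seq} for {object_id}")
        # Restoring appends a copy of the old state as a new revision,
        # so nothing that happened after `seq` is lost.
        return self._store.append(object_id, f"restored-from-{seq}", timeline[seq].state)
```

In this sketch, recording two events and then restoring the first leaves three revisions in the timeline, with the intermediate change still available for auditing.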
So we needed a database that was robust against system failure; could scale out to keep up with Kolab installations that scale from dozens of users to millions of users; and could keep up with the real-time processing requirements of building those timelines, even with tens or hundreds of thousands of users hitting it simultaneously. Riak KV was the solution that we found to be the most natural fit with those demanding requirements.
On top of all this, we need to deliver enterprise-grade support across our stack, and that meant needing a partner that could provide that level of support for the key/value store at the center of the Kolab DLP system. This point in particular was what removed a number of other key/value stores from the table very quickly.
Of course, this all needed to be available as open source, as many of our clients require that and Kolab Systems itself is committed to fully open solutions. Riak KV’s open source versions fit that bill very well.
What was it about Riak that convinced you that we were the right choice?
One eye-opening moment was when we had development systems mirroring our deployment plan within minutes, allowing our developers to work against a real-world like system right on the development machines.
Another was when we tested the scalability for our needs and found Riak KV easily kept up. When we started looking into how we could branch out to full text search, Riak KV’s integration with Solr ticked that box. The robustness was another box-ticker.
What really sealed the deal, however, was Basho’s professionalism and responsiveness to requests. We feel confident walking into governmental deployments serving tens of thousands of users, such as our recent deployment project for the city of Munich, with Basho as a partner.
Can you quantify or qualify the business benefits you have experienced since implementing Riak KV?
The need for a specialized “DBA” evaporated by going with a key/value store, and Basho’s excellent technical feedback let us move extremely quickly in deployment. We were able to get pointers on performance tweaks specific to our needs, with Basho engineers looking over our application design with us.
It is also far easier to sell a product such as data loss prevention when the storage system is built from the ground up for robustness and scalability. Being able to show a flexible and tried-and-true product like Riak KV is a real benefit when it comes to sitting down with potential customers.
What advice would you give anyone struggling with their database requirements both as to key insights as to how to evaluate as well as how to implement?
Find the developers behind the system and interact with them. They will know better than anyone what the limitations are, and how to get the most out of the product. If you can’t get solid engineering answers in a reasonable time frame, I would recommend looking elsewhere quickly.
That said, key/value stores are not for every use case out there. Thankfully Riak KV comes with a huge amount of documentation that really helped us when it came to evaluating whether it fit our requirements or not. RTFM, as they say.
Finally, measure. That’s an often repeated, but too rarely followed, mantra in software development. We set up a cluster of VMs on enterprise-class hardware to push, and measure, Kolab’s DLP system as it went through development. The workloads were (accurately!) modeled on real-world use cases that our clients face every day. Only with such real-world measurements in hand can you know with certainty what sort of database your application requires. Unfortunately, I see time and time again projects that fail to do this and end up with a storage system poorly suited to the application; too often they end up throwing away expensive development and QA investments when they are forced into replacing their storage system after earlier versions were released. Measure twice, cut once.
Can you share any plans on how you see Riak KV fitting into additional areas of your business in the future?
There are two areas we are currently investigating which could involve Riak KV. The first is our push into business intelligence. Using those same groupware object histories Kolab uses for data loss prevention, we will provide tools to data-mine that user activity (individually and in groups) to find out the realities of your business activities. Being able to see which people collaborate the most and organize meetings with each other, track keywords and project titles, see how teams share files with each other, and so on can open new windows onto how a large organization is actually working.
The other place we’ve only just started exploring is an advanced file cloud solution to augment the current Kolab Files system. As with our path to Riak KV, we’ve trialed a number of products on the market, and plan to examine Riak S2 in more detail this year.
To see the partnership of Basho’s Riak KV and Kolab Enterprise in action together, come see us in Munich at the TDWI European Conference 22-24th June. We’ll be in a booth showing both Riak KV and Kolab Enterprise, and will be happy to answer your questions!