Clojure is a Lisp dialect that runs on the JVM. Ring is a web application library that provides a simple framework for serving HTTP content with Clojure. It is similar to Rack for Ruby or WSGI for Python.
Content Delivery Networks (CDNs) cache your content around the globe to reduce latency and improve performance for end users. They are a very powerful tool that any web site can leverage. Historically, CDN providers were very expensive and impractical for most people. That has changed: there are now many CDN providers catering to cloud customers who want pay-as-you-go service.
Arduino is an open source prototyping platform for electronics. There is so much you can do with Arduino, and the community is proof. While playing with Arduino, I decided it would be a great project to create a small multitasking library for use on AVR platforms, including Arduino.
In a previous post I talked about the need to shard Redis data and how I accomplished this by adding shard/hashing support to erldis, an Erlang client for Redis. The solution worked well, distributing our data evenly among many Redis servers. But there was one problem: performance.
In the change I made to erldis, the hash ring was stored in ETS (Erlang's in-memory store), and any time a key was hashed, the ring had to be retrieved from ETS. The problem is that Erlang copies the entire ring when it comes out of ETS. Our ring was very large, with thousands of items, so copying it on every single lookup became a huge performance hit.
To mitigate the performance problems, I decided to implement the hash ring in C and write an Erlang driver to use it. That is how libhashring was born. It is very fast and currently has bindings for Erlang and Java. It is deployed in our production environment and its speed is incredible. I am confident that growing the ring as we add capacity will not impact its performance.
libhashring supports MD5 and SHA-1. MD5 seems to be about 25% faster than SHA-1, so if you want the extra performance, MD5 is probably the best bet.
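libhashring's internals aren't shown here, but the core idea of a consistent hash ring can be sketched in Python. This is an illustration of the technique, not libhashring's actual implementation; the server names are made up:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent hash ring: each server gets several virtual
    points on the ring; a key maps to the first point at or after its hash."""

    def __init__(self, servers, points_per_server=100):
        self._ring = []  # sorted list of (hash, server) pairs
        for server in servers:
            for i in range(points_per_server):
                self._ring.append((self._hash(f"{server}:{i}"), server))
        self._ring.sort()

    @staticmethod
    def _hash(value):
        # MD5 here, per the speed note above; SHA-1 would work the same way.
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def lookup(self, key):
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h,))  # first point at or after the key's hash
        if idx == len(self._ring):
            idx = 0  # wrap around the ring
        return self._ring[idx][1]

ring = HashRing(["redis1", "redis2", "redis3"])
server = ring.lookup("user:42")  # the same key always maps to the same server
```

The virtual points are what keep the distribution even: with only one point per server, a large ring segment could land on a single server.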
Feel free to fork libhashring and make it even better; I’d be really happy to get some feedback and contributions.
Thrift provides a great framework for developing and accessing remote services. It allows developers to create services that can be consumed by any application written in a language that has Thrift bindings (which is…just about every mainstream one, and more).
This is great for systems that are heterogeneous – for example, you could write a user authentication service in Java, but call it from your Ruby web application.
Thrift manages serialization of data to and from a service, as well as the protocol that describes a method invocation, response, etc. This is great because instead of writing all the RPC code yourself, you can get straight to your service logic. Thrift uses TCP (I’m not sure if UDP is or will be supported), so a given service is bound to a particular port.
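As a hypothetical example of how little is left to write, the user authentication service mentioned above could be defined in a few lines of Thrift IDL (the service and method names here are made up for illustration):

```thrift
// user_auth.thrift -- illustrative IDL, not from a real project
service UserAuthService {
  // returns a session token on success
  string authenticate(1: string username, 2: string password)
}
```

The Thrift compiler generates the client and server plumbing for each target language from this definition; you implement only the method body.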
As you start to scale an infrastructure with Thrift services, you may find, as I have, that putting all of your Thrift server IP/port combinations in a configuration file that your clients read is just not…the best. If Thrift servers in your environment go down for maintenance (or crash), or you add capacity on the fly, you need a dynamic way to track the state of all of your services.
Enter Apache ZooKeeper, a distributed, fault-tolerant, highly available system for managing configuration information, naming, and more (distributed synchronization, anyone?).
To clients, ZooKeeper looks like a filesystem: a hierarchy of znodes, which are analogous to directories or files and can each hold a small amount of data.
ZooKeeper can be used to store Thrift service location information, allowing clients to dynamically discover Thrift services. ZooKeeper even provides a way to create ephemeral znodes, which means that once your Thrift service goes down, its znode is removed from ZooKeeper automatically. And if that isn’t cool enough, ZooKeeper also supports watches: clients can ask to be notified whenever a znode changes. This means that clients can begin using new Thrift service capacity instantly, and when failures happen, clients will stop trying to contact a downed Thrift service.
Laying out your Thrift services in ZooKeeper is important. Clients will need to know about the layout when performing service discovery. For example, you can do something such as:
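One possible layout, with a parent znode per service and one sequential child per live server, might look like this (the search service is included only to show the pattern generalizes):

```
/services
    /user
        /user0000000001
        /user0000000002
    /search
        /search0000000001
```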
Clients can get a list of all znodes at /services/user to receive the list of servers for the user service. The only problem is…getting the list of znodes is only half the battle. user0000000001 doesn’t really tell you how to access the user service on that node. This is why it’s important to store some kind of service location data with the znode.
ZooKeeper allows you to set data on a znode, so when a node comes up, it just needs to set data at its znode describing how to locate or access the service. I’ve adopted a URI as the znode data; this is flexible and easy to parse and read from most languages. For Thrift services I am using a URI with a thrift scheme:
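The exact shape of the URI is up to you; one plausible form, parsed here with Python's standard library, might look like this (the host, port, and service name are made up for this sketch):

```python
from urllib.parse import urlparse

# Hypothetical znode data for one server of the user service.
znode_data = "thrift://10.0.1.15:9090/UserService"

uri = urlparse(znode_data)
assert uri.scheme == "thrift"
host, port = uri.hostname, uri.port      # "10.0.1.15", 9090
service = uri.path.lstrip("/")           # "UserService"
```

A client would read this data alongside the znode listing and open its Thrift transport against the host and port it names.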
The URI can be easily adapted to your application, such as adding query string parameters with any extra custom metadata.
Thrift is great, and with ZooKeeper, it’s even better. I’m in the process of implementing this integration now and am looking forward to all of the benefits it has to offer. I would love to hear any feedback about this approach if anyone has personal experience, or just some good ideas. What is everyone else using to solve this type of problem?
Oh and also, if ZooKeeper isn’t your thing, you should definitely check out Doozer. It has very similar features to ZooKeeper, and although new, it is definitely on my list of projects to watch. Oh, and did I mention that it is written in golang?
Doozer is a highly-available, completely consistent store for small amounts of extremely important data. When the data changes, it can notify connected clients immediately (no polling), making it ideal for infrequently-updated data for which clients want real-time updates. Doozer is good for name service, database master elections, and configuration data shared between several machines.