Victor Castell

Sat, Aug 1, 2015

Dkron - Distributed job scheduler

Over the last few months, I’ve been writing an open source program to run scheduled jobs that could work as a replacement for cron.

There’s some literature on why the venerable and battle-tested cron system service is not appropriate for many use cases.

My co-workers and I at Jobandtalent realized that we were suffering from these weaknesses ourselves.

We’ve tried several strategies over time, such as centralizing the cron jobs on a single server dedicated to this role, among other approaches, but we never found a solution that got rid of the single point of failure.

Some time ago, I found the Google whitepaper Reliable Cron across the Planet. In this paper, Štěpán Davidovič and Kavita Guliani from Google analyze the weaknesses of the cron service and its impact on distributed application architectures. They also go through some possible ways to address the problem.

At the same time I found Airbnb’s Chronos, a cron replacement that has a lot of nice features. The downside of Chronos is that it needs an Apache Mesos cluster to run: it’s built on top of Mesos, and we don’t use Mesos at Jobandtalent, which is also the case for a lot of other companies.

I started to investigate how we could run a job scheduler that fails over automatically, has a nice UI, and is easy to integrate with our current platform.

Serf is a small and well-written piece of software by the guys at HashiCorp (they know what they do!) that allows sending arbitrary commands to clusters of machines of any size, with failure detection and a good security layer.

On the other hand, etcd provides the key-value data store needed to handle configuration. It is fault-tolerant thanks to the Raft consensus algorithm, quite mature at the time of writing and, most importantly, it provides primitives for building distributed systems on top of it, such as building blocks for leader election.
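To make the leader election idea a bit more concrete, here is a minimal sketch of how such a building block can work: every node tries to atomically create a key with a time-to-live, and whoever succeeds is the leader until it stops refreshing the key. Everything in it is an assumption for illustration. `InMemoryKV`, `try_lead` and the `scheduler/leader` key are hypothetical names, the store is a single-process stand-in, and this is neither the etcd client API nor how Dkron itself is implemented.

```python
import time


class InMemoryKV:
    """Hypothetical single-process stand-in for a key-value store that
    supports TTLs and atomic create/refresh. Not the etcd API."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        """Return the value for key, or None if it is missing or expired."""
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily drop expired entries
            return None
        return value

    def create_if_absent(self, key, value, ttl):
        """Create key only if nobody currently holds it."""
        if self.get(key) is not None:
            return False
        self._data[key] = (value, self._clock() + ttl)
        return True

    def refresh_if_equal(self, key, value, ttl):
        """Extend the TTL only if key still holds the expected value."""
        if self.get(key) != value:
            return False
        self._data[key] = (value, self._clock() + ttl)
        return True


def try_lead(store, node, ttl=5.0, key="scheduler/leader"):
    """Return True if `node` holds the leader key after this call.

    The current leader refreshes its own key; any other node can only
    take over once the key has expired.
    """
    return (store.refresh_if_equal(key, node, ttl)
            or store.create_if_absent(key, node, ttl))
```

Each node would call `try_lead` periodically, with an interval shorter than the TTL, and only run scheduled jobs while it returns `True`. If the leader dies, its key expires and another node takes over on its next attempt, which is the automatic failover we were looking for.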

Dkron was born as the glue for these components, with some added features on top of them.

Before you say, - hey, this shit doesn’t work! - you must know that it’s a very young project and a work in progress right now, still far from production ready.

Hope you find it useful.

dkron.io