Background Jobs
A common approach to this problem is to use background workers like Resque or Sidekiq. For the problem at hand, these are fine and arguably more suitable. The problems I have with that approach are:
- The logic of sending email lives in our application, which does not necessarily care about email.
- I will probably duplicate the process of communicating to my SMTP server through a few applications within the architecture.
- Background workers know a little too much about their origin, i.e. what models they came from, what they can access (my whole app stack).
Beanstalkd
Hopefully now you are getting the gist of why MQs are awesome. There are a few open source MQs available; the most notable are RabbitMQ (there is a nice article on RubySource with details) and my personal favourite, which we will be using today, Beanstalkd. Getting started with Beanstalkd really couldn’t be simpler. On OS X, you can use Homebrew (brew install beanstalkd), and on Debian-flavoured Linux you can use sudo apt-get install beanstalkd. It seems pretty well supported by most package managers across platforms. You can see the details on the Beanstalkd download docs.
Once installed, you can open the terminal and execute beanstalkd. This will start up a Beanstalkd instance using its default port 11300 on localhost in the foreground. It is not always ideal to run it in the foreground, so my typical command looks something like:
beanstalkd -b ~/beanstore &
This simply persists the queue data in a binlog under the directory ~/beanstore instead of keeping it only in memory, and runs the process in the background (the ampersand). For development, these settings are fine. When it comes to production, I would suggest you have a read of the docs pertaining to the admin tool that ships with Beanstalkd.
Beanstalkd Lingo
Beanstalkd has some nice vocabulary for describing the main players and operations. Let’s walk through them.
Tubes
A tube is a namespace for your messages. A Beanstalkd instance can have multiple tubes. On a vanilla boot, Beanstalkd will have a single tube named default.
The idea is that a given process listens for messages coming in on a specific tube. As mentioned, tubes just act as namespaces for the consumers of the queue.
Jobs
Jobs are what we place in a tube. It’s common for me to place JSON in a tube and parse it at the other end. Beanstalkd doesn’t really care about the content of the job, so things like YAML, plain text or Thrift would be just fine. In a normal, happy path operation, jobs have 2 states:
- Ready – Waiting to be processed.
- Reserved – Being processed.
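As a rough illustration of the JSON round trip described above, the sketch below serialises a job on the producing side and parses it again on the consuming side. The method names and the welcome_email payload are made up for the example, and only Ruby's standard json library is used.

```ruby
require 'json'

# Producer side: turn a plain hash into the string that goes into the tube.
# The keys are hypothetical; Beanstalkd treats the body as opaque bytes.
def encode_job(payload)
  JSON.generate(payload)
end

# Consumer side: turn the job body back into a hash with symbol keys.
def decode_job(body)
  JSON.parse(body, symbolize_names: true)
end

body = encode_job(type: 'welcome_email', to: 'user@example.com')
job  = decode_job(body)
puts job[:type]
```

Keeping encoding and decoding in two small methods means the format can be swapped (for YAML, say) without touching the code that talks to the queue.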
OM NOM NOM
Now we have Beanstalkd running on our development boxes, we want to get some jobs in the queue. To achieve that, my usual weapon of choice is the Beaneater gem. Getting a job into a tube is as simple as:
require 'beaneater'
require 'json'
# Connect to the local Beanstalkd instance on its default port
beanstalkd = Beaneater::Pool.new(['localhost:11300'])
tube = beanstalkd.tubes['my-tube']
# The job body is just a string; here, a JSON document
job = {some: 'key', value: 'object'}.to_json
tube.put job
And that is it. Now we get to the interesting bit: consuming the tube and all the jobs that live there.
I am a big fan of a daemon process handling that. If the tubes start getting too full, we can spin up more daemons to help clear the backlog of jobs. Of course, we can also kill them off as required.
So far I have used the Dante gem for wrapping scripts into daemons. It seemed a bit lighter than Daemon Kit and I like to keep my daemons from getting bloated. For me, the benefit of using Dante over something like ruby script/my_mailer_script.rb is nothing more than that Dante gives you Process ID (PID) file generation out of the box. With that, I can keep the daemons in check with monit.
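To make the PID file idea concrete, here is a minimal sketch of what writing one by hand might look like. It is not how Dante does it; the write_pid_file helper and the file location are assumptions for illustration, and only the standard library is used.

```ruby
require 'tmpdir'

# Write the current process ID to `path` and remove the file again on exit.
# This mimics only the PID-file part of what a daemon wrapper provides.
def write_pid_file(path)
  File.write(path, "#{Process.pid}\n")
  at_exit { File.delete(path) if File.exist?(path) }
  path
end

# Hypothetical location; a real daemon would use a fixed, known path.
pid_path = File.join(Dir.tmpdir, 'my_mailer_script.pid')
write_pid_file(pid_path)
puts File.read(pid_path)
```

A process monitor can then read that file to find out which process it should be watching.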
Beaneater provides a really nice API for consuming jobs in 2 ways. The first is manually stepping through the process of reserving a job, working on it, then deleting if it completes correctly or burying if an exception is raised. It looks something like this:
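beanstalkd.tubes.watch!('my-tube')
loop do
  # Blocks until a job is ready, then reserves it
  job = beanstalkd.tubes.reserve
  begin
    # ... process the job
    job.delete
  rescue StandardError => e
    # Park the failed job so it can be inspected and kicked later
    job.bury
  end
end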
A couple of things here are worth mentioning. Yes, I’m using an infinite loop, and the reserve method will actually sit and wait for a job to be “Ready”, reserve it, and continue.
Beaneater provides a better interface for long running tasks and the above can simply be condensed into:
beanstalkd.jobs.register('my-tube') do |job|
  # ... process the job
end
beanstalkd.jobs.process!
This method wraps the behaviour (albeit in a much better way) of the previous example, reserving, processing, then deleting or burying based on the outcome.
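The reserve, process, then delete-or-bury cycle can also be pulled into a small method of its own, which makes it easy to try out without a running Beanstalkd. The sketch below assumes only that the job responds to body, delete and bury, as Beaneater job objects do; process_one, the handler lambda and FakeJob are hypothetical names introduced for the example.

```ruby
require 'json'

# Handle a single reserved job. `job` is assumed to respond to #body, #delete
# and #bury; `handler` is a hypothetical callable that does the real work.
def process_one(job, handler)
  payload = JSON.parse(job.body)
  handler.call(payload)
  job.delete # success: remove the job from the tube
  :deleted
rescue StandardError => e
  warn "job failed: #{e.class}: #{e.message}"
  job.bury # failure: park the job for later inspection
  :buried
end

# Stand-in job object so the method can be exercised without a server.
FakeJob = Struct.new(:body, :state) do
  def delete
    self.state = :deleted
  end

  def bury
    self.state = :buried
  end
end

ok  = FakeJob.new('{"to":"user@example.com"}')
bad = FakeJob.new('not json')
process_one(ok,  ->(payload) { payload.fetch('to') })
process_one(bad, ->(payload) { payload })
puts ok.state
puts bad.state
```

Rescuing StandardError rather than Exception is a deliberate choice here: it leaves signals and interrupts alone, so the daemon can still be stopped cleanly.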
No Magic Beans
The beauty of Beanstalkd is its absolute simplicity. There is really not much more I would be willing to dive into as an introduction. In terms of getting things running quickly, it is no more complicated than any of the background worker solutions discussed earlier.
It does make sense to be pragmatic in your adoption of MQs, to be honest. Resque, Sidekiq etc. all have their place and work very well, but Beanstalkd addresses a few more problems, namely, interfacing between services which may or may not be written in Ruby (.NET clients for Beanstalkd are available). In fact, the entire thing is completely language agnostic. The neckbeard way of communicating with Beanstalkd is via its own protocol over TCP. The Beaneater gem, as you will probably know, abstracts all that protocol stuff into a well packaged API for us. It is safe to say I’ll be leaning on the Beaneater gem when using Beanstalkd for some time to come.
If I had any advice on designing/composing tube consumers, it would be to stick to the Single Responsibility Principle (SRP) as much as possible. There will come a time when you will have to kick a buried job. If that job writes to a database AND sends an email, what happens when the sending of the email blows up? Replaying said message will result in a duplicate database entry. The more you split the processing of a job into the smallest responsibilities that are reasonable, the less you have to worry about performing duplicate actions.
I really urge you to look to Beanstalkd as your application architecture grows. In personal experience, I have found it simple to get running, straightforward to manage and maintain, and the Ruby client via Beaneater is one of the better interfaces I have used.
Frequently Asked Questions (FAQs) about Beanstalkd
What is the main difference between Beanstalkd and other job queue systems like RabbitMQ?
Beanstalkd is a simple, fast work queue service that is designed to improve the distribution of jobs among multiple workers. Unlike RabbitMQ, which is a more complex message-broker system, Beanstalkd focuses on providing a minimalistic approach to job queuing. It doesn’t support advanced features like routing or replication, and persistence is limited to an optional on-disk binlog (the -b flag), but it excels in its simplicity and speed. It’s easy to set up and use, and it’s perfect for scenarios where you need a straightforward job queue without the need for complex configurations.
How can I install and use Beanstalkd?
Beanstalkd is easy to install and use. You can install it using package managers like apt-get for Ubuntu or brew for macOS. Once installed, you can start the Beanstalkd service and begin using it. You can interact with Beanstalkd using various client libraries available in different programming languages like PHP, Ruby, Python, and more. These libraries provide an interface to create jobs, assign them to the queue, and process them.
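Those client libraries all speak the same plain-text protocol over TCP, which is why the choice of language matters so little. As a rough sketch, the commands for selecting a tube and putting a job can be built as plain strings. The helper names and the default priority and time-to-run values below are arbitrary choices for the example, and nothing is sent over the network.

```ruby
# Build the `use` command, which selects the tube that later puts go into.
def use_command(tube)
  "use #{tube}\r\n"
end

# Build a `put` command: priority, delay (seconds), time-to-run (seconds),
# body size in bytes, then the body itself on the next line.
def put_command(body, priority: 1024, delay: 0, ttr: 60)
  "put #{priority} #{delay} #{ttr} #{body.bytesize}\r\n#{body}\r\n"
end

# Extract the job id from an "INSERTED <id>" reply, or nil for anything else.
def inserted_id(reply)
  match = reply.match(/\AINSERTED (\d+)\r\n\z/)
  match && match[1].to_i
end

print use_command('my-tube')
print put_command('{"some":"key"}')
```

A client library wraps exactly this kind of string handling, plus the socket management and error replies, behind a friendlier API.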
Can Beanstalkd handle large-scale applications?
Yes, Beanstalkd is designed to handle large-scale applications. It’s a lightweight and efficient job queue system that can manage thousands of jobs without any significant performance degradation. It’s used by many large-scale web applications to distribute jobs among multiple workers efficiently.
How does Beanstalkd ensure job reliability?
Beanstalkd ensures job reliability through its job lifecycle management. When a job is created, it’s placed in the “ready” state. A worker can then reserve the job for processing. If the job is processed successfully, the worker can delete the job from the queue. If the job fails, the worker can release it back to the queue or bury it for later inspection. This lifecycle management ensures that no job is lost in the process.
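The lifecycle described above can be pictured as a small state machine. The following is a toy in-memory model for illustration only: it is not how Beanstalkd is implemented, it leaves out delayed jobs and timeouts, and ToyJob is a made-up class.

```ruby
# Toy model of the job states: each event maps a current state to a new one.
class ToyJob
  TRANSITIONS = {
    reserve: { ready: :reserved },
    delete:  { ready: :deleted, reserved: :deleted, buried: :deleted },
    release: { reserved: :ready },
    bury:    { reserved: :buried },
    kick:    { buried: :ready }
  }.freeze

  attr_reader :state

  def initialize
    @state = :ready
  end

  # Define one method per event; an invalid transition raises ArgumentError.
  TRANSITIONS.each do |event, moves|
    define_method(event) do
      @state = moves.fetch(@state) { raise ArgumentError, "cannot #{event} a #{@state} job" }
      self
    end
  end
end

job = ToyJob.new
job.reserve.bury.kick.reserve.delete
puts job.state
```

Walking a job through reserve, bury, kick, reserve and delete mirrors the failure-and-retry path: the job is never lost, it just moves between states until a worker finally deletes it.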
What are tubes in Beanstalkd?
Tubes in Beanstalkd are essentially named job queues. When you create a job, you can specify the tube it should go into. Workers can then watch specific tubes for jobs. This allows you to categorize and prioritize jobs based on their tubes.
How can I monitor Beanstalkd?
You can monitor Beanstalkd using the statistics commands built into its protocol (stats, stats-tube and stats-job), which provide information about the current state of the server, including the number of jobs in each state and statistics about each tube. There are also third-party tools available that provide more advanced monitoring capabilities.
Can I use Beanstalkd with Docker?
Yes, you can use Beanstalkd with Docker. There are Docker images available for Beanstalkd that you can use to run it in a Docker container. This can simplify the deployment and scaling of Beanstalkd in a containerized environment.
How can I troubleshoot issues with Beanstalkd?
Beanstalkd logs errors and important events to the syslog, which you can check for any issues. If a job fails, it can be buried for later inspection. You can then kick the job back to the queue once the issue is resolved.
Is Beanstalkd actively maintained?
Yes, Beanstalkd is actively maintained. It’s an open-source project with a community of contributors who regularly contribute to its development and maintenance.
Can I contribute to Beanstalkd?
Yes, as an open-source project, Beanstalkd welcomes contributions from the community. You can contribute by reporting issues, submitting pull requests, improving documentation, and more.
Dave is a web application developer residing in sunny Glasgow, Scotland. He works daily with Ruby but has been known to wear PHP and C++ hats. In his spare time he snowboards on plastic slopes, only reads geek books and listens to music that is certainly not suitable for his age.