The setup on gitorious.org
As you probably know, gitorious.org runs the exact same version of Gitorious as the mainline repository distributed on gitorious.org. The setup we're running on those servers may be a bit more complex than what you need to set up yourself, but in case you're curious, or plan to operate a Gitorious site with hundreds of thousands of users, this chapter is for you.
Deployment
We use Capistrano to deploy to the gitorious.org servers. We keep our Capfile and deploy.rb in a separate Git repository, and deploy from that repository.
The configuration files for the gitorious.org servers (database.yml, gitorious.yml etc.) are kept in this repository and pushed (via Capistrano's upload task) to the app/shared directory on the server after the code is updated; from there they are symlinked into app/current/config. The rest of the deployment process is fairly standard, but we have added Capistrano tasks for starting/stopping/re-indexing Thinking Sphinx, starting/stopping Resque workers etc.
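The shared-config step amounts to one symlink per file. Here is a plain-Ruby sketch that builds those commands, assuming a hypothetical /path/to/app layout and the two files named above; in a Capistrano 2 recipe each string would be passed to run. The helper name config_symlink_commands is made up for illustration.

```ruby
require "shellwords"

# Hypothetical helper: one `ln -nfs` per config file, so that
# <release>/config/<file> points at the copy uploaded to <shared>.
def config_symlink_commands(shared_path, release_path, files)
  files.map do |file|
    source = File.join(shared_path, file)
    target = File.join(release_path, "config", file)
    "ln -nfs #{Shellwords.escape(source)} #{Shellwords.escape(target)}"
  end
end

config_symlink_commands("/path/to/app/shared", "/path/to/app/current",
                        %w[database.yml gitorious.yml]).each { |cmd| puts cmd }
```

Using `-n` together with `-f` makes the command safe to rerun on every deploy, since an existing link is replaced rather than followed.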
Our Sphinx tasks look like this:
namespace :sphinx do
  desc "Configure Thinking Sphinx"
  task :configure, :roles => :app do
    run "sudo #{current_path}/bin/rake ts:configure"
  end

  desc "Update Index"
  task :update_index, :roles => :app do
    run "sudo #{current_path}/bin/rake ts:reindex"
  end

  desc "Stop Sphinx"
  task :stop, :roles => :app do
    run "sudo #{current_path}/bin/rake ts:stop"
  end

  desc "Start Sphinx"
  task :start, :roles => :app do
    run "sudo #{current_path}/bin/rake ts:start"
  end

  desc "Restart Sphinx"
  task :restart, :roles => :app do
    run "sudo #{current_path}/bin/rake ts:restart"
  end
end
The Capistrano recipes for controlling Resque are quite minimal: since our Resque workers are managed by Upstart, they simply call Upstart's restart and status commands:
namespace :resque do
  desc "Restart the resque workers"
  task :restart, :roles => :app do
    run "sudo /sbin/restart gitorious/resque-worker"
  end

  desc "Status of resque workers"
  task :status, :roles => :app do
    run "/sbin/status gitorious/resque-worker"
  end
end
The restart task we use is a bit special. Since we use Unicorn (more on that later), we don't do the touch tmp/restart.txt maneuver, and we want to reindex the search index after deploying. In our previous setup we ran the indexing from Capistrano, which caused some really long-running deployments with a lot of output. Our current restart task emits an Upstart event, which triggers a post-deploy Upstart task on the server. The last action that task performs is sending a USR2 signal to the Unicorn master, which makes it reload the application. When the post-deploy process has finished, a deployment report with the results is emailed to us (this part is still a work in progress).
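The post-deploy task is essentially a sequence of steps whose outcome ends up in a report. A hypothetical plain-Ruby sketch of such a runner follows; Step and run_post_deploy are made-up names, and mailing the report is left out.

```ruby
require "open3"

# Hypothetical post-deploy runner: run each step in order, stop at the first
# failure, and collect a plain-text report that could be mailed afterwards.
Step = Struct.new(:name, :command)

def run_post_deploy(steps)
  report = []
  steps.each do |step|
    begin
      output, status = Open3.capture2e(*step.command)
      ok = status.success?
    rescue SystemCallError => e # e.g. the command does not exist
      output, ok = e.message, false
    end
    report << "#{step.name}: #{ok ? 'ok' : 'FAILED'}"
    report << output unless output.empty?
    return [false, report.join("\n")] unless ok
  end
  [true, report.join("\n")]
end
```

In a setup like ours the steps might be something like Step.new("reindex", ["/path/to/gitorious/bin/rake", "ts:reindex"]) followed by a step that sends USR2 to the Unicorn master. Stopping at the first failure keeps a broken reindex from being followed by a reload.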
Web servers
Our web/app server setup consists of three layers, described below: Varnish, Nginx and Unicorn.
Varnish
We run Varnish for caching on gitorious.org. Varnish is basically a zero-config setup, and will do wonderful things to the responsiveness of your app provided you take care of two things:
- Any response with a Set-Cookie header will not be cached by Varnish.
- As long as the Cache-Control response header is set to public, Varnish will cache the response for as long as specified by the max-age parameter.
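The two rules above can be expressed as a small function. This is a simplified model under our own assumptions: Varnish's real default VCL also looks at request cookies, Authorization headers and more, and cache_ttl is a hypothetical name.

```ruby
# Simplified model of the two rules above. Returns the number of seconds a
# response may be cached, or nil if it should not be cached.
def cache_ttl(headers)
  h = headers.transform_keys(&:downcase)
  return nil if h.key?("set-cookie") # rule 1: Set-Cookie means no caching
  directives = h.fetch("cache-control", "").split(",").map { |d| d.strip.downcase }
  return nil unless directives.include?("public") # rule 2 needs "public"
  max_age = directives.find { |d| d.start_with?("max-age=") }
  # "public" without max-age yields nil here; real Varnish would fall back
  # to its default TTL instead.
  max_age && max_age.split("=", 2).last.to_i
end

p cache_ttl("Cache-Control" => "public, max-age=300")                        # 300
p cache_ttl("Cache-Control" => "public, max-age=300", "Set-Cookie" => "a=b") # nil
```

The practical consequence is that an app wanting Varnish to help must avoid setting cookies on pages that could be shared between users.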
Varnish is set up to handle port 80 (HTTP) on our servers, and is set up with a single backend: the private Nginx server mentioned below. This means that Varnish will cache as much as it can of any requests on port 80 of gitorious.org.
Nginx
We run Nginx on port 443, since Varnish doesn't run SSL. The
server running on gitorious.org:443 will serve any static files
directly from Rails.root on the server, and proxy any other
request to the public Varnish server on port 80. By sending these
requests through Varnish, we get to use the same cache for HTTP
and HTTPS.
Nginx is also set up to listen on a private port, where it receives requests from (only) Varnish. Like the HTTPS Nginx server, this will deliver any static assets directly, and pass all other requests over a UNIX socket to Unicorn.
Nginx is also set up to send files on behalf of the application by intercepting the X-Accel-Redirect response headers emitted by Gitorious (the Nginx equivalent of the X-Sendfile header used by Apache's mod_xsendfile). To enable this, we have frontend_server: nginx in the gitorious.yml file on gitorious.org, and the configuration in Nginx looks like this:
# Will deliver /srv/gitorious/tarballs-cache/filename.tar.gz
location /tarballs/ {
  internal;
  alias /srv/gitorious/tarballs-cache/;
}

location /git-http/ {
  internal;
  alias /srv/gitorious/repositories/;
}
If a user requests https://gitorious.org/gitorious/mainline/archive-tarball/master, Gitorious will (once the tarball has been generated) respond with an X-Accel-Redirect header like /tarballs/gitorious-mainline-$sha1.tar.gz ($sha1 being the SHA1 the master branch points to at request time), which is picked up by the first location block above. Nginx resolves this to the file /srv/gitorious/tarballs-cache/gitorious-mainline-$sha1.tar.gz and delivers it directly.
The /tarballs/ and /git-http/ locations are marked as internal in Nginx, which means a user isn't allowed to request them directly. With Apache and mod_xsendfile, the X-Sendfile header would have to contain the full file system path, while Nginx lets us maintain a symbolic mapping resolved by Nginx itself.
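To make the mapping concrete, here is a small Ruby sketch of how a URI from an X-Accel-Redirect header ends up as a file path. It only models the two prefix locations above (Nginx's real matching also involves regex locations and URI normalization), and resolve_internal is a hypothetical name.

```ruby
# Mirrors the two internal locations in the Nginx config above.
INTERNAL_LOCATIONS = {
  "/tarballs/" => "/srv/gitorious/tarballs-cache/",
  "/git-http/" => "/srv/gitorious/repositories/"
}.freeze

# Mimics `alias` for prefix locations: the longest matching prefix is
# replaced by its alias directory. Assumes an already-normalized URI.
def resolve_internal(uri, locations = INTERNAL_LOCATIONS)
  prefix = locations.keys.select { |p| uri.start_with?(p) }.max_by(&:length)
  return nil unless prefix
  locations[prefix] + uri.delete_prefix(prefix)
end

puts resolve_internal("/tarballs/gitorious-mainline-0123abcd.tar.gz")
# prints /srv/gitorious/tarballs-cache/gitorious-mainline-0123abcd.tar.gz
```

Because only this mapping knows the on-disk directories, the application can hand out symbolic paths without exposing the file system layout.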
The same mechanism is used for Git over HTTP.
Unicorn
Unicorn is a Ruby-based HTTP server leaning heavily on fundamental UNIX concepts. Unicorn works by starting a master process which loads the full Rails environment. Once this is done, it will run fork(2) to create 16 child processes (this is how many workers we have running on gitorious.org). These child processes inherit the socket set up by the master process, which means the kernel takes care of load balancing the requests between the active worker processes.
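The shared-socket preforking model fits in a few lines of Ruby. This is a minimal sketch, not Unicorn's source: the master opens a listening socket, forks two workers that inherit it, and each connection is accepted by whichever worker the kernel wakes up. The prefork helper is a made-up name.

```ruby
require "socket"

# The master opens the socket once; every forked worker inherits it and
# blocks in accept(2), so the kernel decides which worker gets a connection.
def prefork(server, count)
  Array.new(count) do
    fork do
      loop do
        client = server.accept
        client.write("handled by worker #{Process.pid}\n")
        client.close
      end
    end
  end
end

server = TCPServer.new("127.0.0.1", 0) # port 0: let the OS pick a free port
port = server.addr[1]
worker_pids = prefork(server, 2)       # gitorious.org runs 16

responses = Array.new(3) do
  sock = TCPSocket.new("127.0.0.1", port)
  line = sock.gets
  sock.close
  line
end
puts responses

# SIGKILL keeps the demo short; Unicorn itself uses QUIT for a graceful stop.
worker_pids.each { |pid| Process.kill("KILL", pid) }
worker_pids.each { |pid| Process.wait(pid) }
server.close
```

No load balancer sits between the socket and the workers: an idle worker simply picks up the next connection, and a busy one is never offered any.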
Unicorn is designed for chaotic situations, like the one we have on gitorious.org. An IO-intensive application like Gitorious will constantly run into problematic situations caused by things like IO load, and our previous setup (Apache and Passenger) would end up with some really CPU- and memory-hungry processes running for a long time. Our Unicorn setup has a strict timeout of 30 seconds, which means that any request that takes more than 30 seconds to complete will cause the worker process to be killed. Once the worker is killed, the master immediately forks again, and the new child process is ready to serve requests right away.
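A hard timeout means the master can always reclaim a stuck worker. A minimal sketch of the idea: the work runs in a forked child that receives SIGKILL once the deadline passes. Unicorn's actual implementation differs (its master checks per-worker heartbeats), and run_with_timeout is a hypothetical name.

```ruby
# Run the block in a child process; SIGKILL it if it outlives `timeout` seconds.
# Returns :ok if the child finished in time, :killed otherwise.
def run_with_timeout(timeout)
  pid = fork do
    begin
      yield
      exit!(0)
    rescue Exception
      exit!(1) # exit! skips at_exit hooks inherited from the parent
    end
  end
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return :ok if Process.waitpid(pid, Process::WNOHANG)
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      Process.kill("KILL", pid) # cannot be trapped or ignored
      Process.waitpid(pid)
      return :killed
    end
    sleep 0.05
  end
end

p run_with_timeout(0.5) { sleep 5 } # a stuck "request"
p run_with_timeout(5) { 21 * 2 }
```

SIGKILL is used because a worker wedged in IO may never get around to handling a politer signal; the master's job is then simply to fork a fresh worker.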
Like the good UNIX citizen Unicorn is, the easiest way to communicate with it is using signals. We use the following signals on gitorious.org:
- We send a USR2 to the master process after deploying a new version of Gitorious. This causes the master process to spawn a new master process using the newly deployed code. Once the new master is started, it looks for a PID file for the "old" master process in pid_dir/unicorn.pid.oldbin. If this file exists, it sends a QUIT signal to that process, which causes it to shut down itself and all its worker processes. This gives us zero-downtime deployments, which is a big deal for us.
- We send a USR1 to the master process after rotating the logs (done by logrotate). This causes the master and worker processes to reopen the log files.
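The "kill the old master" step is usually placed in a before_fork hook in the Unicorn configuration file, and Unicorn's example configuration does something along these lines. Here is a standalone sketch of the logic, assuming pid_path is the path configured as Unicorn's pid file; quit_old_master is a hypothetical name.

```ruby
# During a USR2 upgrade the old master's PID lives in "#{pid_path}.oldbin".
# Sends QUIT (graceful shutdown) to it and returns its PID, or nil if there
# is no old master to stop.
def quit_old_master(pid_path)
  old_pid_file = "#{pid_path}.oldbin"
  return nil unless File.exist?(old_pid_file)
  old_pid = File.read(old_pid_file).to_i
  return nil if old_pid == Process.pid
  Process.kill("QUIT", old_pid) # old workers finish their requests, then exit
  old_pid
rescue Errno::ENOENT, Errno::ESRCH
  nil # the file vanished or the old master is already gone
end
```

Doing this from the new master (rather than from a deploy script) means the old master is only stopped once the new code has loaded successfully.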
The Unicorn configuration file we use on gitorious.org is
practically identical to the one in Gitorious mainline, except we
use a full path in RAILS_ROOT since expanding a relative path
would resolve to Capistrano's app/releases directory.
Git over HTTP (New!)
As you may be aware, Git supports two kinds of Git over HTTP: the "dumb" protocol, which basically makes a separate HTTP request for every object or pack file it needs, and "smart" HTTP, which uses HTTP as a carrier for git-upload-pack. The git-scm.com site has all the details.
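To illustrate the difference: with the dumb protocol the client walks the object graph itself and fetches each loose object from its own path under objects/ (packed objects arrive as whole pack files), while a smart client starts with a single info/refs request naming the service. A sketch of the URLs involved; the repository URL and helper names are made up for illustration.

```ruby
# Dumb protocol: each loose object is a separate GET at
# objects/<first two hex digits>/<remaining 38 hex digits>.
def loose_object_url(repo_url, sha1)
  raise ArgumentError, "not a SHA-1: #{sha1}" unless sha1.match?(/\A\h{40}\z/)
  "#{repo_url.chomp('/')}/objects/#{sha1[0, 2]}/#{sha1[2..-1]}"
end

# Smart protocol: the client opens with one ref-advertisement request that
# names the service it wants.
def smart_refs_url(repo_url)
  "#{repo_url.chomp('/')}/info/refs?service=git-upload-pack"
end

repo = "https://git.example.org/project/repo.git" # made-up URL
puts loose_object_url(repo, "0123456789abcdef0123456789abcdef01234567")
puts smart_refs_url(repo)
```

Cloning a large repository over the dumb protocol can therefore mean thousands of small requests, which is why the backend serving them matters so much.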
Gitorious ships with a simplistic implementation of the former: a Rails metal which uses Nginx's X-Accel-Redirect or Apache's X-Sendfile to deliver Git objects directly from the file system. This method is inefficient and somewhat error-prone, and after we set up the new servers last week, its stability turned out to be nowhere near good enough.
On Friday afternoon we set up another backend for Git over HTTP: a JGit-based servlet. I ran it using Maven for a few hours, after setting up Varnish to proxy traffic for git.gitorious.org to this Java app. This sped up HTTP cloning a lot, so we decided to keep it running through the weekend.
After running for around 48 hours, the Jetty server was using around 5 GB of (resident) RAM, having served around 80,000 git clone operations over the weekend.
The app we're running for this is a single Java webapp, and the source code is at https://gitorious.org/gitorious/jgit-http-server. We're working on a push+pull implementation in JRuby, also using JGit, though we really like the way Java and JGit work for us.
Message queue and consumers
gitorious.org has been using Apache ActiveMQ since 2009, and we have not had a single problem with it. No messages dropped, no crashes, no problems at all. The ActiveMessaging Rails plugin we've been running with, however, has never worked really well. Considerable memory leaks forced us to use Monit to kill script/poller processes consuming more than a few hundred megabytes of RAM, and killing these processes has often led to zombie processes on the server, potentially even zombies still connected to ActiveMQ.
When setting up the new servers for gitorious.org we chose to go with Resque instead, which has been supported in Gitorious for a year or so. Resque uses the Redis key-value store as its queue. Resque works similarly to Unicorn by setting up a master worker polling for new messages from Redis and forking a child process to process each message. Once the child is done processing it exits, which means we don't leak memory.
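The fork-per-job model is what keeps memory usage flat. Here is a toy Ruby sketch of it: an in-memory array of lambdas stands in for the Redis queue, and a pipe carries each job's result back to the parent. This is not Resque's code, and work_off is a hypothetical name.

```ruby
# Toy fork-per-job worker: pop a job, fork a child to run it, wait for it.
def work_off(queue)
  results = []
  until queue.empty?
    job = queue.shift
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      begin
        writer.write(job.call.to_s)
      rescue StandardError => e
        writer.write("error: #{e.message}")
      ensure
        writer.close
        exit!(0) # leave immediately, skipping at_exit hooks inherited from the parent
      end
    end
    writer.close
    results << reader.read
    reader.close
    Process.wait(pid)
  end
  results
end

leaky = []
jobs = [
  -> { leaky.concat(Array.new(1000, "x")); leaky.size }, # "leaks" inside the child only
  -> { 6 * 7 }
]
p work_off(jobs) # => ["1000", "42"]
p leaky.size     # => 0, the parent never saw the child's allocations
```

Whatever a job allocates, or leaks, disappears when its child exits, so the long-running parent stays small.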
Switching to Redis/Resque is done in a few simple steps:
Install Redis
On Ubuntu/Debian servers:
sudo apt-get install redis-server
sudo update-rc.d redis-server defaults
sudo service redis-server start
On RHEL/CentOS-like systems:
sudo yum install redis
sudo chkconfig redis on
sudo /etc/init.d/redis start
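After installing, a quick way to check that Redis is up is to send it an inline PING and look for the +PONG reply. A sketch, assuming Redis listens on its default port 6379 on localhost; redis_alive? is a hypothetical helper, and it sets a connect timeout but no read timeout.

```ruby
require "socket"

# Returns true if something at host:port answers an inline PING with +PONG.
def redis_alive?(host = "127.0.0.1", port = 6379, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) do |sock|
    sock.write("PING\r\n")
    sock.gets == "+PONG\r\n"
  end
rescue SystemCallError, IOError
  false # connection refused, timed out, reset, ...
end

puts redis_alive? ? "Redis is answering" : "Redis is not reachable"
```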
Configure Gitorious to use Resque
This is a simple setting in gitorious.yml:
messaging_adapter: resque
Restart the app server
This depends on which server you're running. If you're using Passenger:
touch tmp/restart.txt
If you're using Unicorn:
kill -USR2 $(cat /path/to/unicorn.pid)
Start a worker
The bin/rake script shipping with Gitorious will run a rake task from anywhere, setting up the correct RAILS_ENV and HOME environment variables and ensuring the task is run as the user specified as gitorious_user in gitorious.yml. Resque workers are run with Rake:
QUEUE=* /path/to/gitorious/bin/rake resque:work
To run dedicated workers for single queues, change the QUEUE environment variable, e.g.:
QUEUE="/queue/GitoriousPush" /path/to/gitorious/bin/rake resque:work
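A sketch of how a QUEUE value can be interpreted, modelled loosely on Resque's behaviour rather than copied from it: entries are comma-separated, and * expands to every known queue not listed explicitly, in alphabetical order. queues_to_poll and every queue name other than /queue/GitoriousPush are hypothetical.

```ruby
# Turn a QUEUE value into the ordered list of queues a worker would poll.
def queues_to_poll(queue_env, known_queues)
  requested = queue_env.to_s.split(",").map(&:strip).reject(&:empty?)
  requested.flat_map do |name|
    name == "*" ? (known_queues.sort - requested) : [name]
  end.uniq
end

known = ["/queue/GitoriousPush", "/queue/SomeOtherQueue"] # second name is made up
p queues_to_poll("*", known)
p queues_to_poll("/queue/GitoriousPush", known)
```

Listing a queue before * (for example "/queue/GitoriousPush,*") would give that queue priority while still draining the rest.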
Since the bin/rake script can be called directly, we simply added an Upstart script with an exec stanza (no shell required) to control the Resque workers:
description "Run a Resque worker on all queues"
author "Marius Mårnes Mathiesen <marius@gitorious.com>"

start on started gitorious/unicorn
stop on runlevel [06]

env QUEUE=*
env PIDFILE=/path/to/gitorious/pids/resque-worker1.pid

exec /path/to/gitorious/bin/rake resque:work
Init scripts and process babysitting
We're still a little on the fence with regards to babysitting/monitoring processes. Our experience with ActiveMessaging made us set up Monit, but we're not using it yet. We start all the services using some really simple Upstart scripts. This was the main motivation for shipping the bin/ scripts with Gitorious: since they set everything up themselves, we don't need to spawn a shell to start them (e.g. to set environment variables, drop privileges etc.). Spawning a shell would confuse Upstart, which relies on counting fork calls to keep track of process IDs.
In particular, the way Unicorn is used for hot deployment would
lead Upstart to try to track the PID of the old master once a new
master was started. Instructing Upstart to respawn Unicorn would
get us into trouble when using the USR2 technique to reload
Unicorn.
Monit keeps track of PID files, which would work better with Unicorn.
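Reduced to its core, a PID-file based check reads the PID and probes the process with signal 0, which tests for existence without delivering anything. Because the file is re-read on every check, such a monitor keeps following the current master after a USR2 upgrade, since the new master takes over the PID file. A sketch; running_from_pidfile? is a hypothetical name, not Monit's implementation.

```ruby
# True if the PID in `path` belongs to a live process.
def running_from_pidfile?(path)
  pid = Integer(File.read(path).strip)
  Process.kill(0, pid) # signal 0: existence/permission check only
  true
rescue Errno::ENOENT, Errno::ESRCH, ArgumentError
  false # no file, no such process, or garbage in the file
rescue Errno::EPERM
  true # the process exists but belongs to another user
end
```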
Git proxying
We run a stack of native git daemon processes listening on port 9400 on the servers, and have set up Gitorious' git-proxy script to proxy requests to these (this proxy will translate the incoming paths to the paths on the file system before passing them on to the native git daemons). The git-proxy process listens on 127.0.0.1:9418.
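The path translation the git-proxy performs can be sketched in a few lines. A git:// client opens with a single pkt-line: four hex digits giving the total line length (including those four digits), then "git-upload-pack <path>", a NUL byte, and "host=<host>" followed by another NUL. The sketch below rewrites the path using a hypothetical lookup table; the real proxy's lookup and on-disk layout may differ.

```ruby
# Encode a pkt-line: 4 lowercase hex digits holding the total length
# (payload plus the 4 length bytes), followed by the payload.
def pkt_line(payload)
  format("%04x", payload.bytesize + 4) + payload
end

# Split the client's first pkt-line into command, path and extra parameters.
def parse_request(data)
  len = Integer(data.byteslice(0, 4), 16)
  command, rest = data.byteslice(4, len - 4).split(/ /, 2)
  raise ArgumentError, "malformed request" if rest.nil?
  path, *extra = rest.split("\0")
  [command, path, extra]
end

# Replace the public repository path with its on-disk path.
# `mapping` is a hypothetical lookup table standing in for the real one.
def rewrite_request(data, mapping)
  command, path, extra = parse_request(data)
  disk_path = mapping.fetch(path) { raise ArgumentError, "unknown repository #{path}" }
  pkt_line("#{command} #{disk_path}\0" + extra.map { |e| "#{e}\0" }.join)
end

request = pkt_line("git-upload-pack /gitorious/mainline.git\0host=gitorious.org\0")
p rewrite_request(request, { "/gitorious/mainline.git" => "/srv/gitorious/repositories/mainline.git" })
```

After this first line is rewritten, the rest of the conversation can be relayed byte for byte between the client and the native git daemon.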
We've set up HAProxy in front of the git-proxy process, listening on the public interfaces (gitorious.org:9418, ssh.gitorious.org:443 and 2a02:c0:1014::1:9418). Running HAProxy in front of these may not be strictly necessary, but we found it easier to set up the public-facing addresses/ports to listen on in the HAProxy configuration, and we're a little more comfortable exposing HAProxy to the public, as it gives us fine-grained control over server/client timeouts.
Again, we use Upstart to start the git:// protocol handlers, since Upstart lets us specify the dependencies between them. Our git-daemons Upstart recipe is set up like this:
start on started gitorious/unicorn
which means it's started once the Unicorn process is running. The Upstart recipe for our git-proxy, which requires the git-daemons to be running, looks like this:
start on started gitorious/git-daemons
stop on stopped gitorious/git-daemons
This way the native git daemons will be started as soon as the web app is ready, and the git proxy will be started once the git daemons are ready.
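The "start on started" stanzas effectively describe a dependency graph. Upstart itself is event-driven, but the ordering these stanzas produce can be illustrated with a topological sort using Ruby's standard tsort library. The gitorious/git-proxy job name is an assumption; the other names appear in the recipes above.

```ruby
require "tsort"

# job => jobs that must already be started (from the "start on started" stanzas)
class StartOrder
  include TSort

  def initialize(deps)
    @deps = deps
  end

  def tsort_each_node(&block)
    @deps.each_key(&block)
  end

  def tsort_each_child(job, &block)
    @deps.fetch(job, []).each(&block)
  end
end

jobs = {
  "gitorious/unicorn" => [],
  "gitorious/resque-worker" => ["gitorious/unicorn"],
  "gitorious/git-daemons" => ["gitorious/unicorn"],
  "gitorious/git-proxy" => ["gitorious/git-daemons"] # job name assumed
}
p StartOrder.new(jobs).tsort # every job appears after the jobs it depends on
```

A cycle in the stanzas would make tsort raise TSort::Cyclic, which mirrors a configuration in which two Upstart jobs would wait on each other forever.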
