Seeding your Rails database

I’ve been working on a project that requires a good deal of test data to verify functionality. It involves train timetables with many lines, trains, stations etc. To help in uploading data for testing and development I’ve been using the rake db:seed command.

rake db:seed RAILS_ENV=development


rake db:seed RAILS_ENV=test

For the record, I’m using a PostgreSQL database, but the seeding mechanism is database neutral. It uses the contents of the config/database.yml file to connect to whatever target environment you specify.

One of the problems with loading seed data arises when dependencies exist between tables. You need to be able to identify records from table A to create the appropriate joins in table B. From a testing perspective there are fixtures and factories. All well and good, and each serves its purpose adequately. I wanted to build up a file that can be used to populate a database from scratch, with the ability to add new records as your table set grows. I was also able to use it to sanity check my data model and joins as the project continued.

Here’s how the seeding works.

The relevant file is the seeds.rb file in your db folder.

To create single records you won’t need to refer to later in the seeding process, use something like this.

#firstly delete any existing data
User.delete_all

#now build up an array
users = [
  {email:'', password:'@dmin123', password_confirmation: '@dmin123', admin:true, confirmed_at: '01/01/2011'},
  {email:'', password:'user123', password_confirmation: 'user123', confirmed_at: '01/01/2011' }
]

#now process the array using an iterator
users.each { |user| User.create user }

To create rows that you can refer to later on (such as when establishing a table join), assign each one to a variable.


@frankston_direct_line=Line.create({name:'Frankston Direct'})
@frankston_loop_line=Line.create({name:'Frankston Loop'})

You can now refer to the id of these records by calling id on the variable, for example @frankston_direct_line.id.

So now I create a few train stations…


@aircraft=Station.create(name:'Aircraft', latitude:-37.866689, longitude:144.760795)
@alamein=Station.create(name:'Alamein', latitude:-37.86862, longitude:145.08002)
@altona=Station.create(name:'Altona', latitude:-37.867231, longitude:144.829609)
@armadale=Station.create(name:'Armadale', latitude:-37.85544, longitude:145.018802)

#list cut short for brevity

Now I create the association between the lines and stations


sandringham_line_stations = [
  { line_id:, station_id:, ordinal:1, time:0},
  { line_id:, station_id:, ordinal:2, time:2},
  { line_id:, station_id:, ordinal:3, time:3},
  { line_id:, station_id:, ordinal:4, time:4},
  #list cut short for brevity
]

#now create all the LineStations iterating over the array
sandringham_line_stations.each { |linestation| LineStation.create linestation }

So there you have it. The ability to apply full referential integrity at database seeding time using the power of db:seed.
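
The pattern can be tried out without a database. What follows is only a sketch in plain Ruby, not the project’s real seeds.rb: FakeModel is a hypothetical stand-in for an ActiveRecord model, and the two-station route is made up. The one idea taken from the post is to keep each created parent record in a variable and use its id when building the join rows.

```ruby
# Hypothetical stand-in for an ActiveRecord model: create assigns an
# incrementing id and remembers the record, nothing more.
class FakeModel
  attr_reader :id, :attributes

  def self.records
    @records ||= []
  end

  def self.create(attributes)
    record = new(records.size + 1, attributes)
    records << record
    record
  end

  def initialize(id, attributes)
    @id = id
    @attributes = attributes
  end
end

class Line < FakeModel; end
class Station < FakeModel; end
class LineStation < FakeModel; end

# Parent rows first, kept in variables so their ids can be reused.
line     = Line.create(name: 'Frankston Direct')
aircraft = Station.create(name: 'Aircraft')
altona   = Station.create(name: 'Altona')

# Join rows refer to the parents by id (illustrative route only).
line_stations = [
  { line_id: line.id, station_id: aircraft.id, ordinal: 1, time: 0 },
  { line_id: line.id, station_id: altona.id,   ordinal: 2, time: 2 }
]
line_stations.each { |attrs| LineStation.create(attrs) }

puts LineStation.records.map { |ls| ls.attributes[:station_id] }.inspect
```

Because the join rows are built from the variables rather than from hard-coded ids, the file keeps working when earlier rows are added or reordered.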

Rails background tasks with Rufus Scheduler

I have a database that is populated based on events that happen in real time. I wanted to view the output of reports from that database on a regular basis (every 30 seconds). I found a lovely little gem called ‘rufus-scheduler’ that does the trick neatly. Here’s how it worked for me.

Add the gem to your Gemfile

gem 'rufus-scheduler'

Update your bundle with bundle install

/path/to/my/app/$ bundle install

Create a file in your initializers folder. I’ve called mine task_scheduler. This file contains the instructions to start the scheduled background process and the tasks you want to run regularly.

The contents of mine are as follows:

scheduler = Rufus::Scheduler.start_new

scheduler.every("30s") do
   stats_direct ="Frankston Direct")

   stats_loop ="Frankston Loop")

   stats_loop ="Sandringham")
end

And that’s all there is to it.

More info on the gem can be found here

Capistrano Without Root Privileges

Given a user with sudo (but not root) access on a remote box, the deploy.rb script below will perform a Capistrano deploy of a Ruby application. It assumes the following:


  1. You’re using ‘git’. Although svn can be used, the script targets a git setup.

  2. You’re using mongrel_cluster. Change the script accordingly if using passenger etc.

  3. The application being deployed is dropped into a subfolder/subdirectory on the remote server. You can remove the task :recreate_public_link if you’re deploying to the root of a virtual directory.

  4. A user and group called ‘mongrel’ has been created on the remote server. This owns the running mongrel_cluster processes. Relevant permissions are set by the script.

    # deploy.rb - controls deployment setup/configuration
    # using the capistrano or 'cap' deployment utility.

    # requires mongrel_cluster recipes to allow restart of mongrel cluster
    require 'mongrel_cluster/recipes'

    set :application, "[application name]"
    set :user, "peter"
    set :web_user, "apache"
    set :location, "[ip address]"
    # If you are using Passenger mod_rails uncomment the following block:
    # if you're still using the script/reaper helper you will need these

    # namespace :deploy do
    #   task :start do ; end
    #   task :stop do ; end
    #   task :restart, :roles => :app, :except => { :no_release => true } do
    #     run "#{try_sudo} touch #{File.join(current_path,'tmp','restart.txt')}"
    #   end
    # end

    ssh_options[:forward_agent] = true

    default_run_options[:pty] = true
    set :scm, "git"
    set :scm_user, "git"
    set :repository, "#{scm_user}@#{location}:/usr/local/share/gitrepos/#{application}.git"
    set :scm_passphrase, "your password" #This is your custom users password
    set :git_shallow_clone, 1
    set :deploy_via, :remote_cache
    set :branch, "master"
    set :use_sudo, true
    set :site_root, "app/[application name]"
    role :app, location
    role :web, location
    role :db, location, :primary=>true
    set :deploy_to, "/var/www/html/[your test site url]/#{application}"

    # mongrel details


    set :mongrel_conf, "#{deploy_to}/current/config/mongrel_cluster.yml"
    set :mongrel_user, "mongrel"
    set :mongrel_group, "mongrel"
    set :runner, nil
    set :mongrel_clean, true # helps keep mongrel pid files clean


    # migration parameters


    set :rake, "rake"
    set :rails_env, "production"
    set :migrate_env, ""
    set :migrate_target, :latest

    before "deploy:update_code", "custom:set_permissions_for_checkout"
    before "deploy:migrate", "custom:set_permissions_pre_schema_dump"
    after "deploy:migrate", "custom:set_permissions_post_schema_dump"

    before "deploy:migrations", "custom:set_permissions_pre_schema_dump"
    after "deploy:migrations", "custom:set_permissions_post_schema_dump", "deploy:cleanup"
    before "deploy:symlink", "custom:get_current_ownership"

    after "deploy:symlink", "custom:update_application_controller"

    namespace(:deploy) do
        desc "Restart the Mongrel processes on the app server."
        task :restart, :roles => :app do
            sleep 2.5
        end
    end

    namespace(:custom) do
        desc "Change ownership of target folders and files to the scm user"
        task :set_permissions_for_checkout, :except => { :no_release => true } do
            # chown of files to the scm user so the checkout can write to them
            sudo "chown -R #{scm_user}:#{scm_user} #{deploy_to}"
        end

        desc "Change ownership of target folders and files to the web and mongrel users"
        task :set_permissions_for_runtime, :except => { :no_release => true } do
            # chown of files to the web user, then pid and log folders to mongrel
            sudo "chown -R #{web_user}:#{web_user} #{deploy_to}"
            sudo "chown -R #{mongrel_user}:#{mongrel_group} #{deploy_to}/current/tmp/pids"
            sudo "chown -R #{mongrel_user}:#{mongrel_group} #{deploy_to}/current/log"
            sudo "chown -R #{mongrel_user}:#{mongrel_group} #{shared_path}/pids"
        end

        desc "Recreate link to serve public folders when hosting within subfolder"
        task :recreate_public_link do
            run <<-CMD
                cd #{deploy_to}/current/public && sudo ln -s . #{application}
            CMD
        end

        desc "Take temporary ownership of current folder to allow symlink updates"
        task :get_current_ownership do
            sudo "chown #{user}:#{user} #{release_path}"
        end

        desc "Return ownership of current folder to the web user"
        task :yield_current_ownership do
            sudo "chown -R #{web_user}:#{web_user} #{release_path}"
        end

        desc "Change ownership of db folders and files to current user"
        task :set_permissions_pre_schema_dump, :except => { :no_release => true } do
            # chown of files to current user
            sudo "chown -R #{user}:#{user} #{release_path}/db"
        end

        desc "Return ownership of db folders and files to the web user"
        task :set_permissions_post_schema_dump, :except => { :no_release => true } do
            # chown of files back to the web user
            sudo "chown -R #{web_user}:#{web_user} #{release_path}/db"
        end

        desc "Update application.rb to application_controller.rb"
        task :update_application_controller, :roles => :app do
            run <<-CMD
                cd #{deploy_to}/current/ && sudo rake rails:update:application_controller
            CMD
        end

        desc "Link the shared database.yml into the release"
        task :config, :roles => :app do
            run <<-CMD
                sudo ln -nfs #{shared_path}/system/database.yml #{release_path}/config/database.yml
            CMD
        end

        desc "Creating symbolic link (custom namespace)"
        task :symlink, :roles => :app do
            run <<-CMD
                sudo ln -nfs #{shared_path}/system/uploads #{release_path}/public/uploads
            CMD
        end
    end

Moving from Subversion to Git

I have been using subversion for a number of years and it wasn’t until I had seen and sampled the simplicity and power of git that I decided once and for all to bite the bullet and migrate all my subversion repositories to git.

I’d done the same from cvs to subversion in the early 2000s and still think the move worked out well with a bit of preparation.

An unfortunate side effect of subversion is that it takes a good bit of effort to set up and manage a new repository so I ended up with two somewhat monolithic repositories…

  1. archives
  2. projects

The ‘archives’ repository contained all old project work and the ‘projects’ one contained relevant or current project work.

The result of this was that each repository had a bundle of sub-folders, each containing a project in its own right. Although git can handle the concept of sub-modules, it’s not the best way to structure your project. In fact, it was a pretty lazy way to structure my subversion projects in the first place, but convenience overcame system administrative chores at the time.

This article explains the following:

  1. Converting a monolithic subversion repository to a git repository
  2. Breaking out the new git repository into a set of discrete repositories.
  3. General backup process and scripts to copy the new repositories across to a backup system.

First off, a couple of conventions I keep are…

My server stores a central set of repositories with the name of myproject.git
My workstations work with their local version of repositories called myproject (no .git)
My server exports its .git repositories to a backup server using the same .git naming convention.
I have created a git user and group (git.git) on my server to run all git processes. File ownership is given to this user so other users can’t foul things up, at least not without thinking about it first.

Step 1 – Migrating to git from Subversion

I store all my git repositories on a server in the directory


All of my subversion repositories are stored in


Even though each folder is accessible using standard paths and commands, I have to go through the subversion door using the same mechanism I would access it from a remote box, namely http://. Other people may use svn:// if that’s their way of working.

Migration is taken care of with a single command.

sudo git svn clone https://localhost/repository/projects --stdlayout --authors-file=/home/peter/authors.txt \
-t tags -b branches -T trunk /usr/local/share/gitrepos/projects.git/

An explanation of all the bits follows:

sudo – because I run everything as an unprivileged user, sudo gives me the rights I need to create new files and folders.

git svn – this is the git subcommand that manages conversion of subversion repositories.

clone – make a copy of the repositories that I’m pointing at.

https://localhost/repository/projects – I access all my subversion repositories using secure http. This allows me to securely browse the content across the Internet and track things easily using Trac.

Depending on your setup, this parameter will contain the path to your svn repository, however you access it.

--stdlayout – this option tells git svn that my subversion repository is in the standard layout of trunk/branches/tags directories.

--authors-file – this is a file I created by hand containing a list of all the people who committed project material in the past, one per line, with the subversion username on the left and the name and email address to use in git on the right. Its format is as follows:

peter= Peter Mac Giollafhearga
simon= Simon Shagwell <simon\’s email at his>

-t tags -b branches -T trunk – these values are for completeness and are not needed here, since --stdlayout is shorthand for exactly these three defaults. If you’ve called your branches, tags and trunk anything different, this is how you tell git svn where to find them.
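
A hand-written authors file is easy to get slightly wrong, so a quick sanity check before starting a long clone can be worthwhile. The following is a rough sketch in Ruby and is not part of git: parse_authors is a hypothetical helper, the sample names and addresses are invented, and it assumes one entry per line in the form login = Full Name <email>.

```ruby
# Matches "login = Full Name <email>", allowing flexible spacing.
AUTHOR_LINE = /\A(?<login>[^=]+?)\s*=\s*(?<name>.+?)\s*<(?<email>[^>]*)>\s*\z/

# Returns [authors, bad_lines]: a hash keyed by svn login, plus the
# 1-based numbers of any lines that do not match the expected format.
def parse_authors(text)
  authors = {}
  bad_lines = []
  text.each_line.with_index(1) do |line, number|
    line = line.strip
    next if line.empty?
    match = AUTHOR_LINE.match(line)
    if match
      authors[match[:login]] = { name: match[:name], email: match[:email] }
    else
      bad_lines << number
    end
  end
  [authors, bad_lines]
end

# Invented sample: the second entry is missing its <email> part.
sample = <<~AUTHORS
  alice = Alice Example <alice@example.com>
  bob= Bob Example
AUTHORS

authors, bad_lines = parse_authors(sample)
puts authors.keys.inspect
puts bad_lines.inspect
```

In practice you would pass File.read on your own authors file instead of the sample text and fix any reported lines before running git svn clone.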

Once the command has completed you should find a projects.git folder has been created and navigating into it you will see all your nasty subversion sub-folders which you should have set up as individual repositories in the first place…tut tut!

Step 2 – Breaking out Git into baby gits

The structure of the new git repository is something like this.


The .git folder (you can see it using ls -a) contains all the git related material such as project history, revisions and tags.
Check you can see your history by typing

git log

The next task is to break the contents of the rather large ‘project’ folder into a git repository per project.
The tool for the job is a combination of the very useful git subtree command and a bit of custom shell scripting specific to this job.

The subtree functionality was written by Avery Pennarun and is hosted on GitHub. You can download the git subtree command from there; installation instructions are on the same site so I won’t bore you with them here.

Once you have it installed you’re almost ready to roll. The following shell script has comments at the top to explain what it does. Save this to your favourite scripts folder and make it executable with chmod +x.

Then cd into the folder you created earlier (in my case /usr/local/share/gitrepos/projects.git). Run the script and watch the output. Depending on the size of your projects it will take a bit of time to complete; mine took about 20 minutes.

#This script should be run from within a git repository folder that
#contains many child folders.
#It will create a branch for every subfolder it finds and a new
#top level folder for each.
#It then initialises a git repository, copying into the relevant branch

# Make sure only root can run our script
if [ "$(id -u)" != "0" ]; then
  echo "This script must be run as root" 1>&2
  exit 1
fi

# Owner of the new repositories (assumed: the git.git user and group)
git_user=git
git_group=git

current_dir=${PWD##*/}

for project in *
do
  if [ -d "$project" ]; then
    project_lower=`echo "$project.git" | tr 'A-Z' 'a-z'`
    # assumed branch name; any name not already in use will do
    branchname="split-$project"
    echo "performing subtree split..."
    git subtree split -P "$project" -b "$branchname"
    mkdir "../$project_lower"
    cd "../$project_lower"
    echo "initialising git"
    git init
    echo "fetching branch"
    git fetch "../$current_dir" "$branchname"
    git checkout -b master FETCH_HEAD 2>&1
    cd ..
    echo "setting appropriate ownership"
    chown -R "$git_user:$git_group" "$project_lower"
    cd "$current_dir"
  fi
done
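
Before letting the script loose on a real repository, it can help to see the steps it takes for a single subfolder. This is a minimal sketch in Ruby that only builds the command strings and runs nothing; split_commands is a hypothetical helper, and the split- branch prefix and the RentManager folder name are assumptions made for illustration.

```ruby
# Build the list of shell commands the split would run for one
# subfolder. Nothing is executed; the result is for inspection only.
def split_commands(project, source_dir)
  repo   = "#{project}.git".downcase   # lower-cased repository folder name
  branch = "split-#{project}"          # hypothetical branch name
  [
    "git subtree split -P #{project} -b #{branch}",
    "mkdir ../#{repo}",
    "cd ../#{repo} && git init",
    "cd ../#{repo} && git fetch ../#{source_dir} #{branch}",
    "cd ../#{repo} && git checkout -b master FETCH_HEAD"
  ]
end

puts split_commands('RentManager', 'projects.git')
```

Each subfolder gets its own branch containing only that folder’s history, and the new repository is then populated by fetching that one branch.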

Once the script has completed, you should be able to see a new directory structure, something along the following lines.


Each folder is now its own git repository with its own internal version history, tags, branches etc.
cd into one of the folders and do a git log just to confirm you still have history.

The original projects.git can be removed (after backing it up for safety’s sake) if you’re satisfied everything is in place. We’re now ready to proceed with the next stage of the game, namely backing up our new git repositories to a remote share.

Step 3 – Backing up Git

I use a 2TB removable disk as a central data server for sharing files throughout my network. My thinking is that should the place ever burn down and I have the opportunity, there’s only one box I need to grab before rushing out the door. Of course, I burn frequent snapshots of this box to DVD.

I have created a ‘backups’ share on the disk which I map to my physical server using NFS.

The entry in my /etc/fstab file is something like this.  /usr/local/share/backups  nfs defaults    0 0

This means I can read/write to my backups folder as if it were a local folder on a local disk.

The job at hand is to be able to backup my git repositories using a cron task scheduled to run when everything is nice and quiet. Below is the script I use to run my git backups.

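# Backup git repositories to another folder

# Make sure only root can run our script
if [ "$(id -u)" != "0" ]; then
   echo "This script must be run as root" 1>&2
   exit 1
fi

#Where they're going to be backed up
backup_dir=/usr/local/share/backups       # assumed: the NFS mount point
#Location of current live repositories
repository_dir=/usr/local/share/gitrepos  # assumed: the server repository folder
git_command=git                           # assumed: git is on root's PATH

cd "$repository_dir"
for project in *
do
    if [ -d "$project" ]; then
        destination="$backup_dir/$project"
        if [ -d "$destination" ]; then
            # already backed up once, so just pull the latest changes
            cd "$destination"
            echo "pulling $project..."
            $git_command pull
            echo "done."
            cd "$repository_dir"
        else
            # first backup of this repository, so clone it
            echo "mkdir $destination"
            mkdir "$destination"
            cd "$repository_dir/$project"
            echo "cloning $project..."
            $git_command clone -v -l --no-hardlinks . "$destination"
            cd ..
            echo "done."
        fi
    fi
done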
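
The heart of the backup is a single decision per repository: clone it if the backup folder has never seen it, otherwise pull. As a sketch only, here is that decision in Ruby, run against throwaway temporary folders; backup_actions and the alpha.git and beta.git names are hypothetical, and no git command is run.

```ruby
require 'tmpdir'
require 'fileutils'

# Decide, per repository, whether the backup needs a first clone or
# just a pull. Only directories under repository_dir are considered.
def backup_actions(repository_dir, backup_dir)
  Dir.children(repository_dir).sort.each_with_object({}) do |project, actions|
    next unless File.directory?(File.join(repository_dir, project))
    destination = File.join(backup_dir, project)
    actions[project] = File.directory?(destination) ? :pull : :clone
  end
end

# Build a small fake layout and ask what a backup run would do.
result = Dir.mktmpdir do |root|
  repos   = File.join(root, 'gitrepos')
  backups = File.join(root, 'backups')
  FileUtils.mkdir_p(File.join(repos, 'alpha.git'))
  FileUtils.mkdir_p(File.join(repos, 'beta.git'))
  FileUtils.mkdir_p(File.join(backups, 'alpha.git'))  # backed up before
  FileUtils.touch(File.join(repos, 'README'))         # plain files are skipped

  backup_actions(repos, backups)
end

p result
```

A check like this is handy as a dry run before scheduling the real script from cron, since it shows which repositories would be cloned from scratch.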

Once this script is run, you’ll have a copy of all your repositories in the backup folder. The next run should take only a fraction of the time, as it only has to pull changes rather than clone each repository again.

So there you have it. Migrating a subversion repository to git is really simple; the fun starts when you try to play with the results. I hope this has been helpful to others faced with the same job. If you have any improvements, please let me know.

Configure git to use a remote repository

  1. Return to the local machine and add a reference to the new ‘remote’ repository from the base directory of the project.

    $ git remote add remote ssh://git at].git

  2. Here the ‘git remote add’ part says add a reference to a remote repository. The second ‘remote’ is the friendly name I want to use when referring to the repository on the git server.

  3. Now commit the local files to the local repository – Note: Step 3 was only an add, not a commit. When you commit you’ll be prompted (or you can enter it as a -m option) to enter a message to be used as a comment.

    peter@peter-desktop:~/Projects/myprojectname$ git commit
    Created initial commit 633fd3c: initial checkin of project core and data migration files

  4. It’s time to test the new remote repository by ‘pushing’ your local repository info up to it. This is done using git push.

    peter@peter-desktop:~/Projects/myprojectname$ git push --dry-run --all --repo=remote
    fatal: 'origin': unable to chdir or not a git archive
    fatal: The remote end hung up unexpectedly

Didn’t quite go to plan – so let’s see what’s wrong

peter@peter-desktop:~/Projects/myprojectname$ git remote show remote
The authenticity of host ' (' can't be established.
RSA key fingerprint is 5a:ce:6e:a4:78:d5:01:50:36:2b:bb:12:67:e1:be:53.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '' (RSA) to the list of known hosts.
git at's password:
* remote remote
URL: ssh://git at

let’s try again

$ git push --dry-run --all --repo=remote
git at's password:
To ssh://git at
[new branch] master -> master

looks like it will work so remove the dry-run parameter

$ git push --all --repo=remote

That’s all folks!

Setup Git Local and Remote Repositories

If you are a remote worker or, like me, get some work done on the train on the way to or from your office, you’ll appreciate the need to set up both a local and a remote source code repository. This allows you to develop using your laptop/netbook and check in your changes locally; when you arrive at the mothership later, you can synchronise your local repository with the remote one, ensuring all your changes are available to other developers. This article shows how to set up git for both a mothership type repository (let’s call it the remote repository) and a local version on your own laptop.

Configuring Git project on the mothership.


A Git server (the mothership) has been set up on a box called bluelight. This box is available to the network as

A git user has been created on the server called ‘git’. This user has access to the folder where the git repositories are stored.

The Steps:

1. On your dev machine create your code project using whatever tools you need.

2. Initialise this working project under the git version control system

$ cd ~/projects/[myprojectname]
$ git init
peter@peter-desktop:~/Projects/rentmanager$ git init
Initialized empty Git repository in /home/peter/Projects/rentmanager/.git/

3. Add whatever work you’ve done to the repository

$ git add app/
$ git add docs/

4. Check the files you want added have been added

$ git status
# On branch master
# Initial commit
# Changes to be committed:
#   (use "git rm --cached ..." to unstage)
#    new file: app/rentmanager/,

5. Open an SSH session to bluelight (the central git repository server) and create a folder for the new repository

$ sudo mkdir /usr/local/share/gitrepos/[myprojectname].git

6. Obviously substitute your real project name and don’t forget to keep the .git extension

7. Initialise the repository under the new folder

$ cd /usr/local/share/gitrepos/[myprojectname].git
[peter@bluelight myprojectname.git]$ sudo git init
Initialized empty Git repository in /usr/local/share/gitrepos/myprojectname.git/.git/

8. Change ownership of the repository to the system git user

$ cd ..; sudo chown -R git:git /usr/local/share/gitrepos/myprojectname.git

The Martian Principles for Successful Enterprise Systems

I’ve just finished reading a little gem of a book. It’s called ‘The Martian Principles for Successful Enterprise Systems’ with a subtitle of ’20 Lessons Learned from NASA’s Mars Exploration Rover Mission’. The author is Ronald Mak.

Imagine designing an information retrieval, indexing and presentation system for the two Mars rover vehicles that were sent on a one-way, three-month reconnaissance mission to Mars. The feisty little vehicles kept going for two years and the information systems had to be designed to cope with this unexpected project over-run.

The book runs to 168 pages and is a ‘should-read’ for anybody involved in designing or buying large-scale enterprise software. From an architect’s perspective, you get a reinforced mental checklist of the aspects of your designs that make them work and ensure they keep working long after you’ve moved on. From a customer’s perspective, you gain an appreciation of the effort put into designing such systems. From a developer’s perspective, now you know why you spend so much time writing and executing unit tests.

The book has short and well directed chapters and is an easy read with coverage of both the technical side of software development and the soft or human side.

As a result of this read, I went back to enhance some application logging classes that I’ve used on a number of projects to provide more granular output and statistics on usage patterns.