• chop - efficiently dispense with current git branch

    My git workflow includes the hack and ship commands for easy tracking of a shared master branch, and conveniently delivering commits. Feature branches are cheap and fast in git, and I am often spawning new branches to try stuff out or work on unrelated things.

    Now, meet chop - for chopping down the current working branch after it has been shipped and is no longer needed. The script switches to master and then deletes the branch you were previously on. If you give a branch name as an argument, that branch becomes the new current branch instead of master.

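    #!/bin/bash -x
    # bash, not plain sh: the ${VAR:2} substring expansion below is a bashism.
    set -o errexit
    # The current branch is the line marked with '*' in the git branch output.
    CURRENT_BRANCH=$(git branch | grep '\*')
    # Switch to the given branch, defaulting to master.
    git checkout "${1:-master}" || exit 1
    # Strip the leading "* " and delete the old branch (-d refuses unmerged branches).
    git branch -d "${CURRENT_BRANCH:2}"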

    I use this small script multiple times every day, and I really like its name. There is not a whole lot of functionality, but since this is an often-repeated action, it makes sense to automate it.

    Enjoy!

  • Quick data import and linking in Rails

    Some web-applications have to ingest an enormous amount of new data on a regular basis. Import scripts easily become an ever-growing procedural mess, annoying to maintain. In this post I show a bit of code which can be used to simplify and unify such import scripts.

    Assume you have a pipeline of post-import steps to run. This can be organized in numerous ways. Simplest is to just have a bunch of methods called one after the other once you have the data loaded:

    link_frobnitz
    spin_really_fast_around_z_axis
    reticulate_splines
    deploy_hamsters

    Now, assume that once in a while one of the steps fails for an unexpected reason. You know, data from external sources is rarely as clean as we’d like. So you need to fix a few things and retry the import. However, as data sizes grow, and with them the running time of the import, it can be a huge waste to redo all the work just because a misplaced comma made the final deploy_hamsters step fail.

    Exceptions are the obvious way to report fatal data errors, and implicit or explicit transactions are the obvious way to keep the import consistent. But how can the two easily be combined into a resume-friendly import mechanism?

    Enter the bulk importer step runner with trivial progress reporting:

    def import_updaters
      all_steps.each do |step_name|
        run_import_step(step_name)
      end
    end
    
    private
    
    def run_import_step step_name
      puts "Running #{step_name}"
    
      ImportModel.transaction do
        self.send(step_name)
      end
    
    rescue => e
      STDERR.puts "\nImport error:  #{e.inspect}\n#{e.backtrace.join("\n")}"
      STDERR.puts "Please resume at step #{step_name}"
      exit 1
    end
    
    protected
    
    def all_steps
      [
        :link_frobnitz,
        :spin_really_fast_around_z_axis,
        :reticulate_splines,
        :deploy_hamsters
      ]
    end

    Note that you have to change the model name (ImportModel above) and provide the actual implementations of the individual steps. all_steps returns the list of methods to run, run_import_step runs a single step with error handling, and import_updaters runs all the steps in order.
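
    The runner above tells you which step to resume at, but does not show how to skip the steps that already completed. Because each step runs in its own transaction, a failed step is rolled back while the earlier ones stay committed, so restarting from the named step should be safe as long as the steps do not depend on in-memory state. A minimal sketch, assuming a hypothetical steps_from helper (not part of the code above) that takes the step list and an optional starting step:

```ruby
# Hypothetical helper: returns the steps from start_step onwards,
# or every step when no starting point is given.
def steps_from(steps, start_step = nil)
  return steps if start_step.nil?

  index = steps.index(start_step.to_sym)
  raise ArgumentError, "Unknown step: #{start_step}" if index.nil?

  # Keep the named step and everything after it.
  steps.drop(index)
end

steps = [
  :link_frobnitz,
  :spin_really_fast_around_z_axis,
  :reticulate_splines,
  :deploy_hamsters
]

# Resume after a failure in reticulate_splines.
p steps_from(steps, "reticulate_splines")
# prints [:reticulate_splines, :deploy_hamsters]
```

    In import_updaters you could then iterate over steps_from(all_steps, ARGV.first) instead of all_steps, assuming the step name is passed as the first command-line argument.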

    Easy performance statistics

    As a bit of bonus-functionality, the following can be used for reporting import progress with timing-statistics after each step completes:

    require 'benchmark' # standard library; often already loaded in a Rails app
    
    def report_progress message, &block
      STDERR.print message
    
      if block_given?
        time = Benchmark.measure { yield }
        formatted_time = "%.2fs" % time.real
    
        STDERR.puts " - #{formatted_time}"
      else
        STDERR.puts
      end
    end

    Usage is simple - just call report_progress with a comment to print and a block of code, like this:

    def run_import_step step_name
      report_progress "Running #{step_name}" do
        ImportModel.transaction do
          self.send(step_name)
        end
      end
      # ...
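
    To see both forms of the helper side by side, here is a small standalone sketch. It assumes a variant named report_progress_to (a hypothetical name) that takes the output stream as a parameter, so the result can be inspected without a terminal. Otherwise it follows the same logic as report_progress:

```ruby
require 'benchmark'
require 'stringio'

# Hypothetical variant of report_progress with the output stream passed in.
def report_progress_to(io, message)
  io.print message

  if block_given?
    # Time the block and append the wall-clock duration to the message.
    time = Benchmark.measure { yield }
    io.puts " - %.2fs" % time.real
  else
    # No block: just finish the line.
    io.puts
  end
end

out = StringIO.new

# Message only, no timing.
report_progress_to(out, "Loading source files")

# Message plus timing of the block.
report_progress_to(out, "Running link_frobnitz") do
  sleep 0.01 # stand-in for the real import step
end

puts out.string
```

    The first line is printed as-is, and the second gets the elapsed time appended in seconds with two decimals.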

    What do you use to make data-imports easier to manage?
