Jon Aquino's Mental Garden

Engineering beautiful software jon aquino labs | personal blog

Tuesday, January 11, 2005

Timesheets: A Story

Today I was faced with the problem of updating my resume and entering my work history into a "skills matrix", for a Request For Proposal for a government contract. The problem with this sort of activity is that it is often hard to present quantitative measures of what one has been doing for the past five years, yet this is what is asked for in RFP's -- number of months in this project, number of months using that skill. Nobody keeps track of what they are doing every minute of every day. So the problem is that there are no easily available records providing exactly the kind of quantitative information asked for in the RFP ...

But the desired information is almost readily available. It turns out that my company keeps a database of the timesheet records of its employees. My hope was to extract my records from the database and somehow graph them, thus providing me with the dates and number-of-months information that I needed, as well as a reminder of what projects I have been engaged in recently. I started by opening PL/SQL Developer and running a SQL query to retrieve my records and export them into a text file. My plan was to import the text file into Excel to create the graph. I knew I would need to transform the text into an appropriate format for Excel to generate the kind of graph I wanted (a scatterplot of horizontal lines, with the Y-axis representing Project and the x-axis representing Date). This intermediate text-transformation step I would accomplish with a small Ruby script, because of Ruby's wonderful native regular-expression handling. Data would be pumped into and out of the script using cat, | and >.

The first way I laid out the text didn't quite work. I tried arranging the dates for each project in rows. But the number of dates in some of the rows exceeded Excel's maximum number of columns, so half of the data failed to import.

Next I tried arranging the text in columns. This made the Ruby script significantly more complex, because different columns had different lengths (different projects have different numbers of dates). But this too didn't work - Excel didn't pick up on the fact that the columns represented different datasets - it treated all the data as a single dataset.

In desperation, I tried interspersing dummy Y columns between the real columns, hoping that Excel would correctly treat each real column as a separate dataset. The results were still incorrect. Out of curiosity, I tried importing the raw, untransformed data, with a dummy Y column of all 1's. The resulting chart was promising; but the datasets were indistinguishably clumped together. So I assigned different dummy Y values to each dataset -- 1's for the dates of the first project, 2's for the dates of the second, etc. Much better! The horizontal lines emerged! It still wasn't perfect -- horizontal lines were identified by a number; it would be nice if they were labeled, to save me from looking up numeric codes.

That is when a critical insight struck me: I could place each set of dummy Y values in their own column, producing a kind of "staircase" effect in the spreadsheet layout. It worked! Excel understood that I wanted to represent column as a different dataset, and labeled each with a colour and symbol, with a legend describing each. Hooray!

I polished the chart a bit, dropping the background colour, setting an interval of 1 year for the tick marks on the X-axis, and changing the horizontal gridlines to vertical ones. I also sent a screenshot to five of my colleagues to make them aware of my script, and offered to generate the chart for their timesheet records on request (one of them was interested).

When the time came for me to use the chart to enter the dates into the RFP form, I realized that it would be advantageous to arrange the project names alphabetically - this would make it easier to find specific projects; plus, it would tend to group similarly-named projects together. I tweaked the Ruby script to do this. I also noticed that when you move the mouse cursor over a data point, Excel displays a tooltip showing the data (project name, date) - this made it remarkably easy to determine the start and end dates of any given project.

The final result was a 10-minute procedure for producing a 1-page visual summary of anyone's career at the company, extracted from a database of hundreds of thousands of timesheet records.

0 Comments:

Post a Comment

<< Home