Monday, December 3, 2012

Little Tools - Big Tasks

When you think of government weather agencies, you normally think of millions of weather readings being taken throughout history, of people trying to work out what the next forecast will be based on past experiences and current conditions, and of computers. Big computers.

A problem I repeatedly hit is that, in trying to prove something I've noticed for a while (a man-made weekly atmospheric pressure pattern that appears at night in both Toronto and, to a lesser extent, Montreal), all this computing power and online data publishing did nothing to lessen my workload in gathering more data to explore it further.
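
To make that concrete: once the readings are in one place, the test itself is just grouping nighttime pressure readings by day of the week and seeing whether the weekday means drift away from the weekend ones. Here's a minimal sketch in C of that aggregation; the one-reading-per-line input format, the 00:00-05:00 definition of "night", and the kPa units are my assumptions for illustration.

```c
/*
 * Sketch: mean nighttime station pressure by day of week.
 * Assumed input: one "YYYY-MM-DD HH pressure" reading per line,
 * e.g. "2012-12-03 02 100.91".
 */
#include <stdio.h>
#include <time.h>

int main(void)
{
    double sum[7] = {0};
    long   cnt[7] = {0};
    int y, m, d, h;
    double p;

    while (scanf("%d-%d-%d %d %lf", &y, &m, &d, &h, &p) == 5) {
        if (h > 5)
            continue;              /* keep 00:00-05:00 readings only */
        struct tm t = {0};
        t.tm_year = y - 1900;
        t.tm_mon  = m - 1;
        t.tm_mday = d;
        t.tm_hour = 12;            /* midday sidesteps DST edge cases */
        /* mktime() normalises the struct and fills in tm_wday; with a
           32-bit time_t the very oldest dates will fail this check. */
        if (mktime(&t) == (time_t)-1)
            continue;
        sum[t.tm_wday] += p;
        cnt[t.tm_wday]++;
    }

    const char *day[] = {"Sun","Mon","Tue","Wed","Thu","Fri","Sat"};
    for (int w = 0; w < 7; w++)
        if (cnt[w])
            printf("%s: %.2f kPa over %ld readings\n",
                   day[w], sum[w] / cnt[w], cnt[w]);
    return 0;
}
```

If the pattern really is man-made, Monday through Friday should sit visibly apart from Saturday and Sunday in that output.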

Environment Canada has a historical data product CD that you can download, and it's great for some general purposes, but there are two drawbacks:
1. It was designed as a DOS product, so these days it only runs on Windows in a CMD shell.
2. It has daily totals, but not the hourly data that I need.

Ironically, the hourly data exists elsewhere on their site, and you can click a link and download each city's hourly data one month at a time (the records go back to 1840, according to the drop-down box), but that means manually downloading and processing 172 years of monthly data for 19 cities.

That's approximately 39,000 files (172 years × 12 months × 19 cities).

So, having manually done a few decades' worth of Toronto data (that alone took about a week to save all the files, import them into a spreadsheet, and massage and crunch the numbers), I decided I needed some help speeding up this process.

As timing would have it, I'd recently taken delivery of a Raspberry Pi (see details of that here), so I had a little spare computer that would serve two purposes:
1. Teach me more about Linux (it runs a form of Debian) whilst allowing me to program it in straight C (a rarity, as I spend 99% of my time in Objective-C).
2. It could go and get all the data I needed, and crunch it all down for me.
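
The "go and get" half is the easy part to sketch: a nested loop over years and months, fetching one CSV per request with libcurl. The hostname, query parameters, and station ID below are stand-ins I've made up to show the shape of it; the real link format comes from Environment Canada's own download pages.

```c
/*
 * Sketch: fetch one month of hourly data per request.
 * The URL template and STATION_ID are placeholders, not the real
 * Environment Canada endpoint; substitute the actual link format.
 */
#include <stdio.h>
#include <curl/curl.h>

#define FIRST_YEAR 1840
#define LAST_YEAR  2012
#define STATION_ID 5097          /* hypothetical ID for one city */

int main(void)
{
    char url[512], path[64];

    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    if (!curl)
        return 1;

    for (int year = FIRST_YEAR; year <= LAST_YEAR; year++) {
        for (int month = 1; month <= 12; month++) {
            snprintf(url, sizeof url,
                     "http://climate.weatheroffice.gc.ca/climateData/"
                     "bulkdata_e.html?format=csv&stationID=%d"
                     "&Year=%d&Month=%d&timeframe=1",
                     STATION_ID, year, month);
            snprintf(path, sizeof path, "%04d-%02d.csv", year, month);

            FILE *out = fopen(path, "wb");
            if (!out)
                continue;
            /* libcurl's default write callback fwrite()s the response
               body straight into WRITEDATA. */
            curl_easy_setopt(curl, CURLOPT_URL, url);
            curl_easy_setopt(curl, CURLOPT_WRITEDATA, out);
            curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
            if (curl_easy_perform(curl) != CURLE_OK)
                fprintf(stderr, "failed: %s\n", url);
            fclose(out);
        }
    }

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
```

Swap in each station ID (or take it as an argument), leave it running overnight, and one city's 2,000-odd files collect themselves.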

The fun thing here is when you look at the stats for the two systems: the Environment Canada supercomputer that crunches the data, and my little Raspberry Pi that turns it into something I can use:

Processors: 936 vs 1.
Power: 275 kW (enough for 200 homes) vs 0.5 W (enough for 5 USB ports).
Cooling: enough refrigerant to cool 32 average homes on a hot day, vs none.

As you can see, there are two very different systems in play here!

In the case of Environment Canada, speed is important: it takes 40 minutes to compile a forecast that would take a standard PC 28 days. In my case, I just want a database of old data that I can query in its entirety. Downloading an entire city's history from the 1950s onward takes about two minutes (call it 750 monthly files, or roughly six a second); getting the entire history from 1840 takes substantially longer, on the order of ten minutes.

I have one more tool to write, which takes these downloaded files and prepares them to be loaded into a database.  I will do this on the Raspberry Pi as well. 
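
The shape of it is simple enough to sketch already: strip each monthly CSV down to a timestamp and the pressure reading, and emit tab-separated lines that a bulk loader (say, SQLite's .import) can swallow. The column positions here are my guesses for illustration; the real files' header row dictates the layout.

```c
/*
 * Sketch: flatten monthly CSVs into "timestamp <TAB> pressure" lines
 * on stdout. PRESSURE_FIELD is an assumed column; check it against
 * the real header row before trusting the output.
 */
#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define MAX_FIELDS     32
#define PRESSURE_FIELD 10   /* assumed 1-based station-pressure column */

int main(void)
{
    char line[1024];

    while (fgets(line, sizeof line, stdin)) {
        char *fields[MAX_FIELDS];
        int n = 0;

        /* Crude split: treating quotes as delimiters strips them, but
           commas inside quoted fields still break, and strtok()
           collapses empty (missing-value) fields. Fine for a sketch,
           not for production. */
        for (char *p = strtok(line, ",\"\r\n"); p && n < MAX_FIELDS;
             p = strtok(NULL, ",\"\r\n"))
            fields[n++] = p;

        /* Data rows start with a date; header and legend rows don't. */
        if (n >= PRESSURE_FIELD && isdigit((unsigned char)fields[0][0]))
            printf("%s\t%s\n", fields[0], fields[PRESSURE_FIELD - 1]);
    }
    return 0;
}
```

Piped together (cat *.csv | ./flatten > toronto.tsv), that's the whole preparation step, and it's exactly the kind of streaming job the Pi's limited memory handles without complaint.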

You may ask: why bother doing it on the Raspberry Pi? The biggest impetus is that if I can make this work reasonably fast on a $35 board with limited specs, then it will run like lightning on a full-blown modern computer that costs $1,000.

This is something that is largely lost in an era when processor speeds, core counts, memory sizes, and access speeds are all increasing so fast that you can write shoddy software that runs slowly this year and not worry, because next year's faster processors will compensate for it.

When I am done, I will likely license out the tools I've created, as I know I'm not the only person who needs historical weather data that can actually be queried on a modern computer.