Tuesday, June 11, 2013

Tenacity, Progress, And Uncertainty

Those who know me well know that I always have a to-do list as long as my arm.  If I have a few hours free, I'm likely to chip away at these projects.  Some projects are quick, and others take time.  One of my longest-running projects is weather related.  It started some 4 to 5 years ago, and after discussing it with Environment Canada, I was left with more questions than I had to begin with.

Having spent a lot of time mulling over the problem, I finally got around to moving it forward.  Unlike many of my projects, where I go gangbusters for a week and I'm done, this one has gone at a much more leisurely pace.  I've deliberately allowed myself to take the most painful and difficult paths I could find, so I can say "I did that".

For instance, I could have just written a bit of script to go and download the weather data I needed.  Instead, I chose to install Debian squeeze on a Raspberry Pi, then code a custom utility in C, using "nano" as my editor, to grab the weather files.

The resulting code looked like this.

/*
    Name:         getcsv.c
    Copyright:    (C) Jason Coulls, 2012
    Purpose:      Downloads Environment Canada historical data to a Raspberry Pi,
                  using the date ranges supplied for the specified station ID.
*/

//#define CURL_STATICLIB
#define MAXPATH 255

#include <stdio.h>
#include <curl/curl.h>

//Write callback: curl hands over each chunk of the response and we append it to the open file.
size_t write_data(void *ptr, size_t size, size_t nmemb, void *stream) {
    return fwrite(ptr, size, nmemb, (FILE *)stream);
}

int main(int argc, char *argv[])
{
    //Must have four arguments (argc also counts the program name, hence 5).
    if (argc != 5) {
        //Let the user know how this should be done.
        printf("Usage: getcsv [start mm/yyyy] [end mm/yyyy] [station id] [output dir]\n");
        printf("The file name will be automatically named.\n");
        return 99;    //Error 99 - Replace User and Press Any Key To Continue.
    }

    //Use the arguments in place rather than copying them into fixed-size
    //buffers, which would overflow on "mm/yyyy" or a 6 digit station ID.
    const char *stationID = argv[3];    //4-6 digits.
    const char *strOutput = argv[4];    //The directory to save file in.

    //Break the dates down, and stop if either one does not parse.
    int startYear, startMonth;
    int endYear, endMonth;
    if (sscanf(argv[1], "%d/%d", &startMonth, &startYear) != 2 ||
        sscanf(argv[2], "%d/%d", &endMonth, &endYear) != 2) {
        fprintf(stderr, "Dates must be given as mm/yyyy.\n");
        return 99;
    }

    curl_global_init(CURL_GLOBAL_DEFAULT);

    //Now we loop through each year between these dates.
    int currentYear;
    int currentMonth;
    for (currentYear = startYear; currentYear <= endYear; currentYear++) {

        //Only the first year starts at the start month, and only the last
        //year stops at the end month. Years in between run from 1 to 12.
        int firstMonth = (currentYear == startYear) ? startMonth : 1;
        int lastMonth = (currentYear == endYear) ? endMonth : 12;

        //Now we loop through each month between these dates.
        for (currentMonth = firstMonth; currentMonth <= lastMonth; currentMonth++) {

            printf("Processing: %d/%d\n", currentMonth, currentYear);

            //Download the file to the destination.
            //http://www.climate.weatheroffice.gc.ca/climateData/bulkdata_e.html?timeframe=1&Prov=XX&StationID=5097&Year=2012&Month=11&Day=&format=csv
            char urlString[1024];
            snprintf(urlString, sizeof(urlString), "http://www.climate.weatheroffice.gc.ca/climateData/bulkdata_e.html?timeframe=1&Prov=XX&StationID=%s&Year=%d&Month=%d&Day=&format=csv", stationID, currentYear, currentMonth);

            //Work out where we're saving this file.
            char outputPath[MAXPATH];
            snprintf(outputPath, sizeof(outputPath), "%s/%s-%d-%d.csv", strOutput, stationID, currentYear, currentMonth);

            //Open the output file, and skip this month if that fails.
            FILE *filePointer = fopen(outputPath, "wb");
            if (filePointer == NULL) {
                perror(outputPath);
                continue;
            }

            //Having built the url for the month's data, now request it using CURL.
            CURL *curl = curl_easy_init();
            if (curl) {

                printf("fetching %s\n", outputPath);
                curl_easy_setopt(curl, CURLOPT_URL, urlString);
                curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_data);
                curl_easy_setopt(curl, CURLOPT_WRITEDATA, filePointer);

                /* Perform the request, res will get the return code */
                CURLcode res = curl_easy_perform(curl);

                /* Check for errors */
                if (res != CURLE_OK) {
                    //Tell us what went wrong.
                    fprintf(stderr, "curl_easy_perform() failed: %s\n",
                            curl_easy_strerror(res));
                }

                /* always cleanup */
                curl_easy_cleanup(curl);
            }

            //Close the file whether or not curl started.
            fclose(filePointer);
        }
    }

    curl_global_cleanup();
    return 0;
}
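
The year and month loops need some care when a range crosses a year boundary, for example 11/2011 to 02/2012.  One way to sidestep that, sketched here as an assumption rather than as what getcsv.c does, is to count months on a single running index.  The helper names months_in_range and nth_month are made up for this illustration.

```c
#include <assert.h>

/* Hypothetical helper: how many months lie in the inclusive range
   startMonth/startYear .. endMonth/endYear. Returns 0 if the range
   runs backwards. */
int months_in_range(int startMonth, int startYear, int endMonth, int endYear)
{
    assert(startMonth >= 1 && startMonth <= 12);
    assert(endMonth >= 1 && endMonth <= 12);

    /* Turn each date into a single month index (year * 12 + month - 1). */
    int first = startYear * 12 + (startMonth - 1);
    int last = endYear * 12 + (endMonth - 1);

    return (last < first) ? 0 : last - first + 1;
}

/* Hypothetical helper: the month and year that sit i months (0-based)
   after the starting month. */
void nth_month(int startMonth, int startYear, int i, int *month, int *year)
{
    int index = startYear * 12 + (startMonth - 1) + i;

    *year = index / 12;
    *month = index % 12 + 1;
}
```

A single loop from 0 up to months_in_range(...) that calls nth_month each time would then visit every month once, whichever years the range spans.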

As you can see, the code isn't rocket science, but doing it in nano means no debugger and no help to see what went wrong.  As a result, my raw C coding skills quickly improved.  To continue the theme of taking the most difficult route possible, I compiled the thing using GCC on the command line.

Seeing this come to life on the Raspberry Pi device was rather a big thing for me.  I like seeing things come to life - no matter how small.  

The next problem was that I had to assemble the data into something that could be used.  For this, I set up a spare Mac Mini running OS X Server with MySQL on it, and built a quick bit of PHP to populate it.  This took another two weeks to complete, partly due to two outages: one caused by our neighbour catching fire and knocking out the power, and one caused by one of the kids interfering with the laptop driving the PHP.  As I didn't know how far through the current city the import was each time, I had to delete and reimport that city.

As I move forward, I first need to populate the 27 million day names (Monday, Tuesday, etc.) for the readings.  After that, the biggest issue is what to do with the answer once I've worked out whether the problem I've noticed with Toronto exists across Canada (I know there's a similar problem in Montreal).
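
How the day names actually get populated isn't shown here, so treat the following as a sketch only.  It assumes each reading carries a Gregorian year, month and day, and it uses Sakamoto's method to work out the day of the week.  The function names day_of_week and day_name are made up for this illustration.

```c
#include <assert.h>

/* Day of the week for a Gregorian date, using Sakamoto's method.
   Returns 0 for Sunday through 6 for Saturday. */
int day_of_week(int year, int month, int day)
{
    /* Per-month offsets used by Sakamoto's method. */
    static const int offsets[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};

    assert(month >= 1 && month <= 12);

    /* January and February count as part of the previous year,
       so that the leap day lands at the end of the cycle. */
    if (month < 3) {
        year -= 1;
    }

    return (year + year / 4 - year / 100 + year / 400
            + offsets[month - 1] + day) % 7;
}

/* English day name for a Gregorian date. */
const char *day_name(int year, int month, int day)
{
    static const char *names[] = {
        "Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"
    };

    return names[day_of_week(year, month, day)];
}
```

For the date at the top of this post, day_name(2013, 6, 11) gives "Tuesday".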

I've started running some "pre-flight check" runs of the reports I've created whilst the rest of the cities are still being imported.

Here we have some cities and their "Average kPa" over the past 5 decades.  Note the dip in the 1990s.
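
The reports themselves aren't shown, so this is only a sketch of the arithmetic behind a per-decade average.  The struct reading layout and the function name decade_average are assumptions made up for this illustration, and the real numbers come out of the database rather than out of C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shape of one reading: the year it was taken and the
   pressure in kPa. */
struct reading {
    int year;
    double kpa;
};

/* Averages the kPa values of all readings whose year falls in the
   decade decadeStart .. decadeStart + 9. Returns 0 and stores the
   result in *average, or returns -1 if the decade has no readings. */
int decade_average(const struct reading *readings, size_t count,
                   int decadeStart, double *average)
{
    double sum = 0.0;
    size_t matched = 0;
    size_t i;

    assert(average != NULL);

    for (i = 0; i < count; i++) {
        if (readings[i].year >= decadeStart &&
            readings[i].year <= decadeStart + 9) {
            sum += readings[i].kpa;
            matched++;
        }
    }

    if (matched == 0) {
        return -1;
    }

    *average = sum / (double)matched;
    return 0;
}
```

Calling it once per city with decadeStart set to 1960, 1970 and so on would give one point per decade for each line on the chart.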

Next, we have the same data flattened out and compared with more cities.  The big anomaly here is Charlottetown PEI.  I'm not sure yet whether that's a problem with Environment Canada's data or a real anomaly...

So, as you can see, I'm starting to get some fruits from this labour.  I'm almost done importing, then it'll take a few days to populate the days.  Then I can start running the reports I have built.

Then I have some questions to ask.

I know I need to check the following:
Is the weather pattern coast to coast?
Is the weather pattern recent, or has it always been there?
Is it getting more or less pronounced?

Then I'm sort of stuck again.

See, I'm not a meteorologist.  I'm just a guy who has spotted something man-made in the weather cycle and is trying to get to the bottom of what's causing it.

Fun times, eh?