Tuesday, February 8, 2011

Meteorological data

UPDATE: Remember when I said there are some missing data? Yeah. Only for the case I want to analyze. The exact 16 hours I most wanted. Funny that the operational data, which are saved all over the place, have more observations than the archived ASOS data at NCDC. Redundancy is an important part of data archival.

I am analyzing a particular case and have been looking at the unique observations that the Oklahoma mesonet collects. I want to add to this huge data source. Therein lies the issue. Merging data sets is quite a task. Of course the fancier one gets, the more trouble there is.

Recently the NWS added many stations to its list of archived 1- and 5-minute ASOS data. It is a decent dataset even if it is spatially sparse. The issue is the format. Regular hourly and special observations get transmitted in METAR format, and there is a nice decoder written for GEMPAK which processes these data. I would not say it is awesome, but it is suitable. What makes it better is that it retains the whole METAR data string for subsequent data mining. This ensures that some of the metadata (+TSGRFC, PRESFR, etc.) are not lost.

However, the potential research-quality dataset currently being archived at NCDC undergoes no quality control, and the files can have transmission problems. It also suffers because if the data are not relayed in time, they are lost. Thus one must process the data and visually inspect them for issues. I did this many years ago as part of my PhD training for the BAMEX field project, and it was a nightmare to write code to process the 1-minute data. I ended up with multiple Fortran codes to deal with the transmission problems, formatting problems, and missing data.
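To give a flavor of the bookkeeping involved, here is a minimal Python sketch (not my Fortran code) that finds holes in a series of 1-minute observation times. It assumes the timestamps have already been parsed out of the archive files; the function name `find_gaps` and its `step` parameter are made up for illustration.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, step=timedelta(minutes=1)):
    """Return (last_seen, next_seen, n_missing) for every hole in the series.

    `timestamps` is any iterable of datetime objects. Duplicate times
    (for example, a record that was transmitted twice) are collapsed first.
    """
    ordered = sorted(set(timestamps))
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        delta = cur - prev
        if delta > step:
            # Number of expected observations that never arrived.
            missing = int(delta / step) - 1
            gaps.append((prev, cur, missing))
    return gaps
```

A list like this at least tells you where to look before you start inspecting the data by eye.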

The 5-minute data are stored in something close to METAR format, but not exactly METAR. I would think someone could process the 1-minute data into a quality 5-minute data set with a decent, readable format and provide some measure of quality control. (I will share my code which reads the data.) This could be an interesting data set. As it is now, it is difficult to work with, but not impossible. I heard that the community was trying to organize a network of networks for surface data. I hope they succeed, and I hope they model whatever they do on the successful components of the Oklahoma Mesonet, MesoWest, and the Iowa Environmental Mesonet.
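As a rough idea of what turning 1-minute data into a 5-minute product with some quality control could look like, here is a small Python sketch. It is only an illustration under stated assumptions: the observations are already decoded into (time, value) pairs, the range limits `lo` and `hi` and the `min_count` threshold are arbitrary placeholders, and the function name `five_minute_means` is hypothetical. A real product would need far more checks than a simple range test.

```python
from collections import defaultdict
from datetime import datetime

def five_minute_means(obs, lo=-60.0, hi=60.0, min_count=3):
    """Average 1-minute values into 5-minute bins with a crude range check.

    `obs` is an iterable of (datetime, value) pairs, where value may be None
    for a missing report. Returns {bin_start: mean}, with None for any bin
    that has fewer than `min_count` values passing the check.
    """
    bins = defaultdict(list)
    for t, v in obs:
        # Floor the time to the start of its 5-minute bin.
        start = t.replace(minute=t.minute - t.minute % 5, second=0, microsecond=0)
        bucket = bins[start]  # creates the bin even if the value is rejected
        if v is not None and lo <= v <= hi:
            bucket.append(v)
    return {
        start: (sum(vals) / len(vals) if len(vals) >= min_count else None)
        for start, vals in sorted(bins.items())
    }
```

Keeping the bins that fail as None, rather than dropping them, means the gaps stay visible in the output instead of silently disappearing.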