Friday, December 9, 2011

Killer tornado perspective


Caption: At far upper left is a zoomed-in histogram of tornadoes by year; in the middle is a spinogram of tornado magnitude (the vertical scale is 100 percent and the bin width is proportional to the histogram count); and below that is the map of killer tornadoes (the inset shows 1,460 tornadoes highlighted in red, out of a sample population of 55,439). To the right, from top to bottom, are the log(fatalities) histogram, the log(fatalities+injuries) spinogram, and the log(length*width) plot. The red shading (the 1,460 tornadoes, or 2.63% of the sample) reflects the conditional distribution based on fatalities. The color brushing is based on the bins of log(fatalities+injuries).

I was just playing with the tornado data to see where the killer tornadoes have occurred and what their stats look like. You can see the peak in 1974 from the Super Outbreak. Statistically, the killer tornadoes tend to be the long-track, wide ones. Clearly this is also tied to their strength.
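If you did want to check the same thing in code, here is a minimal Python sketch of the two quantities behind the plots: the share of killer tornadoes, and the per-bin killer fraction that a spinogram of log(length*width) would show. Everything in it is an assumption for illustration: the sample records are made up, the units (miles and yards) and the base-10 log are my choices, and the function names are hypothetical. It is not how Mondrian computes anything.

```python
import math
from collections import defaultdict

# Hypothetical records: (fatalities, path length in miles, path width in yards).
# These values are invented for illustration; they are not from the real data set.
SAMPLE = [
    (0, 0.5, 30),
    (0, 1.0, 50),
    (0, 2.0, 100),
    (1, 12.0, 400),
    (0, 0.2, 20),
    (3, 35.0, 800),
    (0, 5.0, 150),
    (2, 20.0, 600),
]


def killer_share(records):
    """Fraction of tornadoes with at least one fatality."""
    killers = sum(1 for fatalities, _, _ in records if fatalities > 0)
    return killers / len(records)


def spinogram_bins(records, bin_width=1.0):
    """Bin records by log10(length * width) and summarize each bin.

    Returns {bin_index: (count, killer_fraction)}. In a spinogram the count
    sets the bar width and the killer fraction sets the highlighted height.
    Assumes length * width > 0 for every record.
    """
    counts = defaultdict(int)
    killers = defaultdict(int)
    for fatalities, length, width in records:
        b = math.floor(math.log10(length * width) / bin_width)
        counts[b] += 1
        if fatalities > 0:
            killers[b] += 1
    return {b: (counts[b], killers[b] / counts[b]) for b in sorted(counts)}


if __name__ == "__main__":
    print("killer share:", killer_share(SAMPLE))
    for b, (count, fraction) in spinogram_bins(SAMPLE).items():
        print(f"bin {b}: count={count}, killer fraction={fraction:.2f}")
```

With real data, the 2.63% in the caption is the same kind of number as `killer_share`: 1,460 divided by 55,439.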

I really like the color brushing feature, which I used to highlight the log(fatalities+injuries) bins, that is, the large-impact tornadoes regardless of fatalities. In the log(length*width) plot, their distribution appears to be shifted to the left (attempting to filter out the yellower shade), which means shorter path lengths and narrower widths.

The software I used to generate these plots is called Mondrian. It's pretty cool in that you can play with the data visually, which makes exploring the richness of your data set pretty easy. And you don't need to write a bunch of code to discover patterns or make associations.