I have been computing total severe probabilities from the SSEO, and the event on the 17th offered a nice opportunity to do some "verification". The idea is to use object-based Hourly Maximum Updraft Helicity and treat local maxima as storm reports. This method lets us extract information from the 4-km convection-allowing pseudo-ensemble in a way that is comparable to storm reports (at least at the scale of the grid used for verification; wind reports are scrutinized for speed).
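For concreteness, here is a minimal sketch of the surrogate-report idea in Python. The UH threshold, neighborhood radius, and smoothing length scale below are placeholder assumptions, not the values used for the SSEO products, and the code is illustrative rather than the actual processing:

```python
# Minimal sketch: turn an hourly-max updraft helicity (UH) grid into a
# smoothed "surrogate severe" probability field by treating UH local maxima
# as storm reports. The threshold (75 m^2/s^2), neighborhood (40 km), and
# smoothing length (120 km) are assumed placeholder values.
import numpy as np
from scipy.ndimage import maximum_filter, gaussian_filter

def surrogate_severe_prob(uh, dx_km=4.0, uh_thresh=75.0,
                          neighborhood_km=40.0, sigma_km=120.0):
    # 1. Surrogate "reports": grid points that are local UH maxima above the threshold.
    reports = (uh == maximum_filter(uh, size=3)) & (uh >= uh_thresh)

    # 2. Neighborhood step: mark any point within the radius of a surrogate report.
    half_width = int(round(neighborhood_km / dx_km))
    hits = maximum_filter(reports.astype(float), size=2 * half_width + 1)

    # 3. Gaussian smoother converts the binary hit field into probabilities,
    #    analogous to the "practical perfect" treatment of observed reports.
    return np.clip(gaussian_filter(hits, sigma=sigma_km / dx_km), 0.0, 1.0)
```

Applying the same neighborhood-and-smoothing treatment to the observed report locations yields an observed probability field that can be compared with the forecast grid point by grid point.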
Here is the verification, and of course this is all experimental:
Observed reports are in the lower-right corner. The observed probabilities highlight three local maxima (SW OK, N MO, and IN). Many of the models do not have the same pattern as the reports, and they are in fact different enough from each other that the ensemble mean is relatively smooth except over MO. The NSSL-WRF does a pretty good job with the events in SW OK despite having only one significant storm (yet multiple strong tracks, indicated by the magenta contours). The NMMB Nest did the best in terms of the overall pattern, at least overlapping all of the observed areas. Notice, though, that the correlations are 0.87 (NSSL-WRF) and 0.85 (NMMB), while the CSI values (taken at the 15% threshold) are 0.59 and 0.51, respectively. So while my eye (quite subjective) says the NMMB was better, these metrics indicate otherwise.
Let's look at the bias (0.95 and 0.67) and ROC area (0.87 and 0.87). The NSSL-WRF gets slightly higher scores despite missing one of the local maxima, partly because its bias is near 1, covering more of the area with higher probabilities. The NMMB covers more area overall, but not at the 15% threshold, and its relative maximum over MO is more displaced than in the NSSL-WRF.
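For reference, here is a rough sketch of how these scores might be computed from a forecast probability grid and an observed (report-based) probability grid on the same verification grid. The 15% threshold matches the one used above; everything else (the event definition for the ROC sweep, the threshold spacing) is an illustrative assumption, not the actual scoring code:

```python
# Rough sketch of the quoted scores: pattern correlation, CSI and frequency
# bias at a probability threshold, and ROC area from a threshold sweep.
# Assumes fcst_prob and obs_prob are 2-D arrays on the same grid; defining
# the "observed event" by thresholding obs_prob is a simplification.
import numpy as np

def verification_scores(fcst_prob, obs_prob, thresh=0.15):
    f, o = fcst_prob.ravel(), obs_prob.ravel()

    # Pattern correlation between the two probability fields.
    corr = np.corrcoef(f, o)[0, 1]

    # 2x2 contingency table at the probability threshold.
    fy, oy = f >= thresh, o >= thresh
    hits = np.sum(fy & oy)
    misses = np.sum(~fy & oy)
    false_alarms = np.sum(fy & ~oy)

    csi = hits / (hits + misses + false_alarms)       # critical success index
    bias = (hits + false_alarms) / (hits + misses)    # frequency bias

    # ROC area: sweep forecast probability thresholds, integrate POD vs. POFD.
    pod, pofd = [], []
    for t in np.linspace(1.0, 0.0, 101):
        fy_t = f >= t
        pod.append(np.sum(fy_t & oy) / max(np.sum(oy), 1))
        pofd.append(np.sum(fy_t & ~oy) / max(np.sum(~oy), 1))
    roc_area = np.trapz(pod, pofd)

    return {"corr": corr, "csi": csi, "bias": bias, "roc_area": roc_area}
```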
And therein lies the problem of verification. What your eyes tell you is a good forecast can look different once you start applying hard-and-fast "metrics" of skill or forecast goodness. There are many aspects to forecast verification, including the construction of skill scores. I plan to have an REU student pursue this over the summer as I continue to accrue more forecasts (so far I have about 1.5 years of daily forecasts).
The other interesting aspect of this particular case is the position of the primary probability axis along the frontal boundary. Note how each model has a slightly different position, and each of those positions corresponds to that model's placement of the front. This was an exceptionally difficult case, especially for the time-lagged members (the second set of HRW members on the plots: 2nd row left, 3rd row left). This variability was typical even of the operational GFS and NAM. Why each model had drastically different frontal positions is likely a function of grid spacing, initialization time, physics, land surface state, and data assimilation methods. This is clearly a day that would be useful for further analysis beyond just severe weather. Investigating issues like these will help us understand how to better constrain these details.