Saturday, June 23, 2012

Data Discovery

Today was a good day for mentoring. Teaching students the power of discovery is never easy. They need some passion, a little creativity, and some imagination. There has to be some time pressure. They have to learn how to explore on their own, to make sense of the data, and to display it in obvious and intuitive ways ... to know the data well. They must take ownership of the data set. If something is wrong with it, you have to trust them to find it so that it doesn't interfere with your analysis.

Then, and only then, can they begin the discovery phase. That means learning by doing: discovering relationships between the variables, some obvious and others not so much. And it means giving them the tools to do so.

A lot of the time you offer to teach them how to program in a new, exciting language, or you drop them off at your local Excel. Sometimes, though, you can give them software that is intuitive and visual: software that has some complexity but is easy to use, and that builds confidence in searching for patterns in data. For the last few months I have been using Mondrian, and so have my students.

The nice part about Mondrian is that there is no code to learn ... just clicking and plotting. Suddenly you can see your data, in context, with a little effort. You can use it for quality control and to define new, relevant variables that you didn't realize you needed until you saw your data. Its flexible nature allows you to make connections, animations, and displays that help you understand your data and make you think critically about what your data can tell you and what it cannot. It shows you, graphically, when your sample sizes are too low for hypothesis testing. It connects you to your data, and you end up investing valuable and necessary time in exploratory data analysis. And this is the process of learning.