An article on data mining appeared in The Atlantic yesterday. This brief piece explains several of the complex methods of data mining in wonderfully simple terms.

Discovering information from data takes two major forms: description and prediction. At the scale we are talking about, it is hard to know what the data shows. Data mining is used to simplify and summarize the data in a manner that we can understand, and then allow us to infer things about specific cases based on the patterns we have observed. Of course, specific applications of data mining methods are limited by the data and computing power available, and are tailored for specific needs and goals. However, there are several main types of pattern detection that are commonly used.

Rather than digging into the many specific analysis types, the article discusses the broad methodological classes. Anomaly detection, association learning, cluster detection (see here for more on cluster analysis and how it is used in the Art of Counting project), classification, and regression are all explained with familiar examples. For instance, association learning is simplified as:

This is the type of data mining that drives the Amazon recommendation system. For instance, this might reveal that customers who bought a cocktail shaker and a cocktail recipe book also often buy martini glasses. These types of findings are often used for targeting coupons/deals or advertising. Similarly, this form of data mining (albeit a quite complex version) is behind Netflix movie recommendations.
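
To make the idea concrete, here is a minimal sketch of association learning in Python. It is not how Amazon or Netflix actually do it; it only counts how often pairs of items show up in the same basket and reports two standard measures, support (the share of all baskets containing both items) and confidence (of the baskets containing the first item, the share that also contain the second). The baskets, the function name `pair_rules`, and the `min_support` threshold are all made up for illustration, and the sketch looks only at pairs rather than larger item sets like the shaker-plus-recipe-book example above.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.4):
    """Return (if_item, then_item, support, confidence) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be interesting
        # Each frequent pair yields a rule in both directions.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Strongest rules first: by confidence, then support, then item names.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Hypothetical purchase histories, invented for this example.
baskets = [
    ["cocktail shaker", "recipe book", "martini glasses"],
    ["cocktail shaker", "martini glasses"],
    ["recipe book", "martini glasses"],
    ["cocktail shaker", "recipe book", "martini glasses"],
    ["recipe book"],
]

rules = pair_rules(baskets)
for if_item, then_item, support, confidence in rules:
    print(f"{if_item} -> {then_item}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data every basket with a cocktail shaker also contains martini glasses, so that rule comes out on top. That is the kind of pattern a retailer might use to decide which coupon to show.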

Thank you, Mr. Furnas!
