Since the beginning of mankind, information has been a key factor in human progress. Whether it was how to make tools or where to catch animals, knowledge was extremely valuable and essential for our advancement. Today the situation is the same, except that we are dealing with an enormous amount of data. The real challenge is to find what is valuable and significant in this mass of information. This is where data mining comes in handy.
Why Data Mining?
The idea behind data mining is not new. Companies have used sales reports and other statistics to analyze customers' habits for years. Traditionally, this work was done by data analysts. Technological advancement has changed this drastically, and today we can speak of an explosion of data. Every credit card transaction, online search, and in-store purchase produces new information about customers, their preferences, and their tendencies. We have a lot of data but little knowledge. Modern technology allows us to automate the process of extracting implicit, previously unknown information and turning it into useful conclusions.
The methodology of data mining is based on searching for trends, regularities, and patterns in large databases. We can distinguish three of the most commonly used methods of processing that data:
classification (e.g. creating a profile of the customer who is most likely to buy a certain product),
clustering (e.g. identifying customers with similar buying habits),
association rules (e.g. finding which products are frequently purchased together).
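As a rough illustration of the third method, the sketch below counts how often pairs of products appear in the same basket and computes the confidence of a simple rule ("customers who buy bread also buy butter"). The baskets, product names, and the helper functions `pair_support` and `confidence` are invented for this example; real association-rule mining typically relies on dedicated algorithms and much larger datasets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented purely for illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]


def pair_support(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair a single canonical (alphabetical) order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def confidence(baskets, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


pair_counts = pair_support(transactions)
most_common_pair, count = pair_counts.most_common(1)[0]
print(most_common_pair, count)                       # ('bread', 'butter') 3
print(confidence(transactions, "bread", "butter"))   # 0.75
```

Here bread and butter share three of the five baskets, and three of the four baskets that contain bread also contain butter, which is the kind of regularity a retailer might act on.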
Humans can perform such an analysis, but machines are much more efficient at this task. Another advantage of data mining is the use of relational data, which gives us much richer results than propositional (single-table) data. The analysis takes into account not only the entity itself but also all the entities to which it is related. This is why data mining has recently become the center of attention for both business and science.
Uses of Data Mining
Data mining is most often used by companies to analyze customer behavior. Knowledge of specific patterns and habits allows for targeting ads, designing store layouts, and many other actions that increase revenue. But there are many other areas where data mining has become very useful.
A very interesting example is education. Analysis of students' learning behavior and results can be used to adjust the educational program and choose what to teach and how to teach it. This allows schools to focus on specific aspects of teaching and improve students' results.
Data mining holds great potential for the security of financial markets. Machines are much more effective than humans at detecting the risk of fraud. Data mining can provide factual patterns, which may be used to analyze dozens of transactions and detect those that raise suspicion of fraud.
Other applications of data mining include the classification of discovered space objects, criminal investigations (where and when is crime most likely to happen?), and healthcare (predicting volumes of patients in different categories).
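To make the idea of flagging suspicious transactions more concrete, here is a minimal sketch that marks amounts lying far from the typical value, using the median absolute deviation. This is only one simple outlier heuristic, not a description of how any real fraud detection system works; the function name `flag_suspicious`, the threshold of 3.5, and the sample amounts are assumptions made for illustration.

```python
from statistics import median


def flag_suspicious(amounts, threshold=3.5):
    """Return indices of amounts whose modified z-score exceeds `threshold`."""
    med = median(amounts)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        return []  # No spread to compare against, so nothing can be flagged.
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [
        i for i, a in enumerate(amounts)
        if 0.6745 * abs(a - med) / mad > threshold
    ]


# Hypothetical transaction amounts, invented for illustration.
amounts = [12.5, 40.0, 23.9, 18.0, 31.2, 27.5, 2500.0, 15.8]
print(flag_suspicious(amounts))  # [6]
```

Only the 2500.0 transaction is flagged, because it sits far outside the range of the others. A production system would combine many more signals, such as location, time, and merchant.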
Controversies Around Data Mining
Data mining, however, raises some concerns, particularly when it comes to privacy. Private investigator Steve Rambam, during his speech at the Next Hope hacker conference in 2006, even said: “Privacy is dead - get over it.” This is true: our data is being used, whether we like it or not. Many web services, like Google or Facebook, require users to allow access to their information for data mining. Even if we delete our accounts, the data is still stored. It seems that if we want to live in a digital world, we just have to come to terms with it.