What is Data Mining?


The purpose of data mining: data mining discovers hidden, non-trivial patterns in large sets of data records so that they can be used effectively for (ex post) analysis and (ex ante) forecasting.

Conventional statistics tools either cannot deliver comparable results at all, or can do so only with significantly more time and manual input. Specific use cases can be found for virtually any company, including yours. Experience shows that the value added by data mining increases the more non-trivial and “surprising” the results are, simply because such results help you to gain important competitive advantages.

In contrast to simpler statistics tools that work mainly linearly, Data.Mining.Fox uses a complex, multivariate approach (in the background, hidden from the user). In this process, data mining results are divided into distinct patterns and clusters, which may well show very different characteristics and predominant attribute combinations.

As a fictitious introductory example, think of a car insurance company. With the help of data mining, this company detects that customers with the combination “red car + pet + professional category xyz” cause clearly fewer accidents, whereas a second (differently characterised!) customer cluster causes clearly more accidents than average; in this second pattern, however, the driving characteristics are “street number smaller than 23 + age above 35 + marital status single”.

In contrast to common statistics programs, data mining calculates all relevant driver attributes for all clusters/patterns in one go. These driver attributes are usually different from (and hence much more valuable than) the results of an average-based statistical analysis of the whole basic set.
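
The difference between a whole-set average and a per-cluster view can be sketched in a few lines of Python. This is only an illustration: the records, the cluster labels "A" to "C" and the function name `accident_rate` are invented, and the clusters are assumed to be already known, whereas finding them is the actual job of a data mining tool.

```python
from collections import defaultdict

# Hypothetical toy records: (cluster label, 1 if the customer had an accident).
# Labels and numbers are invented for illustration only.
records = [
    ("A", 0), ("A", 0), ("A", 0), ("A", 1),
    ("B", 1), ("B", 1), ("B", 1), ("B", 0),
    ("C", 0), ("C", 1), ("C", 0), ("C", 1),
]

def accident_rate(rows):
    """Share of records in `rows` that are flagged with an accident."""
    return sum(flag for _, flag in rows) / len(rows)

# Average-based view on the whole basic set.
overall = accident_rate(records)

# Per-cluster view: group the records first, then compute one rate per cluster.
by_cluster = defaultdict(list)
for row in records:
    by_cluster[row[0]].append(row)

cluster_rates = {name: accident_rate(rows) for name, rows in sorted(by_cluster.items())}

print("overall:", overall)
print("per cluster:", cluster_rates)
```

In this toy data the overall rate hides the fact that cluster A lies well below and cluster B well above the average.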

A very elegant and handy feature of data mining tools like Data.Mining.Fox is that the quality of the mathematical forecasts can be tested with your own data, even before you launch a potentially risky field trial. For this purpose the forecast model is calculated with one part of your historical data, and then the model is applied to the second part (for which you initially “hide” the results). By comparing the calculated forecasts with the real (initially “hidden”) results of this second part of the data records, you get a good feeling for the quality of the forecast and hence trust in the quality of Data.Mining.Fox.
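
The split-and-compare procedure described above can be sketched as follows. This is a minimal illustration, not the procedure built into Data.Mining.Fox: the historical records are generated artificially, the 70/30 split is an arbitrary assumption, and a simple least-squares line stands in for the real forecast model.

```python
import random

# Hypothetical historical records: (income, customer_value), generated with a
# known linear trend plus noise, purely for illustration.
rng = random.Random(42)
history = []
for _ in range(200):
    income = rng.uniform(2000, 4000)
    value = 0.08 * income + rng.gauss(0, 10)
    history.append((income, value))

# Part 1 builds the model; the results of part 2 stay "hidden" until the comparison.
rng.shuffle(history)
split = int(0.7 * len(history))
train, test = history[:split], history[split:]

def fit_line(rows):
    """Ordinary least squares for y = a * x + b on (x, y) pairs."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    a = sxy / sxx
    return a, mean_y - a * mean_x

a, b = fit_line(train)

# Compare the forecasts with the real, initially hidden results.
mean_abs_error = sum(abs((a * x + b) - y) for x, y in test) / len(test)

# Baseline for comparison: always forecast the average of the first part.
baseline = sum(y for _, y in train) / len(train)
baseline_error = sum(abs(baseline - y) for _, y in test) / len(test)

print("model error:", round(mean_abs_error, 2))
print("baseline error:", round(baseline_error, 2))
```

If the model error on the hidden part is clearly smaller than the baseline error, the forecast carries real information; if not, a field trial based on it would be risky.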

Generally speaking, with the help of data mining you can run valid (ex-post) analysis and (ex-ante) forecast models. For each cluster these will calculate the dependence of the target attribute on the other (free) attributes.

How data mining works:
Data mining is all about detecting relations in large data sets in order to use them for data analysis and predictions. The essence of data mining can be explained via data tables: if you have a data table in which each row represents a data record and each column represents an attribute, then you want to build a model that predicts one column value of a data record on the basis of its other column values.

For example:

userID  Income  Marital Status  Own Real Estate  Customer Value
0001    2700    single          yes              210
0002    3600    married         no               320
0003    2400    widow           yes              190

After having built a prediction model for ‘customer value’, you can predict this value for clients for whom you know only the values of the other columns.

In data mining, table columns can contain both numbers and text. In addition, good data mining algorithms deal with missing column values in a mathematically correct way. Therefore, data mining is very suitable for the data tables that appear in practice: very large amounts of data, though not always of perfect quality.

There are two types of prediction models:

  1. classification model: This type of model predicts a column which contains non-numerical values.
  2. regression model: This type of model predicts a column which contains numerical values.

Examples referring to the above data table:

Model type      Predicted column    The forecast provides
Classification  ‘Own Real Estate’   probability for ‘yes’ and ‘no’
Regression      ‘Customer Value’    a number
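
To make the two output types concrete, here is a small Python sketch based on the three example records. It uses a plain nearest-neighbour rule on the income column as a stand-in model; this choice, the helper names `nearest`, `classify` and `regress`, and the value `k=2` are assumptions made for illustration and are not the algorithm used by Data.Mining.Fox.

```python
from collections import Counter

# The three example records from the data table above.
table = [
    {"income": 2700, "marital_status": "single",  "own_real_estate": "yes", "customer_value": 210},
    {"income": 3600, "marital_status": "married", "own_real_estate": "no",  "customer_value": 320},
    {"income": 2400, "marital_status": "widow",   "own_real_estate": "yes", "customer_value": 190},
]

def nearest(rows, income, k=2):
    """Return the k records whose income is closest to the given income."""
    return sorted(rows, key=lambda r: abs(r["income"] - income))[:k]

def classify(rows, income, target="own_real_estate", k=2):
    """Classification: a probability for each value of a non-numerical column."""
    neighbours = nearest(rows, income, k)
    counts = Counter(r[target] for r in neighbours)
    labels = sorted({r[target] for r in rows})
    return {label: counts[label] / len(neighbours) for label in labels}

def regress(rows, income, target="customer_value", k=2):
    """Regression: a number, here the mean target value of the neighbours."""
    neighbours = nearest(rows, income, k)
    return sum(r[target] for r in neighbours) / len(neighbours)

probabilities = classify(table, income=2500)
value = regress(table, income=2500)

print(probabilities)  # probabilities for 'no' and 'yes'
print(value)          # a single number
```

For a new client with an income of 2500, the classification call returns one probability per class, while the regression call returns a single numerical estimate.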

FAQ:

?   Do I need experience with data mining when using your tool?
>   You may be surprised – but the answer is no! A sound analytical understanding in combination with our software’s built-in documentation as well as an adequate understanding of your company data are sufficient when using Data.Mining.Fox®.

?   Somebody told me that I need to ensure high data quality before starting to use data mining software, e.g. by launching large data quality assurance projects and setting up a data warehouse. Is this true?
>   No, not necessarily. Both projects definitely make sense, but they require a lot of money and time. Data.Mining.Fox® offers integrated and automated functions which can, for example, deal with missing data fields and still deliver very good results. Incidentally, this is done without discarding the incomplete data records, i.e. the remaining information in the defective records is still used.
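
    Why it matters whether incomplete records are discarded can be sketched in Python as follows. The records and the helper names `complete_only` and `column_mean` are invented for illustration, and the example only shows the general idea of using every available value; it does not describe how Data.Mining.Fox® handles missing fields internally.

```python
# Hypothetical records; None marks a missing data field.
records = [
    {"income": 2700, "age": 41,   "customer_value": 210},
    {"income": None, "age": 35,   "customer_value": 320},
    {"income": 2400, "age": None, "customer_value": 190},
    {"income": 3600, "age": 52,   "customer_value": 300},
]

def complete_only(rows):
    """Naive approach: throw away every record with at least one missing field."""
    return [r for r in rows if all(v is not None for v in r.values())]

def column_mean(rows, column):
    """Mean of one column, using every record in which that column is present."""
    values = [r[column] for r in rows if r[column] is not None]
    return sum(values) / len(values)

kept = complete_only(records)

# Discarding incomplete records leaves 2 of 4 records for every calculation.
mean_age_complete = column_mean(kept, "age")

# Using the remaining information keeps 3 of 4 age values.
mean_age_all = column_mean(records, "age")

print(len(kept), mean_age_complete, round(mean_age_all, 2))
```

    In this toy data, dropping incomplete records halves the data set, although only one value per affected record is actually missing.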

?   Do I have to scale and sample my data before using it with Data.Mining.Fox®, e.g. in order to avoid having very large numbers for one attribute and very small numbers for others, or to avoid my target attribute showing very many records for one value (e.g. non-buyers) and very few for another (e.g. buyers)?
>   No, Data.Mining.Fox® takes care of this automatically. On the contrary: if you manipulate your data manually, you may even distort the results significantly.

?   I have heard that I must not analyse some data with a data mining tool, even if I have this data saved in my own database. This cannot be true, can it?
>   Indeed it can! You should be well informed about which of your customers’ data (and other data) you may analyse, and in which way. This depends, among other things, on the country you operate in, your legal terms and conditions, and where, when and how you obtained the data. The data you use should not contain any personal data or information. You should use anonymised and pseudonymised data in order to comply with data privacy and data protection law. In case of doubt, we recommend consulting a legal expert.
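
    As a rough illustration of pseudonymisation, the following Python sketch replaces a user ID with a keyed hash before the data is analysed. The key, the function name `pseudonymise` and the records are hypothetical, and the technique shown (HMAC with SHA-256) is just one common option. Whether it is sufficient in your situation is a legal question that this sketch cannot answer.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored separately from the data.
SECRET_KEY = b"replace-with-a-secret-key"

def pseudonymise(user_id: str) -> str:
    """Replace an identifier with a keyed SHA-256 hash (HMAC)."""
    return hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical records with a directly identifying column.
records = [
    {"userID": "0001", "income": 2700},
    {"userID": "0002", "income": 3600},
]

# The same ID always maps to the same pseudonym, so records can still be linked,
# but the original ID no longer appears in the analysis data.
pseudonymised = [{**r, "userID": pseudonymise(r["userID"])} for r in records]

for row in pseudonymised:
    print(row)
```

    Note that this only pseudonymises the ID column; other columns may still allow individuals to be identified.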

?   Is the algorithm used in Data.Mining.Fox® the best on the market?
>   We would not want to claim this in general. If you have enough resources in terms of time, money and people, you may obtain excellent results even without a data mining tool. Yet whenever you lack one or more of these resources, the algorithm of Easy.Data.Mining™ provides enormous advantages regarding the quality of results you can obtain in a given period of time with limited effort. An implicit advantage is that Data.Mining.Fox® uses various mathematical concepts in parallel, so you do not have to worry about choosing the best algorithm for a given problem.

?   Which algorithms and techniques do you use?
>   Data.Mining.Fox® uses an intelligent combination of a multivariate approach, decision trees, and genetic algorithms: our GMDT™ (Genetic Multivariate Decision Trees). This core principle is complemented by further mechanisms, e.g. safeguards against over-fitting, confidence calculations, etc.

?   Can I run the Windows version of Data.Mining.Fox® on a VM (Virtual Machine)?
>   Yes. Note, however, that if you install the Windows version of Data.Mining.Fox® on a VM on an Apple Mac, for example, you cannot work with the Java version of your Mac, i.e. the Java folder (e.g. “jre1.6.0” for Windows) has to be copied into the installation folder (alternatively, you have to install Java for Windows completely). You may also get an error message saying that MSVCR71.dll was not found when starting the application; this Windows problem (note: this is not a problem of Easy.Data.Mining™) can be solved by downloading this DLL from the Internet and copying it into the folder Windows/System32.

?   With all these great possibilities at Easy.Data.Mining™ it looks as if I can solve all my entrepreneurial challenges with Data.Mining.Fox®, right?
>   Not entirely. One simple rule still holds, even for the best tools: an essential part of the intelligence still sits in front of the computer, namely your employee. Data.Mining.Fox® offers much more automation than many other tools. Yet if you run your analyses without any analytical sense, or without knowing the essentials of your business model, then of course you may end up with a wrong assessment even when using Data.Mining.Fox®.

?   We are missing the product feature XYZ in Data.Mining.Fox®. Could you offer us uncomplicated help with this?
>   We very much appreciate any such proposals. If the feature request shows a reasonable balance between effort and value added, and is likely to be suitable for other customers as well, then we will start the implementation right away. Yet even if you require something rather special, we are sure to find a suitable solution for your challenge.