{"id":3286,"date":"2018-05-17T01:26:22","date_gmt":"2018-05-16T23:26:22","guid":{"rendered":"http:\/\/dataminingsoccer.com\/en\/?page_id=3286"},"modified":"2018-05-17T08:04:01","modified_gmt":"2018-05-17T06:04:01","slug":"what-is-data-mining","status":"publish","type":"page","link":"http:\/\/dataminingsoccer.com\/en\/our-analysis\/what-is-data-mining\/","title":{"rendered":"What is Data Mining?"},"content":{"rendered":"<h2>Football Result Predictions using Data Mining<\/h2><p><strong>The purpose of data mining<\/strong>: data mining serves to <strong>discover <\/strong>(hidden, non-trivial)<strong> patterns in large amounts of data records<\/strong>, which can then be used very effectively for (ex post) analysis and (ex ante) forecasting.<\/p>\n<p>Conventional statistics tools either cannot deliver comparable results at all, or can do so only with significantly more time and manual input. Specific use cases can be found for basically any company \u2013 including yours! Experience shows that the <strong>value added<\/strong> by data mining increases the more non-trivial and \u201csurprising\u201d the results are \u2013 simply because such results help you to achieve important <strong>competitive advantages<\/strong>.<\/p>\n<p>In contrast to simpler, mainly linear statistics tools, Data.Mining.Fox uses a complex, multivariate approach (in the background, hidden from the user). In this process, data mining results are divided into distinct <strong>patterns <\/strong>and<strong> clusters<\/strong>, which may well show very different characteristics and predominant attribute combinations.<\/p>\n<p>As a fictitious <strong>introductory example<\/strong> of data mining, think of a car insurance company. With the help of data mining, this company detects that customers with the combination \u201cred car + pet + professional category xyz\u201d cause clearly fewer accidents, whereas a second (differently characterised!) 
customer cluster causes clearly more accidents than average \u2013 though in this pattern the determining characteristics are \u201cstreet number smaller than 23 + age above 35 + marital status single\u201d. In contrast to common statistics programs, <strong>data mining<\/strong> calculates <strong>all relevant driver attributes for all clusters\/patterns in one go<\/strong>. These driver attributes are usually <strong>different from (and hence much more valuable than)<\/strong> the results of an average-based statistical analysis of the entire data set.<\/p>\n<p>A very elegant and handy feature of data mining tools like Data.Mining.Fox is that <strong>the quality of the mathematical forecasts can be tested with your own data<\/strong> \u2013 even before you have to launch a potentially risky field trial. For this purpose the forecast model is calculated with one part of your historical data, and then the model is applied to the second part (for which you initially \u201chide\u201d the results). By then comparing the calculated forecasts with the real (initially \u201chidden\u201d) results of this second part of the data records, you get a good sense of the quality of the forecast and hence gain trust in the quality of Data.Mining.Fox.<\/p>\n<p>Generally speaking, with the help of data mining you can run valid <strong>(ex-post) analyses<\/strong> and <strong>(ex-ante) forecast models<\/strong>. For each cluster these calculate the dependence of the target attribute on the other (free) attributes.<\/p>\n<p><strong>How data mining works:<\/strong><br \/>\nData mining is all about detecting relations in large data sets in order to use them for data analysis and predictions. 
The essence of data mining can be explained via data tables: if you have a data table in which each row represents a data record and each column represents an attribute, then you want to build a model that predicts one column value of a data record on the basis of its other column values.<\/p>\n<p>For example:<\/p>\n<table border=\"1\">\n<tbody>\n<tr>\n<td><strong>userID<\/strong><\/td>\n<td><strong>Income<\/strong><\/td>\n<td><strong>Marital Status<br \/>\n<\/strong><\/td>\n<td><strong>Own Real Estate<br \/>\n<\/strong><\/td>\n<td><strong>Customer Value<br \/>\n<\/strong><\/td>\n<\/tr>\n<tr>\n<td>0001<\/td>\n<td><strong>2700<\/strong><\/td>\n<td>single<\/td>\n<td>yes<\/td>\n<td>210<\/td>\n<\/tr>\n<tr>\n<td>0002<\/td>\n<td><strong>3600<\/strong><\/td>\n<td>married<\/td>\n<td>no<\/td>\n<td>320<\/td>\n<\/tr>\n<tr>\n<td>0003<\/td>\n<td><strong>2400<\/strong><\/td>\n<td>widow<\/td>\n<td>yes<\/td>\n<td>190<\/td>\n<\/tr>\n<tr>\n<td>\u2026<\/td>\n<td><strong>\u2026<\/strong><\/td>\n<td>\u2026<\/td>\n<td>\u2026<\/td>\n<td>\u2026<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div>\n<p>Once you have built a prediction model for \u2018customer value\u2019, you can predict this value for clients for whom you only know the values of the first three columns. In data mining, table columns can contain both numbers and text. In addition, good data mining algorithms <strong>deal with missing column values in a mathematically correct way<\/strong>. 
Therefore, data mining is very suitable for the data tables that appear in practice: very large amounts of data, though not always of perfect quality. There are two types of prediction models:<\/p>\n<ol>\n<li><strong>classification model<\/strong>: This type of model predicts a column which contains <strong>non-numerical<\/strong> values.<\/li>\n<li><strong>regression model<\/strong>: This type of model predicts a column which contains <strong>numerical<\/strong> values.<\/li>\n<\/ol>\n<p>Examples referring to the above data table:<\/p>\n<\/div>\n<table border=\"1\">\n<tbody>\n<tr>\n<td><\/td>\n<td><strong>predicted column:<\/strong><\/td>\n<td><strong>the forecast provides:<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>Classification:<\/strong><\/td>\n<td>\u2018Own Real Estate\u2019<\/td>\n<td>probability for \u2018yes\u2019 and \u2018no\u2019<\/td>\n<\/tr>\n<tr>\n<td><strong>Regression:<\/strong><\/td>\n<td>\u2018Customer Value\u2019<\/td>\n<td>a number<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>FAQ:<\/p>\n<p>?\u00a0\u00a0 Do I need experience with data mining when using your tool?<br \/>\n&gt;\u00a0\u00a0 You may be surprised \u2013 but the answer is no! A sound analytical understanding, combined with our software\u2019s built-in documentation and an adequate understanding of your company data, is sufficient when using Data.Mining.Fox\u00ae.<\/p>\n<p>?\u00a0\u00a0 Somebody told me that I need to ensure high data quality before starting to use data mining software \u2013 e.g. by launching large data quality assurance projects\u00a0 and a data warehouse setup. Is this true?<br \/>\n&gt;\u00a0\u00a0 No, not necessarily. Both projects definitely make sense, but they require a lot of money and time. Data.Mining.Fox\u00ae offers integrated and automated functionalities which can, for example, deal with missing data fields and still deliver very good results. Incidentally, this is done in a manner which does not discard erroneous data sets entirely (i.e. 
the remaining information in the defective data sets is still used).<\/p>\n<p>?\u00a0\u00a0 Do I have to scale and sample my data before using it with Data.Mining.Fox\u00ae \u2013 e.g. in order to avoid having very large numbers for one attribute and very small numbers for others, or to avoid my target attribute showing very many results for one value (e.g. number of non-buyers) and very few for another (e.g. buyers)?<br \/>\n&gt;\u00a0\u00a0 No, Data.Mining.Fox\u00ae takes care of this automatically. On the contrary: if you manipulate your data manually, you may even distort the results significantly.<\/p>\n<p>?\u00a0\u00a0 I have heard that I must not analyse some data with a data mining tool, even if I have this data saved in my own database \u2013 this cannot be true, can it?<br \/>\n&gt;\u00a0\u00a0 Indeed it can! You should be well informed about which data of your customers etc. you may analyse in which way. This depends, among other things, on the country you operate in, your legal terms and conditions, where, when and how you obtained the data, etc.\u00a0 The data you use should not contain any personal data and information. You should make use of anonymised and pseudonymised data in order to be compliant with data privacy and data protection law. In case of doubt we recommend consulting a legal expert on this.<\/p>\n<p>?\u00a0\u00a0 Is the algorithm used in Data.Mining.Fox\u00ae the best on the market?<br \/>\n&gt;\u00a0\u00a0 We would not want to claim this in general. If you have enough resources in terms of time, money and people, you may even obtain excellent results without a data mining tool. Yet, any time you lack one or more of these resources, the algorithm of Easy.Data.Mining&#x2122; provides enormous advantages regarding the quality of results which you can obtain in a given period of time with limited effort. 
An implicit advantage is that Data.Mining.Fox\u00ae uses various mathematical concepts in parallel, so you do not have to worry about choosing the best algorithm for a given problem.<\/p>\n<p>?\u00a0\u00a0 Which algorithms and techniques do you use?<br \/>\n&gt;\u00a0\u00a0 Data.Mining.Fox\u00ae uses an intelligent combination of a multivariate approach, decision trees, and genetic algorithms \u2013 our GMDT&#x2122; (Genetic Multivariate Decision Trees). This core principle is complemented by other criteria \u2013 e.g. protection factors against over-fitting, confidence calculations, etc.<\/p>\n<p>?\u00a0\u00a0 Can I run the Windows version of Data.Mining.Fox\u00ae on a VM (Virtual Machine)?<br \/>\n&gt;\u00a0\u00a0 Yes. What you should consider, though, is that e.g. for an installation of the Windows version of Data.Mining.Fox\u00ae on a VM of an Apple Mac you cannot work with the Java version of your Mac, i.e. the folder (e.g. \u201cjre1.6.0\u201d for Windows) has to be copied into the installation folder (alternatively you have to install Java for Windows entirely). It may also happen that you get an error message regarding MSVCR71.dll not being found when starting the application; this Windows problem (note: this is not a problem of Easy.Data.Mining&#x2122;) can be solved by downloading this dll from the Internet and copying it into the folder Windows\/System32.<\/p>\n<p>?\u00a0\u00a0 With all these great possibilities at Easy.Data.Mining&#x2122; it looks as if I can solve all my entrepreneurial challenges with Data.Mining.Fox\u00ae, right?<br \/>\n&gt;\u00a0\u00a0 Not entirely. One simple rule is still valid, even for the best tools: an essential part of intelligence is still sitting in front of the computer \u2013 your employee. Data.Mining.Fox\u00ae offers much more automation than many other tools. 
Yet, if you do your analyses without any analytical sense or with a lack of knowledge about the essentials of your business model, then of course you may end up with a wrong assessment even when using Data.Mining.Fox\u00ae.<\/p>\n<p>?\u00a0\u00a0 We are missing the product feature XYZ in Data.Mining.Fox\u00ae \u2013 could you help us with this in an uncomplicated way?<br \/>\n&gt;\u00a0\u00a0 We very much appreciate any such proposals. And if the feature request shows a reasonable balance between effort and value added as well as a likelihood of being suitable for other customers too, then we will start the implementation right away. Yet, even if you require something rather special, we are sure to find a suitable solution for your challenge.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Football Result Predictions using Data Mining The purpose of data mining: data mining serves to discover (hidden, non-trivial) patterns in large amounts of data records in order to be used very effectively for (ex post) analysis and (ex ante) forecasting. 
Conventional statistics tools can either not deliver comparable results at all, or only by spending significantly &hellip; <a href=\"http:\/\/dataminingsoccer.com\/en\/our-analysis\/what-is-data-mining\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">What is Data Mining?<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":4,"featured_media":0,"parent":55,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v14.8 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<meta name=\"robots\" content=\"index, follow\" \/>\n<meta name=\"googlebot\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<meta name=\"bingbot\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"http:\/\/dataminingsoccer.com\/en\/our-analysis\/what-is-data-mining\/\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebSite\",\"@id\":\"http:\/\/dataminingsoccer.com\/en\/#website\",\"url\":\"http:\/\/dataminingsoccer.com\/en\/\",\"name\":\"Football Result Predictions\",\"description\":\"based on statistical analyses using data mining.. We love football! We love number games!\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":\"http:\/\/dataminingsoccer.com\/en\/?s={search_term_string}\",\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"http:\/\/dataminingsoccer.com\/en\/our-analysis\/what-is-data-mining\/#webpage\",\"url\":\"http:\/\/dataminingsoccer.com\/en\/our-analysis\/what-is-data-mining\/\",\"name\":\"What is Data Mining? 
- Football Result Predictions\",\"isPartOf\":{\"@id\":\"http:\/\/dataminingsoccer.com\/en\/#website\"},\"datePublished\":\"2018-05-16T23:26:22+00:00\",\"dateModified\":\"2018-05-17T06:04:01+00:00\",\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"http:\/\/dataminingsoccer.com\/en\/our-analysis\/what-is-data-mining\/\"]}]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","_links":{"self":[{"href":"http:\/\/dataminingsoccer.com\/en\/wp-json\/wp\/v2\/pages\/3286"}],"collection":[{"href":"http:\/\/dataminingsoccer.com\/en\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/dataminingsoccer.com\/en\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/dataminingsoccer.com\/en\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"http:\/\/dataminingsoccer.com\/en\/wp-json\/wp\/v2\/comments?post=3286"}],"version-history":[{"count":3,"href":"http:\/\/dataminingsoccer.com\/en\/wp-json\/wp\/v2\/pages\/3286\/revisions"}],"predecessor-version":[{"id":3302,"href":"http:\/\/dataminingsoccer.com\/en\/wp-json\/wp\/v2\/pages\/3286\/revisions\/3302"}],"up":[{"embeddable":true,"href":"http:\/\/dataminingsoccer.com\/en\/wp-json\/wp\/v2\/pages\/55"}],"wp:attachment":[{"href":"http:\/\/dataminingsoccer.com\/en\/wp-json\/wp\/v2\/media?parent=3286"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}