The highlights of this article:
- Business Intelligence (BI) involves data mining, visualization, and verbal “what if?” analysis.
- Predictive Analytics additionally applies algorithms grounded in statistical principles and evaluates them computationally.
- Prescriptive Analytics takes up the "what if?" analysis and simulation and tries to predict the future by visualizing the data together with parameterized changes.
- Data Mining and Visualization form the intersection of the three methods.
Advanced Analytics consists of the three areas of Business Intelligence, Predictive Analytics and Prescriptive Analytics. These domains overlap in various disciplines, such as data mining with all its sub-steps and visualization.
The following diagram should give a rough classification.
Business intelligence (BI) is all about data mining, visualization and verbal "what if?" analyses. Predictive analytics additionally applies algorithms grounded in statistical principles and evaluates them computationally. Prescriptive analytics takes up the "what if?" analysis and simulation: it tries to predict the future by visualizing the data together with parameterized changes, and to optimize that future according to the company's own vision and strategy.
Data Mining
Technical progress benefits a wide range of disciplines. Database systems are constantly evolving and can efficiently store ever larger amounts of data. New algorithms are being discovered or developed that could not be implemented with earlier technical means. Data Mining (DM) is viewed as a process designed to derive inferences and patterns from large amounts of data. This process includes data preparation, data integration, data selection, data transformation, pattern recognition and validation, as well as the presentation of the findings. Data mining is influenced by many different areas, as the following figure shows.
The process of data mining itself can be built into a higher-level process that already covers parts of these steps.
The data flow of this process starts with the data acquired from one or more data sources. This data is consolidated in a data warehouse and then processed with DM functions. The result is a recognized pattern, from which the user can generate knowledge. The characteristics of DM come into play at every point in the data flow. In the first step, at database level, the data is cleaned and integrated. Then the relevant data is selected and transformed into the target structure required for the DM. The DM produces a pattern, which must be evaluated and presented to the end user. The user can take the pattern as a basis for knowledge or discard it if the evaluation fails. At any point in the process, it is possible to move forward to the next step or back to the previous step for improvement.
It is important for companies to achieve consolidated data management despite partially distributed system landscapes. In this way, data can be linked efficiently and patterns can be recognized where they exist. For example, ERP data can be combined with data from the company's own online shop to check how many customers actually complete a purchase and, conversely, when customers abandon it. Combining such heterogeneous data sources offers great potential for discovering further patterns.
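The steps described above (cleaning, integration, pattern recognition) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the records, the field names (`customer_id`, `cart_value`, `order_total`) and the functions `clean`, `integrate` and `mine` are all hypothetical and do not come from any real ERP or shop system.

```python
from collections import Counter

# Hypothetical sample records; all names and values are invented for illustration.
shop_sessions = [
    {"customer_id": 1, "cart_value": 120.0},
    {"customer_id": 2, "cart_value": 45.5},
    {"customer_id": 3, "cart_value": 300.0},
    {"customer_id": 4, "cart_value": 80.0},
    {"customer_id": None, "cart_value": 10.0},  # incomplete record
]
erp_orders = [
    {"customer_id": 1, "order_total": 120.0},
    {"customer_id": 3, "order_total": 300.0},
]


def clean(records):
    """Cleaning step: drop records without a customer id."""
    return [r for r in records if r["customer_id"] is not None]


def integrate(sessions, orders):
    """Integration step: flag each shop session that has a matching ERP order."""
    ordered_ids = {o["customer_id"] for o in orders}
    return [{**s, "purchased": s["customer_id"] in ordered_ids} for s in sessions]


def mine(linked):
    """Pattern step: share of sessions that ended in a purchase or were abandoned."""
    counts = Counter(r["purchased"] for r in linked)
    total = len(linked)
    if total == 0:
        return {"conversion_rate": 0.0, "cancellation_rate": 0.0}
    return {
        "conversion_rate": counts[True] / total,
        "cancellation_rate": counts[False] / total,
    }


pattern = mine(integrate(clean(shop_sessions), erp_orders))
print(pattern)
```

The resulting dictionary plays the role of the "pattern" that the end user evaluates. In a real landscape the two lists would be replaced by queries against the data warehouse.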
With regard to predictive analytics, data mining can be divided into different types of functions, such as regression, clustering or time series forecasting. Each of these functions yields a pattern whose form depends on the function used.
Visualization
Another common feature of business analytics is the visualization of the results. In order to generate knowledge from the patterns after processing in data mining, it is also important to be able to visualize and read them in the correct format.
Due to the steadily increasing importance of "Big Data" and the countless data series generated every day, data visualization is becoming an essential tool. With the help of visual elements such as graphs, charts and maps, outliers, trends and patterns in the data can be identified quickly. The data is abstracted and presented visually in a compressed form. Data visualization can be univariate (a single variable) or multivariate (two or more variables).
Examples can be found in the following graphic:
Most of the time, the focus is on the relationship and distribution of the data.
For example, a survey among friends can be displayed as a histogram to help decide on a car brand when buying a used car.
It is immediately apparent that most of them drive brand B cars, as this is where the distribution peaks. The decision depends only on the variable "car brand" and its distribution.
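Such a univariate distribution can be computed and drawn in a few lines. The following is a minimal sketch: the survey answers are hypothetical and were chosen only so that brand B comes out on top, as in the example.

```python
from collections import Counter

# Hypothetical survey answers: which car brand does each friend drive?
survey = ["A", "B", "B", "C", "B", "A", "B", "C", "B", "A"]

# Count how often each brand occurs (the distribution of one variable).
counts = Counter(survey)

# Print a simple text bar chart, one bar per brand.
for brand, n in sorted(counts.items()):
    print(f"Brand {brand} | {'#' * n} ({n})")

# The brand with the highest count is where the distribution peaks.
most_common_brand, votes = counts.most_common(1)[0]
print(f"Most common brand: {most_common_brand} with {votes} votes")
```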
Once you have decided on a brand, several factors determine the final purchase. To keep it simple, we only look at the price in relation to the age of used vehicles that are no more than five years old.
The relationship between the price and the age of the vehicle is clearly visible: the older the vehicle, the lower the price. In addition to the statistical facts that reflect today's price, the risk of an older vehicle must also be weighed up with the question "what if?".
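The relationship between age and price can be quantified with a simple least-squares line, which then supports a first "what if?" question. This is a sketch under assumptions: the ages and prices are invented sample values, and `fit_line` is a hypothetical helper written here, not part of any library.

```python
# Hypothetical used-car listings: vehicle age in years and asking price.
ages = [1, 2, 3, 4, 5]
prices = [21000, 18500, 16500, 14000, 12500]


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(ages, prices)
print(f"Estimated price change per year of age: {slope:.0f}")

# "What if?" question: what would the line suggest for a six-year-old vehicle?
# This extrapolates beyond the observed ages, so it is only a rough indication.
what_if_price = intercept + slope * 6
print(f"Estimated price at age 6: {what_if_price:.0f}")
```

A negative slope confirms the visual impression that older vehicles are cheaper. The line says nothing about the repair risk of an older vehicle, which still has to be weighed separately.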
We advise you on how business performance management can be extended with predictive analysis models.