Steps to Turn Data Into Information: Data selection, Data preprocessing, Data conversion, Data mining, Interpretation and evaluation
What is Data?
In its simplest definition, data is information that can be measured, counted, or classified. If we can measure it, count it, or sort it into categories, it is useful even in its raw form. For example, time, money, length, weight, percentages, and anything else that can take fractional values are measurable, continuous data. Other things cannot be fractional; we count them as whole pieces, and that count is also numerical data. We can also classify observations into categories, for instance whether an item passes or fails. This is useful too, but the most desirable data is measurable, continuous data, because the tools available for analyzing it are richer. If we rely only on qualitative, categorical data, our tools are more limited.
Today data is everything. We need to know what data means and how to evaluate it. If we cannot interpret data or plan with it, it remains just a set of numbers. To convert data into information, we need to know what the minimum, maximum, mode, median, range, standard deviation, and variance mean and how to use them.
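As a rough illustration of the measures named above, the following sketch computes each of them with Python's standard library. The sample values are hypothetical, and the population forms of variance and standard deviation are an assumption made here; the text does not say which form is intended.

```python
import statistics

# Hypothetical sample: daily order counts for one week (illustrative values only).
values = [4, 8, 6, 5, 3, 8, 9]

summary = {
    "minimum": min(values),
    "maximum": max(values),
    "range": max(values) - min(values),
    "mode": statistics.mode(values),        # most frequent value
    "median": statistics.median(values),    # middle value of the sorted data
    # Population variance and standard deviation (assumption: the seven
    # values are the whole population, not a sample drawn from a larger one).
    "variance": statistics.pvariance(values),
    "standard_deviation": statistics.pstdev(values),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

If the values were a sample from a larger population, `statistics.variance` and `statistics.stdev` would be the usual choice instead.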
What is Information?
Information is the meaning a person gives to data by applying accepted rules; it is obtained through research, examination, or observation. It tells us what a thing does or what its physical or functional properties are. For example, "the apple is green" is information: it states a property of the apple. Likewise, the number of male and female students in a class is information about the students in that class.
Knowledge is information enriched with experiment, experience, interpretation, analysis, and context. It is a high-value form of information, ready to be applied to social events, decisions, and actions. Knowledge is a more complex concept than data or information, and it has also been defined as a fluid mixture of experience, values, and contextual information. According to another definition, knowledge is a useful relationship established between pieces of information. Knowledge is a structure of organized information that allows us to make predictions and generalizations.
The data mining process includes three basic steps. The first step, often referred to as data cleaning, prepares the data for mining and is the most time-consuming part of the process. In the second step, once the data has been collected and preprocessed, data mining algorithms process, compress, and transform the prepared data so that any hidden, valuable information is easy to identify; this is where the actual discovery takes place. The third step is the analysis phase, in which the output of data mining is evaluated to see whether hidden information has been found and to determine the significance of the patterns the algorithm produced.
Steps to Turn Data Into Information
Development and understanding of the application area
This is the preparatory stage. Its purpose is to understand what has to be done, including which transformations and algorithms will be applied, before data mining starts. Before knowledge discovery begins, the goals of the end user must be understood and defined. This stage requires prior knowledge of the application area.
Data selection
The first step in the data mining process is to select, from the many existing databases, the data that accurately describes the task at hand. At this stage, the information in the system should be analyzed properly and related to the purpose of the end user. Because this stage strongly affects the information obtained at the end of the discovery process, care should be taken to select data that suits that purpose.
Data preprocessing
The most time-consuming part of data mining is data preprocessing. After deciding which data source to use, you must clean the original data, assemble it, and put it into the required format. This stage includes data cleansing operations such as handling missing values and eliminating outliers. The preprocessing phase directly affects the information obtained from data mining: the more carefully it is done, the more accurate the results.
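The two cleansing operations mentioned above can be sketched as follows. This is a minimal example under stated assumptions: the readings are hypothetical, missing values are filled with the median, and outliers are removed with the interquartile-range (Tukey) rule. The text does not prescribe either technique; they are simply common choices.

```python
import statistics

# Hypothetical raw readings; None marks a missing value.
raw = [12.0, 11.5, None, 12.3, 95.0, 11.8, None, 12.1]

def fill_missing(values):
    """Replace None with the median of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def drop_outliers(values, k=1.5):
    """Keep only values within k * IQR of the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

filled = fill_missing(raw)       # no None values remain
cleaned = drop_outliers(filled)  # the extreme reading 95.0 is removed
print(cleaned)
```

The median is used for filling because, unlike the mean, it is barely affected by the extreme reading that is removed in the next step.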
Data conversion
In this step (also called the data reduction step), the data is put into a form better suited to data mining. Feeding raw data directly into data mining can lead to erroneous results. When fields are numeric, a field with large values has a greater effect on the result than a field with small values. For this reason the data should be normalized, and several techniques exist for doing so. For example, suppose one field in a table takes values from 5 to 10 and another takes values from 1 to 10000. Left as they are, the two fields would affect the results very differently. By applying min-max normalization, every field is rescaled to the range 0 to 1, so all fields affect the result to the same degree.
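Min-max normalization is usually written as x' = (x - min) / (max - min). A minimal sketch follows, using two hypothetical columns whose ranges match the example above. The function name and the handling of a constant column are assumptions made for illustration.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: there is no spread to rescale,
        # so map everything to 0.0 (an arbitrary but common convention).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical columns matching the ranges in the text: 5-10 and 1-10000.
small_field = [5, 6, 8, 10]
large_field = [1, 2500, 7000, 10000]

small_scaled = min_max_normalize(small_field)
large_scaled = min_max_normalize(large_field)

print(small_scaled)
print(large_scaled)
```

After rescaling, both columns span exactly 0 to 1, so neither dominates a distance or weighting calculation simply because of its units.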
Data mining
At this stage the data is ready to be mined. The data mining technique should be chosen according to the information to be obtained at the end of the study. For example, you can choose regression or clustering. The choice depends on the decisions made in the earlier steps.
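To make the regression option concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The data and the names `fit_line`, `spend`, and `units` are hypothetical; a real study would normally use a dedicated library and validate the model on held-out data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations of x and y, and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend (x) and units sold (y).
spend = [1, 2, 3, 4, 5]
units = [3, 5, 7, 9, 11]

slope, intercept = fit_line(spend, units)
print(f"units = {slope} * spend + {intercept}")
```

Regression fits when the goal is to predict a numeric value from other fields. Clustering is the more natural choice when the goal is to group similar records without a known target.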
Brief History of Data Mining
- In the 1950s, the first computers were used for numerical calculations.
- In the 1960s, data collection and the use of databases began.
- In the 1970s, relational data models and relational database management systems (RDBMS) were developed.
- In the 1980s, the use of RDBMS became widespread.
- In the 1990s, people began to ask how the data collected in daily operations could be put to use.
- In 1991, the definition and concepts of Knowledge Discovery in Databases were introduced.
- In 1992, the first data mining software was developed.
- In the 2000s, data warehouses and data mining became widespread.
Scope of application
Marketing
– Maximizing the return in the marketing campaign
– Increasing Customer Loyalty
– Determining customer purchasing habits
– Market Basket Analysis
– Sales Forecasting
Banking & Finance Sector
– Determination of customer groups according to credit card expenses
– Evaluation of loan requests
In Electronic Commerce
– Detection of Attacks (Intrusion Detection)
– Analysis of visits to Web Pages
– Renewal of the website according to user behavior
Interpretation and evaluation
This is the final stage of data mining: the results obtained are interpreted and validated. Their applicability and accuracy are examined against the goals of the study. If previous studies exist, the results are compared with them to support their accuracy.