2. Data cleaning
After data collection is complete, the data cannot be used directly; it first has to be cleaned to turn it into usable data. This is like washing and chopping vegetables after buying them at the market.
Data cleaning mainly deals with three kinds of problems: missing data, inconsistent data formats, and outliers.
Missing data can be handled in roughly two ways: the first is to delete the affected records outright, and the second is to fill in the missing values.
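The two options for missing data can be sketched in a few lines of Python. This is only an illustration: the record layout, the `weight_kg` field, and the helper names `drop_missing` and `fill_missing` are assumptions made for the example, and filling with the median is just one common choice, not a method prescribed by this article.

```python
from statistics import median

def drop_missing(rows, field):
    """Option 1: delete every record whose value for `field` is missing."""
    return [row for row in rows if row.get(field) is not None]

def fill_missing(rows, field):
    """Option 2: fill missing values with the median of the known values.

    Raises statistics.StatisticsError if no record has a known value.
    """
    known = [row[field] for row in rows if row.get(field) is not None]
    fill_value = median(known)
    return [
        {**row, field: fill_value if row.get(field) is None else row[field]}
        for row in rows
    ]

# Hypothetical sample data: one record has no weight.
records = [
    {"name": "A", "weight_kg": 60.0},
    {"name": "B", "weight_kg": None},
    {"name": "C", "weight_kg": 70.0},
]

cleaned_by_deletion = drop_missing(records, "weight_kg")
cleaned_by_filling = fill_missing(records, "weight_kg")
```

Deleting is simpler but shrinks the data set; filling keeps every record but introduces values that were never actually measured, so which one fits depends on how much data is missing and how it will be used.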
Inconsistent data formats are the easier problem to solve: it is enough to normalize the values to a single standard format.
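As a rough sketch of format normalization, the snippet below converts dates written in several styles into one ISO format. The list of accepted formats (`KNOWN_FORMATS`) and the function name `normalize_date` are assumptions for illustration; a real data set would need its own list.

```python
from datetime import datetime

# Hypothetical set of formats seen in the raw data.
KNOWN_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y")

def normalize_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None

raw_dates = ["2020-03-05", "2020/03/05", " 05.03.2020 ", "March 5"]
normalized = [normalize_date(d) for d in raw_dates]
```

Values that match none of the known formats come back as `None`, so they can be reviewed by hand or treated as missing data instead of being silently guessed.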
For outliers, the task is simply to find them and remove them. Different kinds of data call for different ways of finding outliers. For example, suppose 30,000 people at a school have a physical examination and each person's weight is entered by hand. The 3σ rule, which flags values more than three standard deviations from the mean, can be used to find the entries that were typed in wrong.
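A minimal sketch of the 3σ check for the weight example might look like this. The function name `three_sigma_split` and the sample values are made up for illustration. The rule assumes the data is roughly normally distributed, and it needs a reasonably large sample: in a very small one, a single extreme value inflates the standard deviation so much that nothing gets flagged.

```python
from statistics import mean, pstdev

def three_sigma_split(values):
    """Split values into (kept, flagged) using the 3-sigma rule.

    A value is flagged when it lies more than three population
    standard deviations away from the mean of all values.
    """
    mu = mean(values)
    sigma = pstdev(values)
    lower, upper = mu - 3 * sigma, mu + 3 * sigma
    kept = [v for v in values if lower <= v <= upper]
    flagged = [v for v in values if not (lower <= v <= upper)]
    return kept, flagged

# Hypothetical weights in kg; 650.0 stands in for a typing error (65.0).
weights = [58.0, 62.0] * 15 + [650.0]

kept, flagged = three_sigma_split(weights)
```

Flagged values are only candidates for errors. It is usually worth looking at them before deleting, since a value can be unusual and still be correct.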
3. Summary
Data collection and data cleaning are very important to the entire modeling process, because the quality of the data directly affects the accuracy of the final model. However, both are hard work: the process is tedious and not technically demanding, and AI product managers and algorithm engineers need to complete it together. It takes a lot of time, so you must be patient and careful.
Zhang is a columnist at Everyone Is a Product Manager and an AI product manager specializing in natural language processing and image recognition. Now a partner at an intelligent insurance startup, Zhang hopes to communicate more with entrepreneurs in the field of artificial intelligence.
This article was originally published on Everyone Is a Product Manager. Reprinting without permission is prohibited.