
How to Handle Missing Values

 

In any organization, data analysis is important for improving business performance and for resolving problems associated with the business. To analyse a situation, the first thing you need is historical data relevant to the problem. Once you have the data, it needs to be in a proper format so that it can be analysed easily. You may face a number of problems while bringing the data into that format. Missing data is one of the most common problems that comes up during data analysis, especially when you collect feedback from customers and, for whatever reason, they do not answer all the questions. During the Data Preparation phase (see Data Mining Phases) you have to resolve these issues to bring the data into a proper format. Two approaches are widely used to deal with this situation and to support well-informed decisions at the end of the analysis: one is “Avoid the missing data” and the other is “Data Imputation”. Many data cleansing methods have been developed and incorporated into analysis software (such as SPSS) to handle these problems.

          

Avoid Missing Data:

 

This is the easiest way to handle missing data: delete every record that is not completely filled in. For example, suppose you send a feedback form to 1000 customers and only 800 of the returned forms are complete. The easiest option is to leave out the 200 incomplete forms and analyse only the remaining 800.

Sometimes a particular attribute (column) gets far fewer responses than the others. For example, out of 1000 respondents, 500 choose not to answer question 6. In that case the attribute corresponding to question 6 can be dropped from the analysis.

The main advantage of this method is that it is quick and very easy to follow. However, it has drawbacks. Discarding a record or an attribute means losing information. If the sample is large, discarding some records or attributes may not affect the results much, but you still need to keep in mind that you are losing something.
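The two deletion strategies above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `responses` records, the use of `None` to mark an unanswered question, the function names and the 50% threshold are all hypothetical choices, not part of any particular tool.

```python
# Hypothetical feedback records; None marks an unanswered question.
responses = [
    {"id": 1, "q1": 4, "q2": 5, "q6": 3},
    {"id": 2, "q1": 3, "q2": None, "q6": None},
    {"id": 3, "q1": 5, "q2": 4, "q6": None},
    {"id": 4, "q1": 2, "q2": 3, "q6": None},
]


def drop_incomplete_records(records):
    """Keep only the records in which every field has a value."""
    return [r for r in records if all(v is not None for v in r.values())]


def drop_sparse_attributes(records, max_missing_share=0.5):
    """Drop attributes whose share of missing values reaches the threshold."""
    n = len(records)
    keep = [
        key for key in records[0]
        if sum(r[key] is None for r in records) / n < max_missing_share
    ]
    return [{key: r[key] for key in keep} for r in records]


complete = drop_incomplete_records(responses)   # only record 1 survives
trimmed = drop_sparse_attributes(responses)     # q6 is missing in 3 of 4 records
print(len(complete), sorted(trimmed[0]))
```

Note how much is lost in each case: record deletion keeps one record out of four, while attribute deletion keeps all four records but gives up question 6 entirely.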

 

 

Data Imputation:

 

Data imputation is another method of handling missing values. With this method we try to fill in the missing values in the records and attributes. It is useful because it lets us keep all the information that respondents did provide instead of discarding incomplete records. There are a number of methods for filling in missing values; a few are given below.

Case Substitution

Mean Substitution

Hot Deck Imputation

Nearest Neighbour Imputation

Cold Deck Imputation

 

Case Substitution:

With this method we replace the missing value with a historical value from a similar case. The value cannot come from the current sample; it must come from previous observations.
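As a rough sketch of case substitution, the example below looks up a value from an earlier survey for a similar case. Everything here is an assumption made for illustration: "similar" is reduced to sharing the same customer segment, and the names `historical_rating_by_segment` and `case_substitute` are hypothetical.

```python
# Hypothetical values from a PREVIOUS survey, keyed by customer segment.
historical_rating_by_segment = {"retail": 4, "wholesale": 3}


def case_substitute(records, target, key, history):
    """Fill a missing target value from a historical case with the same key."""
    filled = []
    for r in records:
        if r[target] is None and r[key] in history:
            r = {**r, target: history[r[key]]}
        filled.append(r)
    return filled


current = [
    {"segment": "retail", "rating": None},
    {"segment": "wholesale", "rating": 2},
    {"segment": "online", "rating": None},   # no historical case available
]
substituted = case_substitute(
    current, "rating", "segment", historical_rating_by_segment
)
print([r["rating"] for r in substituted])
```

A record with no matching historical case stays missing, so this method may need to be combined with another one.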

 

Mean Substitution:

This is quite a simple method: we replace the missing value with the mean value of that particular attribute. It cannot be applied to categorical data, because a mean is only defined for numeric values. It is only useful for imputing within a column (attribute), not across a row.
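A minimal sketch of mean substitution for one numeric attribute, using only the Python standard library. The `customers` data and the `impute_mean` helper are hypothetical, and `None` is assumed to mark a missing value.

```python
from statistics import mean


def impute_mean(records, attribute):
    """Fill missing (None) values of a numeric attribute with the observed mean."""
    observed = [r[attribute] for r in records if r[attribute] is not None]
    if not observed:
        raise ValueError(f"no observed values for {attribute!r}")
    fill = mean(observed)
    return [
        {**r, attribute: fill if r[attribute] is None else r[attribute]}
        for r in records
    ]


customers = [
    {"id": 1, "age": 30},
    {"id": 2, "age": None},
    {"id": 3, "age": 40},
    {"id": 4, "age": 50},
]
imputed = impute_mean(customers, "age")
print([r["age"] for r in imputed])
```

The mean is computed down the column from the observed values only, which is why the method applies per attribute rather than per record.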

 

Hot Deck Imputation:

In this method, the missing value is filled with a value taken from a similar case or record in the current sample. That is, if two records are quite similar and one of them is missing the value for some attribute, we can fill it in from the other record.
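One possible way to sketch hot deck imputation is to pick, for each incomplete record, the most similar complete record in the same sample and copy its value. The similarity measure used here (sum of absolute differences over numeric matching attributes) is an assumption chosen for brevity, and the `hot_deck_impute` helper and `sample` data are hypothetical.

```python
def hot_deck_impute(records, target, match_on):
    """Fill a missing target value from the most similar record in the sample.

    Assumes the attributes listed in match_on are numeric and never missing.
    """
    donors = [r for r in records if r[target] is not None]
    if not donors:
        raise ValueError(f"no record has a value for {target!r}")

    def distance(a, b):
        return sum(abs(a[k] - b[k]) for k in match_on)

    filled = []
    for r in records:
        if r[target] is None:
            donor = min(donors, key=lambda d: distance(r, d))
            r = {**r, target: donor[target]}
        filled.append(r)
    return filled


sample = [
    {"age": 25, "visits": 2, "rating": 3},
    {"age": 52, "visits": 9, "rating": 5},
    {"age": 27, "visits": 3, "rating": None},  # closest to the first record
]
hot_filled = hot_deck_impute(sample, "rating", match_on=["age", "visits"])
print(hot_filled[2]["rating"])
```

In practice the matching attributes would usually be scaled first so that one attribute with a large range does not dominate the distance.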

 

Cold Deck Imputation:

In this method we replace the missing values with a single fixed value. The value must come from an external source, which means it cannot come from the current sample. Fixed means that if a particular attribute is missing 5 values, all 5 are filled with the same value.
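Cold deck imputation can be sketched as filling every gap in an attribute with one constant supplied from outside the sample. The constant `EXTERNAL_BENCHMARK_RATING` below is a made-up placeholder for such an external value, and `cold_deck_impute` is a hypothetical helper.

```python
# Made-up placeholder for a value taken from an external source
# (for example a published benchmark), NOT computed from the sample.
EXTERNAL_BENCHMARK_RATING = 3


def cold_deck_impute(records, attribute, external_value):
    """Fill every missing (None) value of an attribute with the same constant."""
    return [
        {**r, attribute: external_value} if r[attribute] is None else dict(r)
        for r in records
    ]


survey = [
    {"id": 1, "rating": 5},
    {"id": 2, "rating": None},
    {"id": 3, "rating": None},
]
cold_filled = cold_deck_impute(survey, "rating", EXTERNAL_BENCHMARK_RATING)
print([r["rating"] for r in cold_filled])
```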

 


Mandeep Singh

www.mandeepvba.blogspot.com
