Posted to user@hadoop.apache.org by unmesha sreeveni <un...@gmail.com> on 2014/02/04 10:40:55 UTC

Binning for numerical dataset

I am able to normalize given data, for example
100,1:2:3
101,2:3:4

into
100 1
100 2
100 3
101 2
101 3
101 4
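
A minimal sketch of the flattening step shown above, written in plain Python rather than against the Hadoop API. The function name `flatten` is hypothetical; the input format (a key, a comma, then colon-separated values) is taken from the two sample records.

```python
def flatten(line):
    """Turn '100,1:2:3' into [('100', '1'), ('100', '2'), ('100', '3')]."""
    # Split off the key once, then split the remainder on ':'.
    key, values = line.strip().split(",", 1)
    return [(key, v) for v in values.split(":")]


records = ["100,1:2:3", "101,2:3:4"]
rows = [pair for line in records for pair in flatten(line)]

for key, value in rows:
    print(key, value)
```

In a MapReduce job this logic would typically sit in the map function, with each `(key, value)` pair emitted as one output record.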

How can I do binning for a numerical dataset, say iris.csv?

I worked out the maths behind it.
Iris dataset: http://archive.ics.uci.edu/ml/datasets/Iris
1. Find the minimum and maximum values of each attribute in the data set.

       Sepal Length   Sepal Width   Petal Length   Petal Width
Min    4.3            2.0           1.0            0.1
Max    7.9            4.4           6.9            2.5

2. Then divide the data values of each attribute into 'n' buckets, say n = 5.
Bucket width = (Max - Min) / n


E.g. Sepal Length:
Bucket width = (7.9 - 4.3) / 5 = 0.72
So the intervals will be as follows:
4.30 - 5.02
5.02 - 5.74
5.74 - 6.46
6.46 - 7.18
7.18 - 7.90
Continue likewise for all attributes.
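
The equal-width arithmetic above can be sketched in plain Python as follows. The helper names `bin_edges` and `bin_index` are hypothetical. The post lists intervals with shared endpoints, so the boundary rule here is an assumption: a value that falls exactly on an edge goes to the upper bucket, except the maximum, which is clamped into the last bucket.

```python
def bin_edges(lo, hi, n):
    """Return the n + 1 edges of n equal-width buckets over [lo, hi]."""
    width = (hi - lo) / n
    return [lo + i * width for i in range(n)] + [hi]


def bin_index(x, lo, hi, n):
    """Return the 0-based bucket index of x (assumed boundary rule, see above)."""
    if hi == lo:
        return 0  # degenerate attribute: every value is identical
    width = (hi - lo) / n
    idx = int((x - lo) / width)
    # Clamp so that values outside the range still land in a valid bucket.
    return min(max(idx, 0), n - 1)


# Sepal Length from the table above: min 4.3, max 7.9, n = 5.
edges = [round(e, 2) for e in bin_edges(4.3, 7.9, 5)]
print(edges)
print(bin_index(5.1, 4.3, 7.9, 5))
```

Rounding is applied only for display; the bucket index is computed from the unrounded width.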
How can I do the same in MapReduce?



-- 
*Thanks & Regards*

Unmesha Sreeveni U.B

Re: Binning for numerical dataset

Posted by unmesha sreeveni <un...@gmail.com>.
To do binning in MapReduce, we need to find the min and max in the mapper.
Let the mapper pass the min and max values to the reducer, and then have
the reducer calculate the buckets.
Is that the best way?
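
One way to picture the approach described above is as two passes, simulated here in plain Python rather than with the Hadoop API. This is only a sketch under stated assumptions: the function names are hypothetical, the sample rows are made up for illustration (only their min and max match the Iris figures quoted earlier), and in a real job the global ranges from the first pass would have to be made available to the second pass, for example through the job configuration.

```python
N_BINS = 5  # n = 5, as in the original post


def minmax_mapper(rows):
    """Pass 1 map: emit (column, (local_min, local_max)) for one input split."""
    local = {}
    for row in rows:
        for col, x in enumerate(row):
            lo, hi = local.get(col, (x, x))
            local[col] = (min(lo, x), max(hi, x))
    return list(local.items())


def minmax_reducer(pairs):
    """Pass 1 reduce: merge per-split ranges into a global (min, max) per column."""
    merged = {}
    for col, (lo, hi) in pairs:
        glo, ghi = merged.get(col, (lo, hi))
        merged[col] = (min(glo, lo), max(ghi, hi))
    return merged


def binning_mapper(rows, ranges, n=N_BINS):
    """Pass 2 map: replace each value by its bucket index using the global ranges."""
    out = []
    for row in rows:
        binned = []
        for col, x in enumerate(row):
            lo, hi = ranges[col]
            width = (hi - lo) / n
            # Clamp the maximum into the last bucket (an assumed boundary rule).
            idx = 0 if width == 0 else min(int((x - lo) / width), n - 1)
            binned.append(idx)
        out.append(binned)
    return out


# Two hypothetical input splits holding (sepal length, sepal width) values.
split_a = [[4.3, 3.0], [5.1, 3.5]]
split_b = [[7.9, 3.8], [6.0, 2.0], [5.5, 4.4]]

pairs = minmax_mapper(split_a) + minmax_mapper(split_b)
ranges = minmax_reducer(pairs)
binned = binning_mapper(split_a + split_b, ranges)
print(ranges)
print(binned)
```

Emitting one local min/max pair per split, instead of one record per value, keeps the data sent to the reducer small. Whether two passes is the best design is left open here, as in the question.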




-- 
*Thanks & Regards*

Unmesha Sreeveni U.B
