Posted to user@hbase.apache.org by iwannaplay games <fu...@gmail.com> on 2012/10/05 09:02:28 UTC

Multiple Aggregate functions in map reduce program

Hi All,

I need to get the count and sum of the data. For example, if my table is:


employeename   salary   department
A              1000     testing
B              2000     testing
C              3000     development
D              4000     testing
E              1000     development
F              5000     management



I want a result like:

Department    TotalSalary   count(employees)
testing       7000          3
development   4000          2
management    5000          1


Please let me know whether it is possible to write a Java MapReduce job for
this. I tried this in Hive, but it takes a long time on big data. I heard that
Java MapReduce code would be faster. Is that true? Or should I go for Pig
programming?

Please guide.


Regards
Prabhjot

Re: Multiple Aggregate functions in map reduce program

Posted by Bertrand Dechoux <de...@gmail.com>.

>
> It takes a long time on big data. I heard that Java MapReduce code
> would be faster. Is that true? Or should I go for Pig programming?
>

I guess one important question is what you mean by 'it takes time', and
what goal you want to reach.
It may be that your current implementation is naive and can be improved
(which raises the question: what is your current implementation?).
Or it may simply be that, given your data volume and cluster capacity, you
cannot greatly reduce the time.

Anyway, please do not post to multiple mailing lists at the same time,
especially lists unrelated to your problem. I have a hard time figuring out
why user@hbase.apache.org is targeted when there is no mention of HBase in
your message. Anyone answering this message, please reply only to
user@hadoop.apache.org.

Regards

Bertrand


On Fri, Oct 5, 2012 at 9:18 AM, Bejoy KS <be...@gmail.com> wrote:

> Hi
>
> It is definitely possible. In your map, make the dept name the output
> key and the salary the value.
>
> In the reducer, for every key you can initialize a counter and a sum. Add
> each value to the sum and increment the counter by 1 for each value.
> Output the dept key with the aggregated sum and count for each key.
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.



-- 
Bertrand Dechoux

Re: Multiple Aggregate functions in map reduce program

Posted by Bejoy KS <be...@gmail.com>.

Hi 

It is definitely possible. In your map, make the dept name the output key and the salary the value.

In the reducer, for every key you can initialize a counter and a sum. Add each value to the sum and increment the counter by 1 for each value. Output the dept key with the aggregated sum and count for each key.
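The map/reduce split described above can be sketched in plain Java. This is a minimal, non-authoritative sketch that simulates the job in memory rather than a real Hadoop job: the `map` and `reduce` methods stand in for what would go into `Mapper.map` and `Reducer.reduce` overrides, and a `TreeMap` stands in for the shuffle that groups values by key. The class and method names (`DeptSalarySketch`, `run`) are hypothetical, and the sketch assumes whitespace-separated input lines in the order `employeename salary department`, with no header row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DeptSalarySketch {

    // Map step: parse one "employeename salary department" line and
    // emit (department, salary). Assumes whitespace-separated fields, no header row.
    static Map.Entry<String, Long> map(String line) {
        String[] fields = line.trim().split("\\s+");
        return Map.entry(fields[2], Long.parseLong(fields[1]));
    }

    // Reduce step: for one department, add every salary to the sum
    // and increment the counter by 1 per value.
    static String reduce(String dept, Iterable<Long> salaries) {
        long sum = 0;
        long count = 0;
        for (long salary : salaries) {
            sum += salary;
            count++;
        }
        return dept + "\t" + sum + "\t" + count;
    }

    // Hypothetical driver: map every line, group values by key
    // (standing in for Hadoop's shuffle and sort), then reduce each group.
    static List<String> run(List<String> lines) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Long> kv = map(line);
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<Long>> group : grouped.entrySet()) {
            output.add(reduce(group.getKey(), group.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "A 1000 testing", "B 2000 testing", "C 3000 development",
                "D 4000 testing", "E 1000 development", "F 5000 management");
        run(input).forEach(System.out::println);
    }
}
```

Because the grouping map is sorted, departments come out in alphabetical order (development, management, testing), much as keys arrive at a Hadoop reducer in sorted order; the sums and counts match the table in the question.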


Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: iwannaplay games <fu...@gmail.com>
Date: Fri, 5 Oct 2012 12:32:28 
To: user<us...@hbase.apache.org>; <us...@hadoop.apache.org>; hdfs-user<hd...@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: Multiple Aggregate functions in map reduce program


Re: Multiple Aggregate functions in map reduce program

Posted by Khang Pham <kh...@gmail.com>.

Hi,

Ideally you want to scan through the data once and get the (sum, count).

One simple solution is to write your own MapReduce job with key = department
and value = new VectorWritable(vector);

where vector is an array with array[0] = salary and array[1] = 1.

In the reduce phase, all you need to do is aggregate array[0] and
array[1] properly.

The reduce output value is then also an array: array[0] = sum of salaries,
array[1] = number of employees.
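A plain-Java sketch of this vector idea follows. It is an in-memory simulation, not a Hadoop job: a two-element `long[]` stands in for Mahout's `VectorWritable`, the class and helper names (`DeptVectorSketch`, `aggregate`, `run`) are hypothetical, and input lines are assumed to be whitespace-separated `employeename salary department` rows with no header. Because the reducer's output has the same shape as the mapper's output, the same function can be applied to each map task's output as a combiner (in Hadoop, registered via `Job.setCombinerClass`) before the final reduce; the sketch simulates two map tasks to show that the result is unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DeptVectorSketch {

    // Map step: emit (department, [salary, 1]). The long[] stands in for
    // Mahout's VectorWritable: index 0 = salary, index 1 = one employee.
    static Map.Entry<String, long[]> map(String line) {
        String[] fields = line.trim().split("\\s+");
        return Map.entry(fields[2], new long[] {Long.parseLong(fields[1]), 1L});
    }

    // Reduce step: element-wise sum, giving [total salary, employee count].
    // Output has the same shape as input, so it can be reused as a combiner.
    static long[] reduce(Iterable<long[]> vectors) {
        long[] total = new long[2];
        for (long[] v : vectors) {
            total[0] += v[0];
            total[1] += v[1];
        }
        return total;
    }

    // Groups (department, vector) pairs by department and reduces each group.
    static Map<String, long[]> aggregate(List<Map.Entry<String, long[]>> pairs) {
        Map<String, List<long[]>> grouped = new TreeMap<>();
        for (Map.Entry<String, long[]> kv : pairs) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        Map<String, long[]> result = new TreeMap<>();
        grouped.forEach((dept, vectors) -> result.put(dept, reduce(vectors)));
        return result;
    }

    // Simulates two map tasks: each task's output is combined locally with
    // the same reduce logic, then all partial results are reduced once more.
    static Map<String, long[]> run(List<String> split1, List<String> split2) {
        List<Map.Entry<String, long[]>> partials = new ArrayList<>();
        for (List<String> split : List.of(split1, split2)) {
            List<Map.Entry<String, long[]>> mapped = new ArrayList<>();
            for (String line : split) {
                mapped.add(map(line));
            }
            for (Map.Entry<String, long[]> combined : aggregate(mapped).entrySet()) {
                partials.add(Map.entry(combined.getKey(), combined.getValue()));
            }
        }
        return aggregate(partials);
    }

    public static void main(String[] args) {
        List<String> split1 = List.of("A 1000 testing", "B 2000 testing", "C 3000 development");
        List<String> split2 = List.of("D 4000 testing", "E 1000 development", "F 5000 management");
        run(split1, split2).forEach((dept, v) ->
                System.out.println(dept + "\t" + v[0] + "\t" + v[1]));
    }
}
```

The design choice matters for the combiner: with plain salary values and a counter in the reducer, reusing that reducer as a combiner would make the final counter count partial sums rather than employees. Carrying the count inside the value avoids that problem.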

This is a common problem; I think others might have better solutions.

-- Khang

On Fri, Oct 5, 2012 at 3:02 PM, iwannaplay games <funnlearnforkids@gmail.com> wrote:

> I want a result like:
>
> Department    TotalSalary   count(employees)
>
> testing       7000          3
>
