Posted to common-dev@hadoop.apache.org by "Runping Qi (JIRA)" <ji...@apache.org> on 2007/01/18 23:42:30 UTC

[jira] Created: (HADOOP-908) Hadoop Abacus, a package for performing simple counting/aggregation

Hadoop Abacus, a package for performing simple counting/aggregation
-------------------------------------------------------------------

                 Key: HADOOP-908
                 URL: https://issues.apache.org/jira/browse/HADOOP-908
             Project: Hadoop
          Issue Type: New Feature
          Components: contrib/streaming
            Reporter: Runping Qi


The Hadoop Abacus package is a specialization of the map/reduce framework 
for performing various counting and aggregation tasks. 
It offers functionality similar to Google's Sawzall. 

In general, to implement an application using the Map/Reduce model, 
the developer needs to implement Map and Reduce functions (and possibly a Combine function). 
However, for many applications related to counting and computing statistics, 
these functions have very similar characteristics. 
Abacus abstracts out the general patterns and provides a package implementing them. 
In particular, the package provides a generic mapper class, a reducer class, a combiner class, 
and a set of built-in value aggregators. It also provides a generic utility class, ValueAggregatorJob, 
for creating Abacus jobs.

To create an Abacus job, the user only needs to implement one plugin class that 
specifies which aggregators to use and which values go to which aggregators. 
The mapper calls this class at runtime to generate aggregation ids and values.
The generic combiner and reducer then aggregate the values associated with the same 
aggregation ids. Thus, it is much easier to create and run an Abacus job than 
a normal map/reduce job. Because a built-in generic combiner is always used, values are pre-aggregated on the map side, which keeps execution efficient.
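
The pattern described above can be illustrated with a small in-memory sketch. It is not the Abacus API: the names RecordDescriptor, WordCountDescriptor, AggregationSketch and the "sum:" id prefix are all hypothetical, and plain Java collections stand in for the map/reduce runtime. It only shows the division of labor, where the user writes one plugin that emits (aggregation id, value) pairs and generic code does the aggregation.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the user-supplied plugin class: it turns one
// input record into (aggregation id, value) pairs. Not the real Abacus interface.
interface RecordDescriptor {
    List<Map.Entry<String, Long>> describe(String record);
}

// Example plugin: emit a count of 1 for every whitespace-separated word.
// The "sum:" prefix is an assumed convention naming the aggregator to apply.
class WordCountDescriptor implements RecordDescriptor {
    @Override
    public List<Map.Entry<String, Long>> describe(String record) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String word : record.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>("sum:" + word, 1L));
            }
        }
        return pairs;
    }
}

// Generic aggregation code, simulated in memory.
class AggregationSketch {
    // Map side: ask the plugin for pairs and pre-aggregate them per split,
    // which is the role a combiner plays.
    static Map<String, Long> aggregateSplit(RecordDescriptor descriptor, List<String> records) {
        Map<String, Long> partial = new TreeMap<>();
        for (String record : records) {
            for (Map.Entry<String, Long> pair : descriptor.describe(record)) {
                partial.merge(pair.getKey(), pair.getValue(), Long::sum);
            }
        }
        return partial;
    }

    // Reduce side: merge the partial results of all splits by aggregation id.
    static Map<String, Long> mergePartials(List<Map<String, Long>> partials) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((id, value) -> totals.merge(id, value, Long::sum));
        }
        return totals;
    }
}
```

With one split containing "a b a" and another containing "b c", merging the two partial results gives sum:a=2, sum:b=2 and sum:c=1. Pre-aggregation per split is safe here because summation is associative and commutative.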






-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-908) Hadoop Abacus, a package for performing simple counting/aggregation

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-908:
------------------------------

    Attachment:     (was: abacus.patch)


        

[jira] Updated: (HADOOP-908) Hadoop Abacus, a package for performing simple counting/aggregation

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-908:
------------------------------

    Status: Open  (was: Patch Available)


An updated patch is available.


        

[jira] Updated: (HADOOP-908) Hadoop Abacus, a package for performing simple counting/aggregation

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-908:
------------------------------

    Attachment: abacus.patch


        

[jira] Updated: (HADOOP-908) Hadoop Abacus, a package for performing simple counting/aggregation

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-908:
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.11.0
           Status: Resolved  (was: Patch Available)

I just committed this.  Thanks, Runping!


        

[jira] Updated: (HADOOP-908) Hadoop Abacus, a package for performing simple counting/aggregation

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-908:
------------------------------

    Attachment: abacus.patch


        

[jira] Commented: (HADOOP-908) Hadoop Abacus, a package for performing simple counting/aggregation

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465912 ] 

Doug Cutting commented on HADOOP-908:
-------------------------------------

This looks great!

It would be good to add a package.html in the sources, with a description of abacus.  Also the top-level build.xml should be modified so that abacus's javadoc is included as a "contrib: Abacus" group.


        

[jira] Assigned: (HADOOP-908) Hadoop Abacus, a package for performing simple counting/aggregation

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi reassigned HADOOP-908:
---------------------------------

    Assignee: Runping Qi


        

[jira] Commented: (HADOOP-908) Hadoop Abacus, a package for performing simple counting/aggregation

Posted by "Doug Judd (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465914 ] 

Doug Judd commented on HADOOP-908:
----------------------------------

One issue (or at least what I assume is an issue) that I'd like to see taken care of in this toolkit is the following.  Suppose you do a big crawl of a bunch of pages and want to compute link counts and then do a (reverse) sort by count.  The problem is that the link counts follow a Zipfian distribution with a long tail of links of count 1 or 2.  Conceptually, you can imagine situations where you literally have 1 billion links of count 1, making it infeasible to pass them into a reduce function.

To get around this situation, I've created a TaggedLongWritable class.  It contains a Long and a string tag (the tag in the above case would be the link/URL).  The comparison function first compares the Long and then if they match, compares the tag.  This way, you get a numeric comparison, but two keys don't match if their tags are different.
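
The comparison described above might look roughly like the following sketch. It is not the actual TaggedLongWritable class: the TaggedLong and TaggedLongDemo names are hypothetical, and Hadoop's Writable serialization is omitted so the example stays self-contained. It only shows the ordering rule, which compares by the number first and falls back to the tag on a tie.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical plain-Java stand-in for the key class described above.
final class TaggedLong implements Comparable<TaggedLong> {
    final long value;   // e.g. the link count
    final String tag;   // e.g. the link/URL

    TaggedLong(long value, String tag) {
        this.value = value;
        this.tag = tag;
    }

    // Numeric comparison first; the tag only breaks ties, so two keys with
    // the same count but different tags never compare as equal.
    @Override
    public int compareTo(TaggedLong other) {
        int byValue = Long.compare(value, other.value);
        return byValue != 0 ? byValue : tag.compareTo(other.tag);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof TaggedLong && compareTo((TaggedLong) o) == 0;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(value) * 31 + tag.hashCode();
    }

    @Override
    public String toString() {
        return value + "\t" + tag;
    }
}

class TaggedLongDemo {
    // Reverse sort by count, as in the link-count use case.
    static List<TaggedLong> sortDescending(List<TaggedLong> keys) {
        List<TaggedLong> sorted = new ArrayList<>(keys);
        sorted.sort(Collections.reverseOrder());
        return sorted;
    }
}
```

Because every (count, tag) pair is a distinct key, each key groups at most one link, so no single group has to hold the whole tail of count-1 links.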



        

[jira] Commented: (HADOOP-908) Hadoop Abacus, a package for performing simple counting/aggregation

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465911 ] 

Hadoop QA commented on HADOOP-908:
----------------------------------

+1, because http://issues.apache.org/jira/secure/attachment/12349218/abacus.patch applied and successfully tested against trunk revision r497583.


        

[jira] Updated: (HADOOP-908) Hadoop Abacus, a package for performing simple counting/aggregation

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-908:
------------------------------

    Status: Patch Available  (was: Open)

The attached patch contains the Hadoop Abacus package.


        

[jira] Updated: (HADOOP-908) Hadoop Abacus, a package for performing simple counting/aggregation

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-908:
------------------------------

    Status: Patch Available  (was: Open)


This is a new patch with a package.html for the abacus package and 
an updated build.xml that includes the javadoc for abacus.

