Posted to common-dev@hadoop.apache.org by "Edward Yoon (JIRA)" <ji...@apache.org> on 2007/10/07 04:07:50 UTC

[jira] Created: (HADOOP-2006) Aggregate Functions in select statement

Aggregate Functions in select statement
---------------------------------------

                 Key: HADOOP-2006
                 URL: https://issues.apache.org/jira/browse/HADOOP-2006
             Project: Hadoop
          Issue Type: Sub-task
            Reporter: Edward Yoon




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

RE: [jira] Commented: (HADOOP-2006) Aggregate Functions in select statement

Posted by edward yoon <we...@udanax.org>.

Sorry for my mistake...

> If you mistake that standard sql is all of A-DBMS capacity,
> I think you don't want to studies about database structure, access algorithms, philosophies,.., etc of A-DBMS.
>
> Then, Can i make you use the A-DBMS's 100% Full capacity by force?
>
> Or....
>
> Let's assume the A-DBMS didn't provide standard sql.
> Are you want to use the A-DBMS?

>> Do you want to use the A-DBMS? 

>
> Ok..
> If you want to use the A-DBMS, you already didn't thought the sql isn't all of A-DBMS.

>> If you want to use the A-DBMS, you already thought the sql isn't all of A-DBMS.

> So, conclusion?
> The more affluent the hbase shell, the use of hbase will be growing very rapidly.


------------------------------

B. Regards,

Edward yoon @ NHN, corp.
Home : http://www.udanax.org


> From: webmaster@udanax.org
> To: hadoop-dev@lucene.apache.org
> Subject: RE: [jira] Commented: (HADOOP-2006) Aggregate Functions in select statement
> Date: Thu, 6 Dec 2007 01:10:14 +0000
>
>
>>> it will encourage people to think that the shell is a good way to interact with HBase in general...
>
> I think this is a key point. :)
>
> The HBase Shell aims to improve work efficiency without requiring specialized knowledge.
> I'll make it an accessory for database access methods on HBase.
> Also, I'm thinking about matrix operations on HBase.
>
> But ... the HBase Shell is just one of the applications on HBase.
>
> ...
> Let's think.
>
> If you mistakenly think that standard SQL is the full capacity of an A-DBMS,
> I think you won't want to study its database structure, access algorithms, philosophies, etc.
>
> Then, can I force you to use 100% of the A-DBMS's capacity?
>
> Or....
>
> Let's assume the A-DBMS didn't provide standard SQL.
> Do you want to use the A-DBMS?
>
> Ok..
> If you want to use the A-DBMS, you have already accepted that SQL isn't all of the A-DBMS.
>
> So, conclusion?
> The richer the HBase shell, the more rapidly the use of HBase will grow.
>
>
> ------------------------------
>
> B. Regards,
>
> Edward yoon @ NHN, corp.
> Home : http://www.udanax.org
>
>
>> From: bryan@rapleaf.com
>> Subject: Re: [jira] Commented: (HADOOP-2006) Aggregate Functions in select statement
>> Date: Wed, 5 Dec 2007 15:50:50 -0800
>> To: hadoop-dev@lucene.apache.org
>>
>> If you have a table with something like a billion rows, and do an
>> aggregate function on the table from the shell, you will end up
>> reading all billion rows through a single machine, essentially
>> aggregating the entire dataset locally. This defeats the purpose of
>> having a massively distributed database like HBase. To do this more
>> efficiently, you'd ideally kick off a MapReduce job that can perform
>> the various aggregation functions on the dataset in parallel,
>> harnessing the power of the distributed dataset, and then returning
>> the results to a central location once they are calculated.
>>
>> I think putting this option into the shell is risky, because it will
>> encourage people to think that the shell is a good way to interact
>> with HBase in general, which it isn't. We want people to understand
>> HBase is best consumed in parallel and discourage solutions that
>> aggregate access through a single point. As such, we shouldn't build
>> features that allow people to inadvertently use the wrong access
>> patterns.
>>
>> On Dec 5, 2007, at 3:38 PM, Edward Yoon (JIRA) wrote:
>>
>>>
>>> [ https://issues.apache.org/jira/browse/HADOOP-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548879 ]
>>>
>>> Edward Yoon commented on HADOOP-2006:
>>> -------------------------------------
>>>
>>> I don't understand your comment.
>>> Please explain in more detail.
>>>
>>>> Aggregate Functions in select statement
>>>> ---------------------------------------
>>>>
>>>> Key: HADOOP-2006
>>>> URL: https://issues.apache.org/jira/browse/HADOOP-2006
>>>> Project: Hadoop
>>>> Issue Type: Sub-task
>>>> Components: contrib/hbase
>>>> Affects Versions: 0.14.1
>>>> Reporter: Edward Yoon
>>>> Assignee: Edward Yoon
>>>> Priority: Minor
>>>> Fix For: 0.16.0
>>>>
>>>>
>>>> Aggregation functions on collections of data values: average,
>>>> minimum, maximum, sum, count.
>>>> Group rows by the value of a column family and apply an aggregate
>>>> function independently to each group of rows.
>>>> * <Grouping columnfamilies> ƒ ~function_list~ (Relation)
>>>> {code}
>>>> select producer, avg(year) from movieLog_table group by producer
>>>> {code}
>>>
>>>
>>
>

RE: [jira] Commented: (HADOOP-2006) Aggregate Functions in select statement

Posted by edward yoon <we...@udanax.org>.

>> it will encourage people to think that the shell is a good way to interact with HBase in general...

I think this is a key point. :)

The HBase Shell aims to improve work efficiency without requiring specialized knowledge.
I'll make it an accessory for database access methods on HBase.
Also, I'm thinking about matrix operations on HBase.

But ... the HBase Shell is just one of the applications on HBase.

...
Let's think.

If you mistakenly think that standard SQL is the full capacity of an A-DBMS,
I think you won't want to study its database structure, access algorithms, philosophies, etc.

Then, can I force you to use 100% of the A-DBMS's capacity?

Or....

Let's assume the A-DBMS didn't provide standard SQL.
Do you want to use the A-DBMS?

Ok..
If you want to use the A-DBMS, you have already accepted that SQL isn't all of the A-DBMS.

So, conclusion?
The richer the HBase shell, the more rapidly the use of HBase will grow.


------------------------------

B. Regards,

Edward yoon @ NHN, corp.
Home : http://www.udanax.org


> From: bryan@rapleaf.com
> Subject: Re: [jira] Commented: (HADOOP-2006) Aggregate Functions in select statement
> Date: Wed, 5 Dec 2007 15:50:50 -0800
> To: hadoop-dev@lucene.apache.org
>
> If you have a table with something like a billion rows, and do an
> aggregate function on the table from the shell, you will end up
> reading all billion rows through a single machine, essentially
> aggregating the entire dataset locally. This defeats the purpose of
> having a massively distributed database like HBase. To do this more
> efficiently, you'd ideally kick off a MapReduce job that can perform
> the various aggregation functions on the dataset in parallel,
> harnessing the power of the distributed dataset, and then returning
> the results to a central location once they are calculated.
>
> I think putting this option into the shell is risky, because it will
> encourage people to think that the shell is a good way to interact
> with HBase in general, which it isn't. We want people to understand
> HBase is best consumed in parallel and discourage solutions that
> aggregate access through a single point. As such, we shouldn't build
> features that allow people to inadvertently use the wrong access
> patterns.
>
> On Dec 5, 2007, at 3:38 PM, Edward Yoon (JIRA) wrote:
>
>>
>> [ https://issues.apache.org/jira/browse/HADOOP-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548879 ]
>>
>> Edward Yoon commented on HADOOP-2006:
>> -------------------------------------
>>
>> I don't understand your comment.
>> Please explain in more detail.
>>
>>> Aggregate Functions in select statement
>>> ---------------------------------------
>>>
>>> Key: HADOOP-2006
>>> URL: https://issues.apache.org/jira/browse/HADOOP-2006
>>> Project: Hadoop
>>> Issue Type: Sub-task
>>> Components: contrib/hbase
>>> Affects Versions: 0.14.1
>>> Reporter: Edward Yoon
>>> Assignee: Edward Yoon
>>> Priority: Minor
>>> Fix For: 0.16.0
>>>
>>>
>>> Aggregation functions on collections of data values: average,
>>> minimum, maximum, sum, count.
>>> Group rows by the value of a column family and apply an aggregate
>>> function independently to each group of rows.
>>> *  <Grouping columnfamilies>  ƒ ~function_list~ (Relation)
>>> {code}
>>> select producer, avg(year) from movieLog_table group by producer
>>> {code}
>>
>>
>

Re: [jira] Commented: (HADOOP-2006) Aggregate Functions in select statement

Posted by Bryan Duxbury <br...@rapleaf.com>.

If you have a table with something like a billion rows, and do an  
aggregate function on the table from the shell, you will end up  
reading all billion rows through a single machine, essentially  
aggregating the entire dataset locally. This defeats the purpose of  
having a massively distributed database like HBase. To do this more  
efficiently, you'd ideally kick off a MapReduce job that can perform
the various aggregation functions on the dataset in parallel,
harnessing the power of the distributed dataset, and then returning  
the results to a central location once they are calculated.

I think putting this option into the shell is risky, because it will  
encourage people to think that the shell is a good way to interact  
with HBase in general, which it isn't. We want people to understand  
HBase is best consumed in parallel and discourage solutions that  
aggregate access through a single point. As such, we shouldn't build  
features that allow people to inadvertently use the wrong access  
patterns.
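The parallel approach described above can be sketched in a few lines of plain Python (this is not the Hadoop MapReduce API). The sketch assumes the table has already been split into hypothetical partitions standing in for HBase regions: each partition is aggregated locally into per-producer (sum, count) pairs, and only those small partials are merged at one central point. Because an average of averages is generally wrong, avg(year) is carried as (sum, count) and divided only at the end.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# group key (producer) -> (sum of years, number of rows)
Partial = Dict[str, Tuple[int, int]]


def map_partition(rows: Iterable[Tuple[str, int]]) -> Partial:
    """Map + combine step: aggregate one partition locally into (sum, count) per producer."""
    acc: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for producer, year in rows:
        acc[producer][0] += year
        acc[producer][1] += 1
    return {producer: (s, c) for producer, (s, c) in acc.items()}


def reduce_partials(partials: Iterable[Partial]) -> Dict[str, float]:
    """Reduce step: merge the small partials centrally, then divide once."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for partial in partials:
        for producer, (s, c) in partial.items():
            totals[producer][0] += s
            totals[producer][1] += c
    return {producer: s / c for producer, (s, c) in totals.items()}


# Hypothetical partitions; in a real job each would live on a different server
# and map_partition would run there, next to the data.
partitions = [
    [("A", 1990), ("B", 1995)],
    [("A", 2000), ("A", 2003)],
]
averages = reduce_partials(map_partition(p) for p in partitions)
```

With these sample partitions, averaging the two per-partition averages for producer A would give 1995.75, while the correct overall average is 5993 / 3 (about 1997.67); merging sums and counts gets this right while moving only one small tuple per group per partition.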

On Dec 5, 2007, at 3:38 PM, Edward Yoon (JIRA) wrote:

>
>     [ https://issues.apache.org/jira/browse/HADOOP-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548879 ]
>
> Edward Yoon commented on HADOOP-2006:
> -------------------------------------
>
> I don't understand your comment.
> Please explain in more detail.
>
>> Aggregate Functions in select statement
>> ---------------------------------------
>>
>>                 Key: HADOOP-2006
>>                 URL: https://issues.apache.org/jira/browse/HADOOP-2006
>>             Project: Hadoop
>>          Issue Type: Sub-task
>>          Components: contrib/hbase
>>    Affects Versions: 0.14.1
>>            Reporter: Edward Yoon
>>            Assignee: Edward Yoon
>>            Priority: Minor
>>             Fix For: 0.16.0
>>
>>
>> Aggregation functions on collections of data values: average,  
>> minimum, maximum, sum, count.
>> Group rows by the value of a column family and apply an aggregate
>> function independently to each group of rows.
>>  * <Grouping columnfamilies>  ƒ ~function_list~ (Relation)
>> {code}
>> select producer, avg(year) from movieLog_table group by producer
>> {code}
>
>

[jira] Commented: (HADOOP-2006) Aggregate Functions in select statement

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548879 ] 

Edward Yoon commented on HADOOP-2006:
-------------------------------------

I don't understand your comment.
Please explain in more detail.

> Aggregate Functions in select statement
> ---------------------------------------
>
>                 Key: HADOOP-2006
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2006
>             Project: Hadoop
>          Issue Type: Sub-task
>          Components: contrib/hbase
>    Affects Versions: 0.14.1
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Minor
>             Fix For: 0.16.0
>
>
> Aggregation functions on collections of data values: average, minimum, maximum, sum, count.
> Group rows by the value of a column family and apply an aggregate function independently to each group of rows.
>  * <Grouping columnfamilies>  ƒ ~function_list~ (Relation)
> {code}
> select producer, avg(year) from movieLog_table group by producer
> {code}

[jira] Issue Comment Edited: (HADOOP-2006) Aggregate Functions in select statement

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532944 ] 

udanax edited comment on HADOOP-2006 at 10/6/07 7:19 PM:
--------------------------------------------------------------

{code}
SELECT { column_name [, column_name] ... | * | expr [alias] }
    FROM table_name
    [WHERE selection_condition]
    [GROUP BY expr [, expr] ...]
    [ORDER BY {expr | position} [ASC | DESC]
        [, {expr | position} [ASC | DESC]] ...]
    [NUM_VERSIONS = version_count]
    [TIMESTAMP 'timestamp']
    [LIMIT = row_count]
    [INTO FILE 'file_name']
{code}
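The grammar above does not spell out evaluation order. Assuming it follows standard SQL semantics, a query would be processed as WHERE, then GROUP BY with aggregation, then ORDER BY, then LIMIT. The sketch below models that pipeline over in-memory rows; `run_select` and its parameters are hypothetical names, ORDER BY is fixed to the aggregate column (position 2), and the HBase-specific clauses (NUM_VERSIONS, TIMESTAMP, INTO FILE) are left out.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]


def run_select(
    rows: List[Row],
    where: Callable[[Row], bool],
    group_by: str,
    agg_column: str,
    agg: Callable[[List[object]], object],
    descending: bool = False,
    limit: Optional[int] = None,
) -> List[Tuple[object, object]]:
    """Hypothetical evaluator: WHERE -> GROUP BY/aggregate -> ORDER BY 2 -> LIMIT."""
    # WHERE filters individual rows before any grouping happens.
    filtered = [r for r in rows if where(r)]

    # GROUP BY collects the aggregated column's values per group key.
    groups: Dict[object, List[object]] = defaultdict(list)
    for r in filtered:
        groups[r[group_by]].append(r[agg_column])

    # Apply the aggregate function independently to each group.
    result = [(key, agg(values)) for key, values in groups.items()]

    # ORDER BY position 2 (the aggregate value); Python's sort is stable.
    result.sort(key=lambda pair: pair[1], reverse=descending)

    # LIMIT keeps only the first row_count results.
    return result if limit is None else result[:limit]


# Roughly: select producer, count(year) from movieLog_table
#          where year >= 1990 group by producer order by 2 desc limit = 1
rows = [
    {"producer": "A", "year": 1985},
    {"producer": "A", "year": 1992},
    {"producer": "B", "year": 1995},
    {"producer": "B", "year": 1999},
]
top = run_select(rows, where=lambda r: r["year"] >= 1990, group_by="producer",
                 agg_column="year", agg=len, descending=True, limit=1)
print(top)  # [('B', 2)]
```

Filtering before grouping is what makes producer A's count drop to 1 here; applying the condition after aggregation would be HAVING semantics, which the proposed grammar does not include.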

> Aggregate Functions in select statement
> ---------------------------------------
>
>                 Key: HADOOP-2006
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2006
>             Project: Hadoop
>          Issue Type: Sub-task
>          Components: contrib/hbase
>    Affects Versions: 0.14.1
>            Reporter: Edward Yoon
>            Priority: Minor
>             Fix For: 0.16.0
>
>
> Aggregation functions on collections of data values: average, minimum, maximum, sum, count.
> Group rows by the value of a column family and apply an aggregate function independently to each group of rows.
>  * <Grouping columnfamilies>  ƒ ~function_list~ (Relation)
> {code}
> select producer, avg(year) from movieLog_table group by producer
> {code}

[jira] Updated: (HADOOP-2006) Aggregate Functions in select statement

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Yoon updated HADOOP-2006:
--------------------------------

          Component/s: contrib/hbase
        Fix Version/s: 0.16.0
             Priority: Minor  (was: Major)
          Description: 
Aggregation functions on collections of data values: average, minimum, maximum, sum, count.
Group rows by the value of a column family and apply an aggregate function independently to each group of rows.

 * <Grouping columnfamilies>  ƒ ~function_list~ (Relation)

{code}
select producer, avg(year) from movieLog_table group by producer
{code}
    Affects Version/s: 0.14.1

Update description.
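For reference, the grouping semantics in the description can be sketched as a small in-memory Python function. It assumes each row of movieLog_table is available as a dict with hypothetical `producer` and `year` keys, and it applies all five listed functions (average, minimum, maximum, sum, count) independently to each group. It says nothing about how the HBase shell would actually scan the table.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def aggregate_by(rows: Iterable[dict], group_key: str, value_key: str) -> Dict[object, Dict[str, float]]:
    """Group rows by group_key, then apply avg/min/max/sum/count to value_key per group."""
    groups: Dict[object, List[float]] = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])

    return {
        key: {
            "avg": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
            "sum": sum(values),
            "count": len(values),
        }
        for key, values in groups.items()
    }


# Hypothetical rows invented for illustration; not real movieLog_table data.
movie_log = [
    {"producer": "A", "year": 1990},
    {"producer": "A", "year": 2000},
    {"producer": "B", "year": 1995},
]
stats = aggregate_by(movie_log, "producer", "year")

# select producer, avg(year) from movieLog_table group by producer
print({producer: s["avg"] for producer, s in stats.items()})  # {'A': 1995.0, 'B': 1995.0}
```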

> Aggregate Functions in select statement
> ---------------------------------------
>
>                 Key: HADOOP-2006
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2006
>             Project: Hadoop
>          Issue Type: Sub-task
>          Components: contrib/hbase
>    Affects Versions: 0.14.1
>            Reporter: Edward Yoon
>            Priority: Minor
>             Fix For: 0.16.0
>
>
> Aggregation functions on collections of data values: average, minimum, maximum, sum, count.
> Group rows by the value of a column family and apply an aggregate function independently to each group of rows.
>  * <Grouping columnfamilies>  ƒ ~function_list~ (Relation)
> {code}
> select producer, avg(year) from movieLog_table group by producer
> {code}

[jira] Assigned: (HADOOP-2006) Aggregate Functions in select statement

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2006:
-------------------------------------

    Assignee: Edward Yoon

> Aggregate Functions in select statement
> ---------------------------------------
>
>                 Key: HADOOP-2006
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2006
>             Project: Hadoop
>          Issue Type: Sub-task
>          Components: contrib/hbase
>    Affects Versions: 0.14.1
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Minor
>             Fix For: 0.16.0
>
>
> Aggregation functions on collections of data values: average, minimum, maximum, sum, count.
> Group rows by the value of a column family and apply an aggregate function independently to each group of rows.
>  * <Grouping columnfamilies>  ƒ ~function_list~ (Relation)
> {code}
> select producer, avg(year) from movieLog_table group by producer
> {code}

[jira] Commented: (HADOOP-2006) Aggregate Functions in select statement

Posted by "Edward Yoon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532944 ] 

Edward Yoon commented on HADOOP-2006:
-------------------------------------

SELECT { column_name [, column_name] ... | * | expr [alias] }
    FROM table_name
    [WHERE selection_condition]
    [GROUP BY expr [, expr] ...]
    [ORDER BY {expr | position} [ASC | DESC]
        [, {expr | position} [ASC | DESC]] ...]
    [NUM_VERSIONS = version_count]
    [TIMESTAMP 'timestamp']
    [LIMIT = row_count]
    [INTO FILE 'file_name']

> Aggregate Functions in select statement
> ---------------------------------------
>
>                 Key: HADOOP-2006
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2006
>             Project: Hadoop
>          Issue Type: Sub-task
>          Components: contrib/hbase
>    Affects Versions: 0.14.1
>            Reporter: Edward Yoon
>            Priority: Minor
>             Fix For: 0.16.0
>
>
> Aggregation functions on collections of data values: average, minimum, maximum, sum, count.
> Group rows by the value of a column family and apply an aggregate function independently to each group of rows.
>  * <Grouping columnfamilies>  ƒ ~function_list~ (Relation)
> {code}
> select producer, avg(year) from movieLog_table group by producer
> {code}

[jira] Commented: (HADOOP-2006) Aggregate Functions in select statement

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548874 ] 

Bryan Duxbury commented on HADOOP-2006:
---------------------------------------

This seems like a bad idea. You could have TONS of data, and aggregating it in one place would take forever. If you want to produce aggregate info, you should probably fire off a MapReduce job, no?

> Aggregate Functions in select statement
> ---------------------------------------
>
>                 Key: HADOOP-2006
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2006
>             Project: Hadoop
>          Issue Type: Sub-task
>          Components: contrib/hbase
>    Affects Versions: 0.14.1
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Minor
>             Fix For: 0.16.0
>
>
> Aggregation functions on collections of data values: average, minimum, maximum, sum, count.
> Group rows by the value of a column family and apply an aggregate function independently to each group of rows.
>  * <Grouping columnfamilies>  ƒ ~function_list~ (Relation)
> {code}
> select producer, avg(year) from movieLog_table group by producer
> {code}
