Posted to common-dev@hadoop.apache.org by "Runping Qi (JIRA)" <ji...@apache.org> on 2007/05/09 19:59:15 UTC

[jira] Created: (HADOOP-1342) A configurable limit on the number of unique values should be set on the UniqueValueCount and ValueHistogram aggregators

A configurable limit on the number of unique values should be set on the UniqueValueCount and ValueHistogram aggregators
------------------------------------------------------------------------------------------------------------------------

                 Key: HADOOP-1342
                 URL: https://issues.apache.org/jira/browse/HADOOP-1342
             Project: Hadoop
          Issue Type: Improvement
            Reporter: Runping Qi



In the current implementation, the number of unique values may grow without bound, eventually causing an out-of-memory error.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1342) A configurable limit on the number of unique values should be set on the UniqueValueCount and ValueHistogram aggregators

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1342:
---------------------------------

    Fix Version/s: 0.14.0
           Status: Open  (was: Patch Available)

I'm not sure what's changed, but this no longer passes unit tests against trunk.

Testcase: testAggregates took 1.59 sec
        FAILED
expected:<...5...> but was:<...9...>





[jira] Commented: (HADOOP-1342) A configurable limit on the number of unique values should be set on the UniqueValueCount and ValueHistogram aggregators

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495185 ] 

Runping Qi commented on HADOOP-1342:
------------------------------------


That explains why the unit test failed.

The patch failed to apply because r537300 made formatting changes to TestAggregates.java, which caused conflicts.

I will regenerate the patch next.






[jira] Commented: (HADOOP-1342) A configurable limit on the number of unique values should be set on the UniqueValueCount and ValueHistogram aggregators

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495962 ] 

Hadoop QA commented on HADOOP-1342:
-----------------------------------

Integrated in Hadoop-Nightly #89 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/89/)



[jira] Updated: (HADOOP-1342) A configurable limit on the number of unique values should be set on the UniqueValueCount and ValueHistogram aggregators

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-1342:
-------------------------------

    Attachment: patch-1342.txt

This patch adds a limit on the number of unique values for the UniqueValueCount aggregator. If the actual number of unique values exceeds the limit, the counter will be limit + 1.

The limit is stored under the attribute name "aggregate.max.num.unique.values".
It can be set by calling job.setLong("aggregate.max.num.unique.values", 200).
The default is Long.MAX_VALUE (same as the current behavior).
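
To illustrate the capping behavior described above, here is a minimal standalone sketch. It is not the code in patch-1342.txt: the class name CappedUniqueCounter and its methods are hypothetical and it does not use any Hadoop classes. It assumes that distinct values beyond the limit are simply not stored, and that the reported count becomes limit + 1 once at least one value has been rejected.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of a unique-value counter with a configurable cap.
 * Not the Hadoop implementation; names and structure are assumptions.
 */
final class CappedUniqueCounter {
    // Maximum number of distinct values kept in memory.
    private final long maxUniqueValues;
    private final Set<String> seen = new HashSet<>();
    // Set to true once a new distinct value has been rejected.
    private boolean overflowed = false;

    CappedUniqueCounter(long maxUniqueValues) {
        this.maxUniqueValues = maxUniqueValues;
    }

    /** Records a value; new distinct values past the cap are not stored. */
    void add(String value) {
        if (seen.contains(value)) {
            return; // already counted
        }
        if (seen.size() < maxUniqueValues) {
            seen.add(value);
        } else {
            overflowed = true; // memory use stays bounded by the cap
        }
    }

    /** Returns the exact distinct count, or limit + 1 if the cap was exceeded. */
    long getReport() {
        return overflowed ? seen.size() + 1L : seen.size();
    }

    public static void main(String[] args) {
        CappedUniqueCounter counter = new CappedUniqueCounter(2);
        for (String v : new String[] {"a", "b", "a", "c"}) {
            counter.add(v);
        }
        // Three distinct values were offered with a limit of 2.
        System.out.println(counter.getReport());
    }
}
```

With a limit of 2 and the inputs a, b, a, c, the sketch stores only a and b and reports 3 (limit + 1). With a limit of Long.MAX_VALUE the overflow branch is never taken, which matches the uncapped behavior.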




[jira] Updated: (HADOOP-1342) A configurable limit on the number of unique values should be set on the UniqueValueCount and ValueHistogram aggregators

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-1342:
-------------------------------

    Attachment: patch-1342.txt


A new patch with the conflicts against trunk resolved.



[jira] Updated: (HADOOP-1342) A configurable limit on the number of unique values should be set on the UniqueValueCount and ValueHistogram aggregators

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1342:
---------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Runping!



[jira] Commented: (HADOOP-1342) A configurable limit on the number of unique values should be set on the UniqueValueCount and ValueHistogram aggregators

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495191 ] 

Hadoop QA commented on HADOOP-1342:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12357147/patch-1342.txt applied and successfully tested against trunk revision r537295.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/136/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/136/console



[jira] Commented: (HADOOP-1342) A configurable limit on the number of unique values should be set on the UniqueValueCount and ValueHistogram aggregators

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495183 ] 

Doug Cutting commented on HADOOP-1342:
--------------------------------------

The patch simply fails to apply to trunk.



[jira] Commented: (HADOOP-1342) A configurable limit on the number of unique values should be set on the UniqueValueCount and ValueHistogram aggregators

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495064 ] 

Hadoop QA commented on HADOOP-1342:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12357108/patch-1342.txt applied and successfully tested against trunk revision r536583.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/132/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/132/console



[jira] Updated: (HADOOP-1342) A configurable limit on the number of unique values should be set on the UniqueValueCount and ValueHistogram aggregators

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-1342:
-------------------------------

    Status: Patch Available  (was: Open)



[jira] Assigned: (HADOOP-1342) A configurable limit on the number of unique values should be set on the UniqueValueCount and ValueHistogram aggregators

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi reassigned HADOOP-1342:
----------------------------------

    Assignee: Runping Qi



[jira] Commented: (HADOOP-1342) A configurable limit on the number of unique values should be set on the UniqueValueCount and ValueHistogram aggregators

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495178 ] 

Runping Qi commented on HADOOP-1342:
------------------------------------


It looks like the changes to TestAggregates were applied, but the changes to the aggregate code were not.

Can you try re-applying the patch? Otherwise, please send me the following files from your trunk:

ValueAggregatorBaseDescriptor.java and 
UniqValueCount.java

so that I can take a look at them.




[jira] Updated: (HADOOP-1342) A configurable limit on the number of unique values should be set on the UniqueValueCount and ValueHistogram aggregators

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-1342:
-------------------------------

    Attachment:     (was: patch-1342.txt)



[jira] Updated: (HADOOP-1342) A configurable limit on the number of unique values should be set on the UniqueValueCount and ValueHistogram aggregators

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-1342:
-------------------------------

    Status: Patch Available  (was: Open)
