Posted to notifications@accumulo.apache.org by "Jared R (JIRA)" <ji...@apache.org> on 2017/10/24 15:14:00 UTC

[jira] [Commented] (ACCUMULO-4730) Create an Entry length summarizer

    [ https://issues.apache.org/jira/browse/ACCUMULO-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16217082#comment-16217082 ] 

Jared R commented on ACCUMULO-4730:
-----------------------------------

I will take a look at this as a first ticket project.

> Create an Entry length summarizer
> ---------------------------------
>
>                 Key: ACCUMULO-4730
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4730
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Keith Turner
>             Fix For: 2.0.0
>
>
> It would be very useful to have a built-in [Summarizer|https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/summary/Summarizer.java] that computes summary information about field lengths, specifically key length, row length, family length, qualifier length, visibility length, and value length.  Whatever stats are computed must be computable incrementally.  For example, min, max, count, sum, and a log2 histogram can all be computed incrementally.  I think these would be good stats to start with.  Count and sum can be used to compute the average.  There is an example of computing a log2 histogram in the Summarizer javadoc.
> The Summarizer could be named EntryLengthSummarizer and could produce summaries like the following.
> {noformat}
> count=XXX     //do not need to track this per field, it's the same for all
> key.min=XXX
> key.max=XXX
> key.sum=XXX
> key.logHist.8=XXX   //only output nonzero exponents
> key.logHist.9=XXX
> row.min=XXX
> row.max=XXX
> row.sum=XXX
> row.logHist.7=XXX
> row.logHist.8=XXX
> row.logHist.10=XXX
> family.min=XXX
> family.max=XXX
> family.sum=XXX
> family.logHist.6=XXX
> family.logHist.7=XXX
> etc...
> {noformat}
> This new summarizer would be placed in the [summarizers|https://github.com/apache/accumulo/tree/master/core/src/main/java/org/apache/accumulo/core/client/summary/summarizers] package.
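
To illustrate the incremental statistics described in the quoted issue, here is a minimal, standalone sketch of a per-field length tracker. It is not the Accumulo Summarizer API and not the eventual implementation: the class name LengthStats, its method names, and the choice of floor(log2(length)) as the histogram bucket are all assumptions made for illustration. The key names it emits (prefix.min, prefix.max, prefix.sum, prefix.logHist.N) follow the example output in the issue.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of Accumulo): tracks length statistics for
// one field, e.g. "row" or "family". Every statistic can be updated one
// entry at a time and merged with another instance.
final class LengthStats {
    private long min = Long.MAX_VALUE;
    private long max = 0;
    private long sum = 0;
    private long count = 0;
    // logHist[i] counts lengths whose bucket index is i.
    private final long[] logHist = new long[32];

    // Assumption: bucket = floor(log2(length)); lengths 0 and 1 share bucket 0.
    static int bucket(int length) {
        return length == 0 ? 0 : 31 - Integer.numberOfLeadingZeros(length);
    }

    // Incremental update for a single observed field length (must be >= 0).
    void accept(int length) {
        min = Math.min(min, length);
        max = Math.max(max, length);
        sum += length;
        count++;
        logHist[bucket(length)]++;
    }

    // Combine statistics gathered elsewhere (e.g. from another file).
    void merge(LengthStats other) {
        min = Math.min(min, other.min);
        max = Math.max(max, other.max);
        sum += other.sum;
        count += other.count;
        for (int i = 0; i < logHist.length; i++) {
            logHist[i] += other.logHist[i];
        }
    }

    // Average derived from count and sum, as the issue suggests.
    double average() {
        return count == 0 ? 0.0 : (double) sum / count;
    }

    // Emit prefix.min, prefix.max, prefix.sum and only nonzero histogram buckets.
    Map<String, Long> summarize(String prefix) {
        Map<String, Long> out = new TreeMap<>();
        if (count == 0) {
            return out;
        }
        out.put(prefix + ".min", min);
        out.put(prefix + ".max", max);
        out.put(prefix + ".sum", sum);
        for (int i = 0; i < logHist.length; i++) {
            if (logHist[i] != 0) {
                out.put(prefix + ".logHist." + i, logHist[i]);
            }
        }
        return out;
    }
}
```

In this sketch each field would get its own LengthStats instance, while the shared entry count would be emitted once rather than per field, as the issue notes. Because min, max, sum, and the histogram counters are all associative, merging two instances gives the same result as feeding every length into a single one.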



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)