Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/03/23 23:55:00 UTC

[jira] [Commented] (DRILL-7117) Support creation of histograms for numeric data types (except Decimal) and date/time/timestamp

    [ https://issues.apache.org/jira/browse/DRILL-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16799858#comment-16799858 ] 

ASF GitHub Bot commented on DRILL-7117:
---------------------------------------

amansinha100 commented on pull request #1715: DRILL-7117: Support creation of equi-depth histogram for selected dat…
URL: https://github.com/apache/drill/pull/1715
 
 
   …a types.
   
   - This PR adds support for creating equi-depth histograms on the following data types: INT, BIGINT, FLOAT4, FLOAT8, DATE, TIME, TIMESTAMP and BOOLEAN. Selectivity calculations have not been modified yet; that will be done in a later PR.
   
   - The histogram is built using the t-digest approximation algorithm and associated data structure.  
   Please see details in [DRILL-7117](https://issues.apache.org/jira/browse/DRILL-7117) and the parent JIRA [DRILL-6992](https://issues.apache.org/jira/browse/DRILL-6992) which contains a link to the design document. 
   
   - The same ANALYZE command used for NDV, etc. will also gather histograms; no new syntax has been added. For testing, I have done manual testing using both skewed and uniform distributions and with different data types. Please see [DRILL-7117](https://issues.apache.org/jira/browse/DRILL-7117) for the results of that testing. No unit tests have been added yet, since the bucket boundaries can change slightly because of the underlying t-digest approximation. Making this repeatable and unit-testable needs some thinking, and I will do this in a follow-up PR.
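
For readers unfamiliar with the term, here is a minimal sketch of what "equi-depth" means: bucket boundaries are chosen so that each bucket holds roughly the same number of rows. This is not the code in the PR. It computes exact quantiles by sorting, whereas the PR uses the t-digest approximation, and the class and method names (`EquiDepthSketch`, `boundaries`) are hypothetical.

```java
import java.util.Arrays;

public class EquiDepthSketch {

    /**
     * Returns numBuckets + 1 boundaries such that each bucket covers roughly
     * the same number of input values. Uses an exact sort, NOT t-digest;
     * this only illustrates the equi-depth idea.
     */
    static double[] boundaries(double[] values, int numBuckets) {
        if (values.length == 0 || numBuckets < 1) {
            throw new IllegalArgumentException("need data and at least one bucket");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double[] bounds = new double[numBuckets + 1];
        for (int i = 0; i <= numBuckets; i++) {
            // Position of the (i / numBuckets) quantile in the sorted array.
            int idx = (int) Math.round((double) i / numBuckets * (sorted.length - 1));
            bounds[i] = sorted[idx];
        }
        return bounds;
    }

    public static void main(String[] args) {
        // Skewed sample: most values are small, a few are very large.
        double[] data = {1, 1, 2, 2, 3, 3, 4, 5, 50, 100, 500, 1000};
        System.out.println(Arrays.toString(boundaries(data, 4)));
    }
}
```

On skewed input the buckets come out narrow where the data is dense and wide where it is sparse, which is the property that distinguishes equi-depth from equi-width histograms. An approximate structure such as t-digest may place the boundaries slightly differently from this exact version.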
   
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Support creation of histograms for numeric data types (except Decimal) and date/time/timestamp
> ----------------------------------------------------------------------------------------------
>
>                 Key: DRILL-7117
>                 URL: https://issues.apache.org/jira/browse/DRILL-7117
>             Project: Apache Drill
>          Issue Type: Sub-task
>          Components: Query Planning & Optimization
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Major
>             Fix For: 1.16.0
>
>
> This JIRA is specific to creating histograms for numeric data types: INT, BIGINT, FLOAT4, FLOAT8  and their corresponding nullable/non-nullable versions.  Additionally, since DATE/TIME/TIMESTAMP are internally stored as longs, we should allow the same numeric type histogram creation for these data types as well. 
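
The point about DATE/TIME/TIMESTAMP can be illustrated with a small sketch: once a temporal value is mapped to a long, it can be fed to the same numeric histogram code as BIGINT. The encoding used here (milliseconds since the Unix epoch, UTC) is an assumption made for illustration and is not stated in this message; the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class TemporalAsLong {

    // Assumed encoding: a date becomes epoch milliseconds at midnight UTC.
    static long dateToLong(LocalDate date) {
        return date.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    // Assumed encoding: a timestamp becomes epoch milliseconds in UTC.
    static long timestampToLong(LocalDateTime timestamp) {
        return timestamp.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    public static void main(String[] args) {
        // Both values are now plain longs and preserve chronological order,
        // so a numeric histogram over them is also a histogram over the dates.
        System.out.println(dateToLong(LocalDate.of(2019, 3, 23)));
        System.out.println(timestampToLong(LocalDateTime.of(2019, 3, 23, 23, 55, 0)));
    }
}
```

Because the mapping is order-preserving, bucket boundaries computed on the longs can be converted back to dates or timestamps for display.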



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)