Posted to issues@tajo.apache.org by "Jihoon Son (JIRA)" <ji...@apache.org> on 2015/11/27 07:26:10 UTC

[jira] [Created] (TAJO-1995) Improve range partitioning using histogram

Jihoon Son created TAJO-1995:
--------------------------------

             Summary: Improve range partitioning using histogram
                 Key: TAJO-1995
                 URL: https://issues.apache.org/jira/browse/TAJO-1995
             Project: Tajo
          Issue Type: New Feature
          Components: QueryMaster
            Reporter: Jihoon Son
            Assignee: Jihoon Son
             Fix For: 0.12.0


The currently implemented range repartitioning algorithm has two major problems:
* It assumes that the data distribution is uniform, so it is fragile when the data is skewed.
* Given floating-point values, it ignores the digits to the right of the decimal point, so it is difficult to determine the proper partition number.
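The two problems can be illustrated with a simplified Python model of equal-width range splitting. This is not Tajo's implementation. `uniform_boundaries` and `partition_sizes` are hypothetical names, and truncating the key range to integers is an assumption that imitates the floating-point behavior described above.

```python
import bisect
import math


def uniform_boundaries(lo, hi, num_partitions):
    """Equal-width split points for [lo, hi], assuming uniformly distributed keys.

    Hypothetical model: lo and hi are truncated to integers to imitate
    ignoring the digits to the right of the decimal point.
    """
    lo_i, hi_i = math.trunc(lo), math.trunc(hi)
    step = (hi_i - lo_i) / num_partitions
    return [lo_i + step * i for i in range(1, num_partitions)]


def partition_sizes(values, boundaries):
    """Rows per partition; a value equal to a boundary goes to the right-hand partition."""
    sizes = [0] * (len(boundaries) + 1)
    for v in values:
        sizes[bisect.bisect_right(boundaries, v)] += 1
    return sizes


skewed = [float(2 ** k) for k in range(12)]   # 1.0, 2.0, 4.0, ..., 2048.0
decimals = [i / 10 for i in range(1, 10)]      # 0.1, 0.2, ..., 0.9

# Skew: most keys are small, so equal-width ranges leave partitions unbalanced.
print(partition_sizes(skewed, uniform_boundaries(min(skewed), max(skewed), 4)))  # [10, 1, 0, 1]
# Truncation: the range [0.1, 0.9] collapses to [0, 0], so every key lands in one partition.
print(partition_sizes(decimals, uniform_boundaries(min(decimals), max(decimals), 3)))  # [0, 0, 9]
```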

One solution to these problems is to use a histogram. With a histogram, we can capture the actual data distribution and handle floating-point values properly.
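A minimal sketch of one way to use a histogram, assuming an equi-depth histogram built from the sorted key values (in practice it might be built from a sample). Split points are placed at quantiles rather than at equal widths, and keys keep their full floating-point precision. The issue does not say which histogram type will be used; `histogram_boundaries` and `partition_sizes` are hypothetical names, not Tajo APIs.

```python
import bisect


def histogram_boundaries(values, num_partitions):
    """Split points taken from an equi-depth histogram of the key values.

    Each boundary is a quantile of the sorted keys, so every partition gets
    roughly len(values) / num_partitions rows regardless of skew. Keys are
    compared as floats, without truncation.
    """
    data = sorted(values)
    n = len(data)
    boundaries = []
    for i in range(1, num_partitions):
        b = data[(i * n) // num_partitions]
        # Skip repeated split points caused by heavily duplicated keys.
        if not boundaries or b > boundaries[-1]:
            boundaries.append(b)
    return boundaries


def partition_sizes(values, boundaries):
    """Rows per partition; a value equal to a boundary goes to the right-hand partition."""
    sizes = [0] * (len(boundaries) + 1)
    for v in values:
        sizes[bisect.bisect_right(boundaries, v)] += 1
    return sizes


skewed = [float(2 ** k) for k in range(12)]   # 1.0, 2.0, 4.0, ..., 2048.0
decimals = [i / 10 for i in range(1, 10)]      # 0.1, 0.2, ..., 0.9
print(partition_sizes(skewed, histogram_boundaries(skewed, 4)))      # [3, 3, 3, 3]
print(partition_sizes(decimals, histogram_boundaries(decimals, 3)))  # [3, 3, 3]
```

With quantile-based split points, the same skewed and fractional keys spread evenly. A single key value that dominates the data still cannot be divided across ranges this way; the duplicate check only avoids emitting the same split point twice.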


