Posted to issues@tajo.apache.org by "Jihoon Son (JIRA)" <ji...@apache.org> on 2015/11/27 07:26:10 UTC
[jira] [Created] (TAJO-1995) Improve range partitioning using histogram
Jihoon Son created TAJO-1995:
--------------------------------
Summary: Improve range partitioning using histogram
Key: TAJO-1995
URL: https://issues.apache.org/jira/browse/TAJO-1995
Project: Tajo
Issue Type: New Feature
Components: QueryMaster
Reporter: Jihoon Son
Assignee: Jihoon Son
Fix For: 0.12.0
The currently implemented range repartitioning algorithm has two major problems:
* It assumes that the data distribution is uniform, so it is fragile when the data is skewed.
* Given floating-point values, it ignores the digits to the right of the decimal point, so it is difficult to estimate the proper number of partitions.
One solution to these problems is to use a histogram. With a histogram, we can determine the actual data distribution and handle floating-point values properly.
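
As a rough illustration of the idea (not Tajo's implementation), the Python sketch below compares range boundaries derived from a fixed-width histogram with boundaries that assume a uniform distribution. All function names and the bucket width are hypothetical choices made for this example, and the equal-width baseline is only a simplified stand-in for a uniformity assumption.

```python
import math
from bisect import bisect_right
from collections import Counter


def build_histogram(values, bucket_width):
    """Return sorted (bucket_index, row_count) pairs for fixed-width buckets.

    Bucket i covers [i * bucket_width, (i + 1) * bucket_width), so fractional
    parts of floating-point keys are kept when bucket_width is below 1.
    """
    counts = Counter(math.floor(v / bucket_width) for v in values)
    return sorted(counts.items())


def histogram_boundaries(histogram, bucket_width, num_partitions):
    """Pick range boundaries so each partition gets a similar row count.

    At most one boundary is emitted per bucket, so a single very heavy
    bucket is never split and fewer boundaries may be returned.
    """
    total = sum(count for _, count in histogram)
    boundaries = []
    cumulative = 0
    for index, count in histogram:
        cumulative += count
        if len(boundaries) == num_partitions - 1:
            break
        # Emit the bucket's upper edge once the rows seen so far
        # reach the next equal-share target.
        if cumulative * num_partitions >= (len(boundaries) + 1) * total:
            boundaries.append((index + 1) * bucket_width)
    return boundaries


def equal_width_boundaries(values, num_partitions):
    """Baseline that assumes uniform data: split [min, max] into equal ranges."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / num_partitions
    return [lo + step * i for i in range(1, num_partitions)]


def partition_sizes(values, boundaries):
    """Count how many values fall into each range partition."""
    sizes = [0] * (len(boundaries) + 1)
    for v in values:
        sizes[bisect_right(boundaries, v)] += 1
    return sizes


if __name__ == "__main__":
    # Skewed sample: 16 keys packed into [0, 2) plus 4 large outliers.
    keys = [i * 0.125 for i in range(16)] + [100.0, 200.0, 300.0, 400.0]
    hist = build_histogram(keys, bucket_width=0.25)
    print(partition_sizes(keys, equal_width_boundaries(keys, 4)))
    print(partition_sizes(keys, histogram_boundaries(hist, 0.25, 4)))
```

On this skewed sample, the equal-width boundaries send almost all rows to the first partition, while the histogram-based boundaries spread them far more evenly. A narrower bucket width gives finer boundaries at the cost of a larger histogram.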
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)