You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tajo.apache.org by hyunsik <gi...@git.apache.org> on 2014/07/23 13:19:14 UTC

[GitHub] tajo pull request: TAJO-966: Range partition should support split ...

GitHub user hyunsik opened a pull request:

    https://github.com/apache/tajo/pull/91

    TAJO-966: Range partition should support split of multiple characters.

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hyunsik/tajo TAJO-966

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/91.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #91
    
----
commit 78b2a42d807c7f244ecab814d6a99513a01daf33
Author: Hyunsik Choi <hy...@apache.org>
Date:   2014-07-21T07:03:12Z

    TAJO-965: Upgrade Bytes class and move some methods to others.

commit 89287cb4b69354cbab1ffa4485c58684c384c982
Author: Hyunsik Choi <hy...@apache.org>
Date:   2014-07-21T08:13:36Z

    initial work.

commit 9602e9f5036eb078751dc786c0e64ca256d60d6f
Author: Hyunsik Choi <hy...@apache.org>
Date:   2014-07-22T05:38:00Z

    Changed data type to BigInteger from BigDecimal.

commit f4a83c2b6fc71485d856680b5f62b7a5859ddb8b
Author: Hyunsik Choi <hy...@apache.org>
Date:   2014-07-22T08:02:56Z

    Improved logic.

commit 983dd11365010a977df9a75e511d25ceabc5b9b2
Author: Hyunsik Choi <hy...@apache.org>
Date:   2014-07-22T09:11:27Z

    Conflicts:
    	tajo-common/src/main/java/org/apache/tajo/util/BytesUtils.java
    	tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java

commit 78e8f17c2cd7b27a576a3e9a095fa3e3e3d63c61
Author: Hyunsik Choi <hy...@apache.org>
Date:   2014-07-22T10:19:16Z

    Changing the bytes normalization

commit 9ade82fdd36224e60c10b01702f183ae993eb60e
Author: Hyunsik Choi <hy...@apache.org>
Date:   2014-07-23T10:54:27Z

    Completed uniform splitter of multiple character range.

commit 4f2c832e89f56b2d289b0036a7e0afb0ee0d0ba7
Author: Hyunsik Choi <hy...@apache.org>
Date:   2014-07-23T10:57:50Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into HEAD
    
    Conflicts:
    	CHANGES
    	tajo-common/src/main/java/org/apache/tajo/util/BytesUtils.java
    	tajo-core/src/test/java/org/apache/tajo/engine/query/TestCaseByCases.java

commit b009e09a2517f0b9ccc3201238b45a328876f8a4
Author: Hyunsik Choi <hy...@apache.org>
Date:   2014-07-23T11:17:38Z

    Add more comments.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-966: Range partition should support split ...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/91#discussion_r15741026
  
    --- Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/UniformRangePartition.java ---
    @@ -94,25 +100,81 @@ public UniformRangePartition(TupleRange range, SortSpec [] sortSpecs) {
         }
     
         List<TupleRange> ranges = Lists.newArrayList();
    -    BigDecimal term = reverseCardsForDigit[0].divide(
    -        new BigDecimal(partNum), RoundingMode.CEILING);
    -    BigDecimal reminder = reverseCardsForDigit[0];
    -    Tuple last = range.getStart();
    -    while(reminder.compareTo(new BigDecimal(0)) > 0) {
    +
    +    BigDecimal x = new BigDecimal(reverseCardsForDigit[0]);
    +
    +    BigInteger term = x.divide(BigDecimal.valueOf(partNum), RoundingMode.CEILING).toBigInteger();
    +    BigInteger reminder = reverseCardsForDigit[0];
    +    Tuple last = mergedRange.getStart();
    +    TupleRange tupleRange;
    +    while(reminder.compareTo(BigInteger.ZERO) > 0) {
           if (reminder.compareTo(term) <= 0) { // final one is inclusive
    -        ranges.add(new TupleRange(sortSpecs, last, range.getEnd()));
    +        tupleRange = new TupleRange(sortSpecs, last, mergedRange.getEnd());
           } else {
    -        Tuple next = increment(last, term.longValue(), variableId);
    -        ranges.add(new TupleRange(sortSpecs, last, next));
    +        Tuple next = increment(last, term, variableId);
    +        tupleRange = new TupleRange(sortSpecs, last, next);
           }
    +
    +      ranges.add(tupleRange);
           last = ranges.get(ranges.size() - 1).getEnd();
           reminder = reminder.subtract(term);
         }
     
    +    for (TupleRange r : ranges) {
    +      denormalize(sortSpecs, r);
    +    }
    +
         return ranges.toArray(new TupleRange[ranges.size()]);
       }
     
       /**
    +   * It normalizes the start and end keys to have the same length bytes if they are texts or bytes.
    +   *
    +   * @param sortSpecs The sort specs
    +   * @param range Tuple range to be normalize
    +   */
    +  public static void normalize(final SortSpec [] sortSpecs, TupleRange range) {
    +    // normalize text fields to have same bytes length
    +    for (int i = 0; i < sortSpecs.length; i++) {
    +      if (sortSpecs[i].getSortKey().getDataType().getType() == TajoDataTypes.Type.TEXT) {
    --- End diff --
    
    Yes, I think so. I didn't implement BLOB types. It would be easy to add BLOB support. I'll try it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-966: Range partition should support split ...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/91#issuecomment-51014071
  
    rebased.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-966: Range partition should support split ...

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/91#discussion_r15740968
  
    --- Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/UniformRangePartition.java ---
    @@ -94,25 +100,81 @@ public UniformRangePartition(TupleRange range, SortSpec [] sortSpecs) {
         }
     
         List<TupleRange> ranges = Lists.newArrayList();
    -    BigDecimal term = reverseCardsForDigit[0].divide(
    -        new BigDecimal(partNum), RoundingMode.CEILING);
    -    BigDecimal reminder = reverseCardsForDigit[0];
    -    Tuple last = range.getStart();
    -    while(reminder.compareTo(new BigDecimal(0)) > 0) {
    +
    +    BigDecimal x = new BigDecimal(reverseCardsForDigit[0]);
    +
    +    BigInteger term = x.divide(BigDecimal.valueOf(partNum), RoundingMode.CEILING).toBigInteger();
    +    BigInteger reminder = reverseCardsForDigit[0];
    +    Tuple last = mergedRange.getStart();
    +    TupleRange tupleRange;
    +    while(reminder.compareTo(BigInteger.ZERO) > 0) {
           if (reminder.compareTo(term) <= 0) { // final one is inclusive
    -        ranges.add(new TupleRange(sortSpecs, last, range.getEnd()));
    +        tupleRange = new TupleRange(sortSpecs, last, mergedRange.getEnd());
           } else {
    -        Tuple next = increment(last, term.longValue(), variableId);
    -        ranges.add(new TupleRange(sortSpecs, last, next));
    +        Tuple next = increment(last, term, variableId);
    +        tupleRange = new TupleRange(sortSpecs, last, next);
           }
    +
    +      ranges.add(tupleRange);
           last = ranges.get(ranges.size() - 1).getEnd();
           reminder = reminder.subtract(term);
         }
     
    +    for (TupleRange r : ranges) {
    +      denormalize(sortSpecs, r);
    +    }
    +
         return ranges.toArray(new TupleRange[ranges.size()]);
       }
     
       /**
    +   * It normalizes the start and end keys to have the same length bytes if they are texts or bytes.
    +   *
    +   * @param sortSpecs The sort specs
    +   * @param range Tuple range to be normalize
    +   */
    +  public static void normalize(final SortSpec [] sortSpecs, TupleRange range) {
    +    // normalize text fields to have same bytes length
    +    for (int i = 0; i < sortSpecs.length; i++) {
    +      if (sortSpecs[i].getSortKey().getDataType().getType() == TajoDataTypes.Type.TEXT) {
    --- End diff --
    
    According to your comment, the type can be BLOB as well as TEXT, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-966: Range partition should support split ...

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on the pull request:

    https://github.com/apache/tajo/pull/91#issuecomment-51019071
  
    Even though I commented that BLOB also should be considered above, BLOB is currently not supported for range partition. Thus, we can add the code to handle BLOB later when we need to support range partition with BLOB.
    Here is my +1.
    Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-966: Range partition should support split ...

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on the pull request:

    https://github.com/apache/tajo/pull/91#issuecomment-50425629
  
    Sorry @hyunsik,
    I'll review today.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-966: Range partition should support split ...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/91#issuecomment-51019691
  
    Thank you for your quick review. I'll commit it shortly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-966: Range partition should support split ...

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/91#discussion_r15740995
  
    --- Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/UniformRangePartition.java ---
    @@ -94,25 +100,81 @@ public UniformRangePartition(TupleRange range, SortSpec [] sortSpecs) {
         }
     
         List<TupleRange> ranges = Lists.newArrayList();
    -    BigDecimal term = reverseCardsForDigit[0].divide(
    -        new BigDecimal(partNum), RoundingMode.CEILING);
    -    BigDecimal reminder = reverseCardsForDigit[0];
    -    Tuple last = range.getStart();
    -    while(reminder.compareTo(new BigDecimal(0)) > 0) {
    +
    +    BigDecimal x = new BigDecimal(reverseCardsForDigit[0]);
    +
    +    BigInteger term = x.divide(BigDecimal.valueOf(partNum), RoundingMode.CEILING).toBigInteger();
    +    BigInteger reminder = reverseCardsForDigit[0];
    +    Tuple last = mergedRange.getStart();
    +    TupleRange tupleRange;
    +    while(reminder.compareTo(BigInteger.ZERO) > 0) {
           if (reminder.compareTo(term) <= 0) { // final one is inclusive
    -        ranges.add(new TupleRange(sortSpecs, last, range.getEnd()));
    +        tupleRange = new TupleRange(sortSpecs, last, mergedRange.getEnd());
           } else {
    -        Tuple next = increment(last, term.longValue(), variableId);
    -        ranges.add(new TupleRange(sortSpecs, last, next));
    +        Tuple next = increment(last, term, variableId);
    +        tupleRange = new TupleRange(sortSpecs, last, next);
           }
    +
    +      ranges.add(tupleRange);
           last = ranges.get(ranges.size() - 1).getEnd();
           reminder = reminder.subtract(term);
         }
     
    +    for (TupleRange r : ranges) {
    +      denormalize(sortSpecs, r);
    +    }
    +
         return ranges.toArray(new TupleRange[ranges.size()]);
       }
     
       /**
    +   * It normalizes the start and end keys to have the same length bytes if they are texts or bytes.
    +   *
    +   * @param sortSpecs The sort specs
    +   * @param range Tuple range to be normalize
    +   */
    +  public static void normalize(final SortSpec [] sortSpecs, TupleRange range) {
    +    // normalize text fields to have same bytes length
    +    for (int i = 0; i < sortSpecs.length; i++) {
    +      if (sortSpecs[i].getSortKey().getDataType().getType() == TajoDataTypes.Type.TEXT) {
    +        byte [] startBytes;
    +        byte [] endBytes;
    +        if (range.getStart().isNull(i)) {
    +          startBytes = BigInteger.ZERO.toByteArray();
    +        } else {
    +          startBytes = range.getStart().getBytes(i);
    +        }
    +
    +        if (range.getEnd().isNull(i)) {
    +          endBytes = BigInteger.ZERO.toByteArray();
    +        } else {
    +          endBytes = range.getEnd().getBytes(i);
    +        }
    +
    +        byte [][] padded = BytesUtils.padBytes(startBytes, endBytes);
    +        range.getStart().put(i, DatumFactory.createText(padded[0]));
    +        range.getEnd().put(i, DatumFactory.createText(padded[1]));
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Normalized keys have padding values, but it will cause the key mismatch in pull server.
    +   * So, it denormalize the normalized keys again.
    +   *
    +   * @param sortSpecs The sort specs
    +   * @param range Tuple range to be denormalized
    +   */
    +  public static void denormalize(SortSpec [] sortSpecs, TupleRange range) {
    +    for (int i = 0; i < sortSpecs.length; i++) {
    +      if (sortSpecs[i].getSortKey().getDataType().getType() == TajoDataTypes.Type.TEXT) {
    --- End diff --
    
    As commented above, the type seems to be able to be BLOB.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-966: Range partition should support split ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/tajo/pull/91


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-966: Range partition should support split ...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/91#issuecomment-50294968
  
    Anyone can review this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---