You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tajo.apache.org by hyunsik <gi...@git.apache.org> on 2014/07/23 13:19:14 UTC
[GitHub] tajo pull request: TAJO-966: Range partition should support split ...
GitHub user hyunsik opened a pull request:
https://github.com/apache/tajo/pull/91
TAJO-966: Range partition should support split of multiple characters.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hyunsik/tajo TAJO-966
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/tajo/pull/91.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #91
----
commit 78b2a42d807c7f244ecab814d6a99513a01daf33
Author: Hyunsik Choi <hy...@apache.org>
Date: 2014-07-21T07:03:12Z
TAJO-965: Upgrade Bytes class and move some methods to others.
commit 89287cb4b69354cbab1ffa4485c58684c384c982
Author: Hyunsik Choi <hy...@apache.org>
Date: 2014-07-21T08:13:36Z
initial work.
commit 9602e9f5036eb078751dc786c0e64ca256d60d6f
Author: Hyunsik Choi <hy...@apache.org>
Date: 2014-07-22T05:38:00Z
Changed data type to BigInteger from BigDecimal.
commit f4a83c2b6fc71485d856680b5f62b7a5859ddb8b
Author: Hyunsik Choi <hy...@apache.org>
Date: 2014-07-22T08:02:56Z
Improved logic.
commit 983dd11365010a977df9a75e511d25ceabc5b9b2
Author: Hyunsik Choi <hy...@apache.org>
Date: 2014-07-22T09:11:27Z
Conflicts:
tajo-common/src/main/java/org/apache/tajo/util/BytesUtils.java
tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java
commit 78e8f17c2cd7b27a576a3e9a095fa3e3e3d63c61
Author: Hyunsik Choi <hy...@apache.org>
Date: 2014-07-22T10:19:16Z
Changing the bytes normalization
commit 9ade82fdd36224e60c10b01702f183ae993eb60e
Author: Hyunsik Choi <hy...@apache.org>
Date: 2014-07-23T10:54:27Z
Completed uniform splitter of multiple character range.
commit 4f2c832e89f56b2d289b0036a7e0afb0ee0d0ba7
Author: Hyunsik Choi <hy...@apache.org>
Date: 2014-07-23T10:57:50Z
Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into HEAD
Conflicts:
CHANGES
tajo-common/src/main/java/org/apache/tajo/util/BytesUtils.java
tajo-core/src/test/java/org/apache/tajo/engine/query/TestCaseByCases.java
commit b009e09a2517f0b9ccc3201238b45a328876f8a4
Author: Hyunsik Choi <hy...@apache.org>
Date: 2014-07-23T11:17:38Z
Add more comments.
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] tajo pull request: TAJO-966: Range partition should support split ...
Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:
https://github.com/apache/tajo/pull/91#discussion_r15741026
--- Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/UniformRangePartition.java ---
@@ -94,25 +100,81 @@ public UniformRangePartition(TupleRange range, SortSpec [] sortSpecs) {
}
List<TupleRange> ranges = Lists.newArrayList();
- BigDecimal term = reverseCardsForDigit[0].divide(
- new BigDecimal(partNum), RoundingMode.CEILING);
- BigDecimal reminder = reverseCardsForDigit[0];
- Tuple last = range.getStart();
- while(reminder.compareTo(new BigDecimal(0)) > 0) {
+
+ BigDecimal x = new BigDecimal(reverseCardsForDigit[0]);
+
+ BigInteger term = x.divide(BigDecimal.valueOf(partNum), RoundingMode.CEILING).toBigInteger();
+ BigInteger reminder = reverseCardsForDigit[0];
+ Tuple last = mergedRange.getStart();
+ TupleRange tupleRange;
+ while(reminder.compareTo(BigInteger.ZERO) > 0) {
if (reminder.compareTo(term) <= 0) { // final one is inclusive
- ranges.add(new TupleRange(sortSpecs, last, range.getEnd()));
+ tupleRange = new TupleRange(sortSpecs, last, mergedRange.getEnd());
} else {
- Tuple next = increment(last, term.longValue(), variableId);
- ranges.add(new TupleRange(sortSpecs, last, next));
+ Tuple next = increment(last, term, variableId);
+ tupleRange = new TupleRange(sortSpecs, last, next);
}
+
+ ranges.add(tupleRange);
last = ranges.get(ranges.size() - 1).getEnd();
reminder = reminder.subtract(term);
}
+ for (TupleRange r : ranges) {
+ denormalize(sortSpecs, r);
+ }
+
return ranges.toArray(new TupleRange[ranges.size()]);
}
/**
+ * It normalizes the start and end keys to have the same length bytes if they are texts or bytes.
+ *
+ * @param sortSpecs The sort specs
+ * @param range Tuple range to be normalize
+ */
+ public static void normalize(final SortSpec [] sortSpecs, TupleRange range) {
+ // normalize text fields to have same bytes length
+ for (int i = 0; i < sortSpecs.length; i++) {
+ if (sortSpecs[i].getSortKey().getDataType().getType() == TajoDataTypes.Type.TEXT) {
--- End diff --
Yes, I think so. I didn't implement BLOB types. It would be easy to add BLOB support. I'll try it.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] tajo pull request: TAJO-966: Range partition should support split ...
Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:
https://github.com/apache/tajo/pull/91#issuecomment-51014071
rebased.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] tajo pull request: TAJO-966: Range partition should support split ...
Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:
https://github.com/apache/tajo/pull/91#discussion_r15740968
--- Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/UniformRangePartition.java ---
@@ -94,25 +100,81 @@ public UniformRangePartition(TupleRange range, SortSpec [] sortSpecs) {
}
List<TupleRange> ranges = Lists.newArrayList();
- BigDecimal term = reverseCardsForDigit[0].divide(
- new BigDecimal(partNum), RoundingMode.CEILING);
- BigDecimal reminder = reverseCardsForDigit[0];
- Tuple last = range.getStart();
- while(reminder.compareTo(new BigDecimal(0)) > 0) {
+
+ BigDecimal x = new BigDecimal(reverseCardsForDigit[0]);
+
+ BigInteger term = x.divide(BigDecimal.valueOf(partNum), RoundingMode.CEILING).toBigInteger();
+ BigInteger reminder = reverseCardsForDigit[0];
+ Tuple last = mergedRange.getStart();
+ TupleRange tupleRange;
+ while(reminder.compareTo(BigInteger.ZERO) > 0) {
if (reminder.compareTo(term) <= 0) { // final one is inclusive
- ranges.add(new TupleRange(sortSpecs, last, range.getEnd()));
+ tupleRange = new TupleRange(sortSpecs, last, mergedRange.getEnd());
} else {
- Tuple next = increment(last, term.longValue(), variableId);
- ranges.add(new TupleRange(sortSpecs, last, next));
+ Tuple next = increment(last, term, variableId);
+ tupleRange = new TupleRange(sortSpecs, last, next);
}
+
+ ranges.add(tupleRange);
last = ranges.get(ranges.size() - 1).getEnd();
reminder = reminder.subtract(term);
}
+ for (TupleRange r : ranges) {
+ denormalize(sortSpecs, r);
+ }
+
return ranges.toArray(new TupleRange[ranges.size()]);
}
/**
+ * It normalizes the start and end keys to have the same length bytes if they are texts or bytes.
+ *
+ * @param sortSpecs The sort specs
+ * @param range Tuple range to be normalize
+ */
+ public static void normalize(final SortSpec [] sortSpecs, TupleRange range) {
+ // normalize text fields to have same bytes length
+ for (int i = 0; i < sortSpecs.length; i++) {
+ if (sortSpecs[i].getSortKey().getDataType().getType() == TajoDataTypes.Type.TEXT) {
--- End diff --
According to your comment, the type can be BLOB as well as TEXT, right?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] tajo pull request: TAJO-966: Range partition should support split ...
Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on the pull request:
https://github.com/apache/tajo/pull/91#issuecomment-51019071
Even though I commented that BLOB also should be considered above, BLOB is currently not supported for range partition. Thus, we can add the code to handle BLOB later when we need to support range partition with BLOB.
Here is my +1.
Thanks.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] tajo pull request: TAJO-966: Range partition should support split ...
Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on the pull request:
https://github.com/apache/tajo/pull/91#issuecomment-50425629
Sorry @hyunsik,
I'll review today.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] tajo pull request: TAJO-966: Range partition should support split ...
Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:
https://github.com/apache/tajo/pull/91#issuecomment-51019691
Thank you for your quick review. I'll commit it shortly.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] tajo pull request: TAJO-966: Range partition should support split ...
Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:
https://github.com/apache/tajo/pull/91#discussion_r15740995
--- Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/UniformRangePartition.java ---
@@ -94,25 +100,81 @@ public UniformRangePartition(TupleRange range, SortSpec [] sortSpecs) {
}
List<TupleRange> ranges = Lists.newArrayList();
- BigDecimal term = reverseCardsForDigit[0].divide(
- new BigDecimal(partNum), RoundingMode.CEILING);
- BigDecimal reminder = reverseCardsForDigit[0];
- Tuple last = range.getStart();
- while(reminder.compareTo(new BigDecimal(0)) > 0) {
+
+ BigDecimal x = new BigDecimal(reverseCardsForDigit[0]);
+
+ BigInteger term = x.divide(BigDecimal.valueOf(partNum), RoundingMode.CEILING).toBigInteger();
+ BigInteger reminder = reverseCardsForDigit[0];
+ Tuple last = mergedRange.getStart();
+ TupleRange tupleRange;
+ while(reminder.compareTo(BigInteger.ZERO) > 0) {
if (reminder.compareTo(term) <= 0) { // final one is inclusive
- ranges.add(new TupleRange(sortSpecs, last, range.getEnd()));
+ tupleRange = new TupleRange(sortSpecs, last, mergedRange.getEnd());
} else {
- Tuple next = increment(last, term.longValue(), variableId);
- ranges.add(new TupleRange(sortSpecs, last, next));
+ Tuple next = increment(last, term, variableId);
+ tupleRange = new TupleRange(sortSpecs, last, next);
}
+
+ ranges.add(tupleRange);
last = ranges.get(ranges.size() - 1).getEnd();
reminder = reminder.subtract(term);
}
+ for (TupleRange r : ranges) {
+ denormalize(sortSpecs, r);
+ }
+
return ranges.toArray(new TupleRange[ranges.size()]);
}
/**
+ * It normalizes the start and end keys to have the same length bytes if they are texts or bytes.
+ *
+ * @param sortSpecs The sort specs
+ * @param range Tuple range to be normalize
+ */
+ public static void normalize(final SortSpec [] sortSpecs, TupleRange range) {
+ // normalize text fields to have same bytes length
+ for (int i = 0; i < sortSpecs.length; i++) {
+ if (sortSpecs[i].getSortKey().getDataType().getType() == TajoDataTypes.Type.TEXT) {
+ byte [] startBytes;
+ byte [] endBytes;
+ if (range.getStart().isNull(i)) {
+ startBytes = BigInteger.ZERO.toByteArray();
+ } else {
+ startBytes = range.getStart().getBytes(i);
+ }
+
+ if (range.getEnd().isNull(i)) {
+ endBytes = BigInteger.ZERO.toByteArray();
+ } else {
+ endBytes = range.getEnd().getBytes(i);
+ }
+
+ byte [][] padded = BytesUtils.padBytes(startBytes, endBytes);
+ range.getStart().put(i, DatumFactory.createText(padded[0]));
+ range.getEnd().put(i, DatumFactory.createText(padded[1]));
+ }
+ }
+ }
+
+ /**
+ * Normalized keys have padding values, but it will cause the key mismatch in pull server.
+ * So, it denormalize the normalized keys again.
+ *
+ * @param sortSpecs The sort specs
+ * @param range Tuple range to be denormalized
+ */
+ public static void denormalize(SortSpec [] sortSpecs, TupleRange range) {
+ for (int i = 0; i < sortSpecs.length; i++) {
+ if (sortSpecs[i].getSortKey().getDataType().getType() == TajoDataTypes.Type.TEXT) {
--- End diff --
As commented above, the type seems to be able to be BLOB.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] tajo pull request: TAJO-966: Range partition should support split ...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/tajo/pull/91
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] tajo pull request: TAJO-966: Range partition should support split ...
Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:
https://github.com/apache/tajo/pull/91#issuecomment-50294968
Anyone can review this?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---