Posted to issues@tajo.apache.org by Jung JaeHwa <bl...@apache.org> on 2014/06/09 13:11:06 UTC
Review Request 22374: TAJO-673: Assign proper number of tasks when inserting
into partitioned table.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22374/
-----------------------------------------------------------
Review request for Tajo.
Bugs: TAJO-673
https://issues.apache.org/jira/browse/TAJO-673
Repository: tajo
Description
-------
When inserting into a partitioned table, query execution is too slow if the number of partitions is smaller than the cluster's concurrency capacity.
Diffs
-----
tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java 3f2b16f
tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java 3a2e79f
tajo-core/src/main/java/org/apache/tajo/master/querymaster/SubQuery.java 22817bd
tajo-core/src/test/java/org/apache/tajo/engine/query/TestJoinBroadcast.java 1581372
tajo-core/src/test/resources/queries/TestJoinBroadcast/testBroadcastSubquery3.sql PRE-CREATION
tajo-core/src/test/resources/results/TestJoinBroadcast/testBroadcastSubquery3.result PRE-CREATION
Diff: https://reviews.apache.org/r/22374/diff/
Testing
-------
mvn clean install
Thanks,
Jung JaeHwa
Re: Review Request 22374: TAJO-673: Assign proper number of tasks when
inserting into partitioned table.
Posted by Jung JaeHwa <bl...@apache.org>.
> On June 11, 2014, 7:25 p.m., Hyunsik Choi wrote:
> > tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java, line 533
> > <https://reviews.apache.org/r/22374/diff/3/?file=607530#file607530line533>
> >
> > Many parts seem to be similar to the code of scheduleHashShuffledFetches. Is there any opportunity to extract common methods?
Thanks Hyunsik.
I updated Repartitioner::scheduleHashShuffledFetches to handle scattered hash shuffle, and I removed Repartitioner::scheduleScatteredHashShuffledFetches.
- Jung
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22374/#review45415
-----------------------------------------------------------
Re: Review Request 22374: TAJO-673: Assign proper number of tasks when
inserting into partitioned table.
Posted by Hyunsik Choi <hy...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22374/#review45415
-----------------------------------------------------------
tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java
<https://reviews.apache.org/r/22374/#comment80271>
For consistency, I'd like to suggest renaming it to 'scheduleScatteredHashShuffledFetches'.
tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java
<https://reviews.apache.org/r/22374/#comment80270>
Many parts seem to be similar to the code of scheduleHashShuffledFetches. Is there any opportunity to extract common methods?
- Hyunsik Choi
Re: Review Request 22374: TAJO-673: Assign proper number of tasks when
inserting into partitioned table.
Posted by Hyunsik Choi <hy...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22374/#review47760
-----------------------------------------------------------
Ship it!
+1
The patch looks good to me. Could you further improve the description of the scattered hash shuffle before committing?
- Hyunsik Choi
Re: Review Request 22374: TAJO-673: Assign proper number of tasks when
inserting into partitioned table.
Posted by Jung JaeHwa <bl...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22374/
-----------------------------------------------------------
(Updated July 14, 2014, 5:27 p.m.)
Review request for Tajo.
Changes
-------
I updated unit test cases.
Bugs: TAJO-673
https://issues.apache.org/jira/browse/TAJO-673
Repository: tajo
Description
-------
When inserting into a partitioned table, query execution is too slow if the number of partitions is smaller than the cluster's concurrency capacity.
Diffs (updated)
-----
tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java dd5327d
tajo-core/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java cf02ecd
tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java 4e27574
tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/HashShuffleFileWriteExec.java 678b745
tajo-core/src/main/java/org/apache/tajo/master/querymaster/QueryUnit.java 6cada07
tajo-core/src/main/java/org/apache/tajo/master/querymaster/QueryUnitAttempt.java 361f88f
tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java ce2194e
tajo-core/src/main/java/org/apache/tajo/worker/Task.java ee3c40d
tajo-core/src/main/java/org/apache/tajo/worker/TaskAttemptContext.java e073652
tajo-core/src/main/proto/TajoWorkerProtocol.proto 3bf6e13
tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java c34c3f4
tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/PullServerAuxService.java b8fda29
tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java 373642b
Diff: https://reviews.apache.org/r/22374/diff/
Testing
-------
mvn clean install
Thanks,
Jung JaeHwa
Re: Review Request 22374: TAJO-673: Assign proper number of tasks when
inserting into partitioned table.
Posted by Jung JaeHwa <bl...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22374/
-----------------------------------------------------------
(Updated July 14, 2014, 9:53 a.m.)
Review request for Tajo.
Changes
-------
I updated the patch as follows:
- TajoConf variable name
- Repartitioner comments
Bugs: TAJO-673
https://issues.apache.org/jira/browse/TAJO-673
Repository: tajo
Description
-------
When inserting into a partitioned table, query execution is too slow if the number of partitions is smaller than the cluster's concurrency capacity.
Diffs (updated)
-----
tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java dd5327d
tajo-core/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java cf02ecd
tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java 4e27574
tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/HashShuffleFileWriteExec.java 678b745
tajo-core/src/main/java/org/apache/tajo/master/querymaster/QueryUnit.java 6cada07
tajo-core/src/main/java/org/apache/tajo/master/querymaster/QueryUnitAttempt.java 361f88f
tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java ce2194e
tajo-core/src/main/java/org/apache/tajo/worker/Task.java ee3c40d
tajo-core/src/main/java/org/apache/tajo/worker/TaskAttemptContext.java e073652
tajo-core/src/main/proto/TajoWorkerProtocol.proto 3bf6e13
tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java c34c3f4
tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/PullServerAuxService.java b8fda29
tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java 373642b
Diff: https://reviews.apache.org/r/22374/diff/
Testing
-------
mvn clean install
Thanks,
Jung JaeHwa
Re: Review Request 22374: TAJO-673: Assign proper number of tasks when
inserting into partitioned table.
Posted by Hyunsik Choi <hy...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22374/#review47486
-----------------------------------------------------------
tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java
<https://reviews.apache.org/r/22374/#comment83342>
As I understand it, the parameter determines the input volume that each task processes for a table partition.
So the config should belong to 'Distributed Query Execution Parameters'. Please take a look at that section in TajoConf.
In addition, the config name is nested too deeply. Following our convention, I'd like to recommend 'tajo.dist-query.table-partition.task-volume-mb'.
tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java
<https://reviews.apache.org/r/22374/#comment83363>
This comment explains the problem that arises when hash shuffle is used for table partitioning. I think it is enough to just explain what scattered hash shuffle is.
tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java
<https://reviews.apache.org/r/22374/#comment83362>
It works well because each query has only one query. However, it is not intuitive, because the loop appears to overwrite the variable multiple times.
Why not traverse the MasterPlan via the graph visitor to find the subquery you are interested in?
- Hyunsik Choi
Re: Review Request 22374: TAJO-673: Assign proper number of tasks when
inserting into partitioned table.
Posted by Jung JaeHwa <bl...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22374/
-----------------------------------------------------------
(Updated July 4, 2014, 9:40 a.m.)
Review request for Tajo.
Changes
-------
I updated the patch as follows:
- Divide fetch URIs into the proper number of tasks by IntermediateData output volume. The output volume defaults to 256MB, but you can set it in the Tajo configuration file. The property name is tajo.scattered.hash.shuffle.split.volume.
- Add shuffle output volume to TajoWorkerProtocol. When a task completes, Task::getTaskCompletionReport sets this property.
For reference, I tested many cases on a TPC-H benchmarking cluster, and they ran successfully.
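To illustrate the first item above, here is a minimal sketch of dividing fetches into tasks by output volume. It is not the code in Repartitioner: the class ScatteredShuffleSketch, the Fetch type, the splitByVolume method, and the example URIs are all hypothetical names made up for this illustration, and the greedy in-order grouping is only an assumption about one way such a split could work. Only the 256MB default comes from the patch description.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the actual Tajo Repartitioner code. */
final class ScatteredShuffleSketch {
    /** Default split volume mentioned in this thread: 256MB. */
    static final long DEFAULT_SPLIT_VOLUME_BYTES = 256L * 1024 * 1024;

    /** Hypothetical stand-in for one intermediate output that a task must fetch. */
    static final class Fetch {
        final String uri;
        final long volumeBytes;

        Fetch(String uri, long volumeBytes) {
            this.uri = uri;
            this.volumeBytes = volumeBytes;
        }
    }

    /**
     * Groups fetches in order. A new task is started whenever adding the next
     * fetch would push the current task over the split volume. A single fetch
     * larger than the split volume ends up alone in its own task.
     */
    static List<List<Fetch>> splitByVolume(List<Fetch> fetches, long splitVolumeBytes) {
        if (splitVolumeBytes <= 0) {
            throw new IllegalArgumentException("splitVolumeBytes must be positive");
        }
        List<List<Fetch>> tasks = new ArrayList<>();
        List<Fetch> current = new ArrayList<>();
        long currentBytes = 0;
        for (Fetch fetch : fetches) {
            if (!current.isEmpty() && currentBytes + fetch.volumeBytes > splitVolumeBytes) {
                tasks.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(fetch);
            currentBytes += fetch.volumeBytes;
        }
        if (!current.isEmpty()) {
            tasks.add(current);
        }
        return tasks;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        List<Fetch> fetches = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            // Made-up URIs; ten intermediate outputs of 100MB each.
            fetches.add(new Fetch("http://worker-" + i + "/fetch", 100 * mb));
        }
        List<List<Fetch>> tasks = splitByVolume(fetches, DEFAULT_SPLIT_VOLUME_BYTES);
        System.out.println(tasks.size() + " tasks"); // prints "5 tasks"
    }
}
```

With ten 100MB outputs and the 256MB limit, each task takes two fetches, so the sketch yields five tasks regardless of how many partitions the target table has.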
Bugs: TAJO-673
https://issues.apache.org/jira/browse/TAJO-673
Repository: tajo
Description
-------
When inserting into a partitioned table, query execution is too slow if the number of partitions is smaller than the cluster's concurrency capacity.
Diffs (updated)
-----
tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java 6298d27
tajo-core/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java f41d61d
tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java edd5674
tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/HashShuffleFileWriteExec.java 678b745
tajo-core/src/main/java/org/apache/tajo/master/querymaster/QueryUnit.java 6cada07
tajo-core/src/main/java/org/apache/tajo/master/querymaster/QueryUnitAttempt.java 361f88f
tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java 80274e2
tajo-core/src/main/java/org/apache/tajo/worker/Task.java c6e2b73
tajo-core/src/main/java/org/apache/tajo/worker/TaskAttemptContext.java b1246ec
tajo-core/src/main/proto/TajoWorkerProtocol.proto 3bf6e13
tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java 8c989b5
tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/PullServerAuxService.java b8fda29
tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java cc3cb2e
Diff: https://reviews.apache.org/r/22374/diff/
Testing
-------
mvn clean install
Thanks,
Jung JaeHwa
Re: Review Request 22374: TAJO-673: Assign proper number of tasks when
inserting into partitioned table.
Posted by Jung JaeHwa <bl...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22374/
-----------------------------------------------------------
(Updated July 3, 2014, 6:14 p.m.)
Review request for Tajo.
Changes
-------
I updated the patch as follows:
- Apply IntermediateEntry total size to task size in scattered hash shuffle
- Remove unnecessary configurations
- Simplify unit test cases for inserting into a partitioned table
Bugs: TAJO-673
https://issues.apache.org/jira/browse/TAJO-673
Repository: tajo
Description
-------
When inserting into a partitioned table, query execution is too slow if the number of partitions is smaller than the cluster's concurrency capacity.
Diffs (updated)
-----
tajo-core/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java f41d61d
tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java edd5674
tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java 80274e2
tajo-core/src/main/proto/TajoWorkerProtocol.proto 3bf6e13
tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java 8c989b5
tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/PullServerAuxService.java b8fda29
tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java cc3cb2e
Diff: https://reviews.apache.org/r/22374/diff/
Testing
-------
mvn clean install
Thanks,
Jung JaeHwa
Re: Review Request 22374: TAJO-673: Assign proper number of tasks when
inserting into partitioned table.
Posted by Jung JaeHwa <bl...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22374/
-----------------------------------------------------------
(Updated July 3, 2014, 12:45 p.m.)
Review request for Tajo.
Bugs: TAJO-673
https://issues.apache.org/jira/browse/TAJO-673
Repository: tajo
Description
-------
When inserting into a partitioned table, query execution is too slow if the number of partitions is smaller than the cluster's concurrency capacity.
Diffs (updated)
-----
tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java 6298d27
tajo-core/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java f41d61d
tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java edd5674
tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java 80274e2
tajo-core/src/main/java/org/apache/tajo/master/querymaster/SubQuery.java d4c94e8
tajo-core/src/main/proto/TajoWorkerProtocol.proto 3bf6e13
tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java 8c989b5
tajo-core/src/test/resources/dataset/TestTablePartitions/customer_large/customer.tbl PRE-CREATION
tajo-core/src/test/resources/dataset/TestTablePartitions/lineitem_large/lineitem.tbl PRE-CREATION
tajo-core/src/test/resources/queries/TestJoinBroadcast/testBroadcastSubquery3.sql PRE-CREATION
tajo-core/src/test/resources/queries/TestTablePartitions/create_customer_large_ddl.sql PRE-CREATION
tajo-core/src/test/resources/queries/TestTablePartitions/create_lineitem_large_ddl.sql PRE-CREATION
tajo-core/src/test/resources/results/TestJoinBroadcast/testBroadcastSubquery3.result PRE-CREATION
tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/PullServerAuxService.java b8fda29
tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java cc3cb2e
Diff: https://reviews.apache.org/r/22374/diff/
Testing
-------
mvn clean install
Thanks,
Jung JaeHwa
Re: Review Request 22374: TAJO-673: Assign proper number of tasks when
inserting into partitioned table.
Posted by Jung JaeHwa <bl...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22374/
-----------------------------------------------------------
(Updated June 23, 2014, 3:05 a.m.)
Review request for Tajo.
Bugs: TAJO-673
https://issues.apache.org/jira/browse/TAJO-673
Repository: tajo
Description
-------
When inserting into a partitioned table, query execution is too slow if the number of partitions is smaller than the cluster's concurrency capacity.
Diffs (updated)
-----
tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java 3f2b16f
tajo-core/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java f41d61d
tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java 536dbd8
tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java 6c000a1
tajo-core/src/main/java/org/apache/tajo/master/querymaster/SubQuery.java 22817bd
tajo-core/src/main/proto/TajoWorkerProtocol.proto 3bf6e13
tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java 0ec7de0
tajo-core/src/test/resources/dataset/TestTablePartitions/customer_large/customer.tbl PRE-CREATION
tajo-core/src/test/resources/dataset/TestTablePartitions/lineitem_large/lineitem.tbl PRE-CREATION
tajo-core/src/test/resources/queries/TestJoinBroadcast/testBroadcastSubquery3.sql PRE-CREATION
tajo-core/src/test/resources/queries/TestTablePartitions/create_customer_large_ddl.sql PRE-CREATION
tajo-core/src/test/resources/queries/TestTablePartitions/create_lineitem_large_ddl.sql PRE-CREATION
tajo-core/src/test/resources/results/TestJoinBroadcast/testBroadcastSubquery3.result PRE-CREATION
tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/PullServerAuxService.java b8fda29
tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java cc3cb2e
Diff: https://reviews.apache.org/r/22374/diff/
Testing
-------
mvn clean install
Thanks,
Jung JaeHwa
Re: Review Request 22374: TAJO-673: Assign proper number of tasks when
inserting into partitioned table.
Posted by Jung JaeHwa <bl...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22374/
-----------------------------------------------------------
(Updated June 11, 2014, 7:09 p.m.)
Review request for Tajo.
Changes
-------
I modified the patch as follows:
- Renamed new shuffle type to scattered hash shuffle.
- Set TajoConf:SHUFFLE_TASK_NUM_VOLUME to 512MB
Bugs: TAJO-673
https://issues.apache.org/jira/browse/TAJO-673
Repository: tajo
Description
-------
When inserting into a partitioned table, query execution is too slow if the number of partitions is smaller than the cluster's concurrency capacity.
Diffs (updated)
-----
tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java 3f2b16f
tajo-core/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java e508d2c
tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java 536dbd8
tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java 3a2e79f
tajo-core/src/main/java/org/apache/tajo/master/querymaster/SubQuery.java 22817bd
tajo-core/src/main/proto/TajoWorkerProtocol.proto 3bf6e13
tajo-core/src/test/java/org/apache/tajo/engine/query/TestJoinBroadcast.java 1581372
tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java 0ec7de0
tajo-core/src/test/resources/dataset/TestTablePartitions/customer_large/customer.tbl PRE-CREATION
tajo-core/src/test/resources/dataset/TestTablePartitions/lineitem_large/lineitem.tbl PRE-CREATION
tajo-core/src/test/resources/queries/TestJoinBroadcast/testBroadcastSubquery3.sql PRE-CREATION
tajo-core/src/test/resources/queries/TestTablePartitions/create_customer_large_ddl.sql PRE-CREATION
tajo-core/src/test/resources/queries/TestTablePartitions/create_lineitem_large_ddl.sql PRE-CREATION
tajo-core/src/test/resources/results/TestJoinBroadcast/testBroadcastSubquery3.result PRE-CREATION
tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/PullServerAuxService.java b8fda29
tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java cc3cb2e
Diff: https://reviews.apache.org/r/22374/diff/
Testing
-------
mvn clean install
Thanks,
Jung JaeHwa
Re: Review Request 22374: TAJO-673: Assign proper number of tasks when
inserting into partitioned table.
Posted by Jung JaeHwa <bl...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22374/
-----------------------------------------------------------
(Updated June 10, 2014, 12:11 a.m.)
Review request for Tajo.
Changes
-------
I added a new shuffle type for inserting into a partitioned table, because it needs to be scheduled differently from a hash shuffle and a range shuffle.
Bugs: TAJO-673
https://issues.apache.org/jira/browse/TAJO-673
Repository: tajo
Description
-------
When inserting into a partitioned table, query execution is too slow if the number of partitions is smaller than the cluster's concurrency capacity.
Diffs (updated)
-----
tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java 3f2b16f
tajo-core/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java e508d2c
tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java 536dbd8
tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java 3a2e79f
tajo-core/src/main/java/org/apache/tajo/master/querymaster/SubQuery.java 22817bd
tajo-core/src/main/proto/TajoWorkerProtocol.proto 3bf6e13
tajo-core/src/test/java/org/apache/tajo/engine/query/TestJoinBroadcast.java 1581372
tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java 0ec7de0
tajo-core/src/test/resources/dataset/TestTablePartitions/customer_large/customer.tbl PRE-CREATION
tajo-core/src/test/resources/dataset/TestTablePartitions/lineitem_large/lineitem.tbl PRE-CREATION
tajo-core/src/test/resources/queries/TestJoinBroadcast/testBroadcastSubquery3.sql PRE-CREATION
tajo-core/src/test/resources/queries/TestTablePartitions/create_customer_large_ddl.sql PRE-CREATION
tajo-core/src/test/resources/queries/TestTablePartitions/create_lineitem_large_ddl.sql PRE-CREATION
tajo-core/src/test/resources/results/TestJoinBroadcast/testBroadcastSubquery3.result PRE-CREATION
tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/PullServerAuxService.java b8fda29
tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java cc3cb2e
Diff: https://reviews.apache.org/r/22374/diff/
Testing
-------
mvn clean install
Thanks,
Jung JaeHwa