Posted to issues@tajo.apache.org by Jung JaeHwa <bl...@apache.org> on 2014/06/09 13:11:06 UTC

Review Request 22374: TAJO-673: Assign proper number of tasks when inserting into partitioned table.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22374/
-----------------------------------------------------------

Review request for Tajo.


Bugs: TAJO-673
    https://issues.apache.org/jira/browse/TAJO-673


Repository: tajo


Description
-------

When inserting into a partitioned table, query execution is too slow if the number of partitions is smaller than the cluster's concurrency capacity.


Diffs
-----

  tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java 3f2b16f 
  tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java 3a2e79f 
  tajo-core/src/main/java/org/apache/tajo/master/querymaster/SubQuery.java 22817bd 
  tajo-core/src/test/java/org/apache/tajo/engine/query/TestJoinBroadcast.java 1581372 
  tajo-core/src/test/resources/queries/TestJoinBroadcast/testBroadcastSubquery3.sql PRE-CREATION 
  tajo-core/src/test/resources/results/TestJoinBroadcast/testBroadcastSubquery3.result PRE-CREATION 

Diff: https://reviews.apache.org/r/22374/diff/


Testing
-------

mvn clean install


Thanks,

Jung JaeHwa


Re: Review Request 22374: TAJO-673: Assign proper number of tasks when inserting into partitioned table.

Posted by Jung JaeHwa <bl...@apache.org>.

> On June 11, 2014, 7:25 p.m., Hyunsik Choi wrote:
> > tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java, line 533
> > <https://reviews.apache.org/r/22374/diff/3/?file=607530#file607530line533>
> >
> >     Many parts seem similar to the code of scheduleHashShuffledFetches. Is there any opportunity to extract common methods?

Thanks Hyunsik.

I updated Repartitioner::scheduleHashShuffledFetches to handle scattered hash shuffle, and I removed Repartitioner::scheduleScatteredHashShuffledFetches.


- Jung


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22374/#review45415
-----------------------------------------------------------


On June 23, 2014, 3:05 a.m., Jung JaeHwa wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22374/
> -----------------------------------------------------------
> 
> (Updated June 23, 2014, 3:05 a.m.)
> 
> 
> Review request for Tajo.
> 
> 
> Bugs: TAJO-673
>     https://issues.apache.org/jira/browse/TAJO-673
> 
> 
> Repository: tajo
> 
> 
> Description
> -------
> 
> When inserting into a partitioned table, query execution is too slow if the number of partitions is smaller than the cluster's concurrency capacity.
> 
> 
> Diffs
> -----
> 
>   tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java 3f2b16f 
>   tajo-core/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java f41d61d 
>   tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java 536dbd8 
>   tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java 6c000a1 
>   tajo-core/src/main/java/org/apache/tajo/master/querymaster/SubQuery.java 22817bd 
>   tajo-core/src/main/proto/TajoWorkerProtocol.proto 3bf6e13 
>   tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java 0ec7de0 
>   tajo-core/src/test/resources/dataset/TestTablePartitions/customer_large/customer.tbl PRE-CREATION 
>   tajo-core/src/test/resources/dataset/TestTablePartitions/lineitem_large/lineitem.tbl PRE-CREATION 
>   tajo-core/src/test/resources/queries/TestJoinBroadcast/testBroadcastSubquery3.sql PRE-CREATION 
>   tajo-core/src/test/resources/queries/TestTablePartitions/create_customer_large_ddl.sql PRE-CREATION 
>   tajo-core/src/test/resources/queries/TestTablePartitions/create_lineitem_large_ddl.sql PRE-CREATION 
>   tajo-core/src/test/resources/results/TestJoinBroadcast/testBroadcastSubquery3.result PRE-CREATION 
>   tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/PullServerAuxService.java b8fda29 
>   tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java cc3cb2e 
> 
> Diff: https://reviews.apache.org/r/22374/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install
> 
> 
> Thanks,
> 
> Jung JaeHwa
> 
>


Re: Review Request 22374: TAJO-673: Assign proper number of tasks when inserting into partitioned table.

Posted by Hyunsik Choi <hy...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22374/#review45415
-----------------------------------------------------------



tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java
<https://reviews.apache.org/r/22374/#comment80271>

    For consistency, I'd like to suggest renaming it to 'scheduleScatteredHashShuffledFetches'.



tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java
<https://reviews.apache.org/r/22374/#comment80270>

    Many parts seem similar to the code of scheduleHashShuffledFetches. Is there any opportunity to extract common methods?


- Hyunsik Choi


On June 12, 2014, 4:09 a.m., Jung JaeHwa wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22374/
> -----------------------------------------------------------
> 
> (Updated June 12, 2014, 4:09 a.m.)
> 
> 
> Review request for Tajo.
> 
> 
> Bugs: TAJO-673
>     https://issues.apache.org/jira/browse/TAJO-673
> 
> 
> Repository: tajo
> 
> 
> Description
> -------
> 
> When inserting into a partitioned table, query execution is too slow if the number of partitions is smaller than the cluster's concurrency capacity.
> 
> 
> Diffs
> -----
> 
>   tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java 3f2b16f 
>   tajo-core/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java e508d2c 
>   tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java 536dbd8 
>   tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java 3a2e79f 
>   tajo-core/src/main/java/org/apache/tajo/master/querymaster/SubQuery.java 22817bd 
>   tajo-core/src/main/proto/TajoWorkerProtocol.proto 3bf6e13 
>   tajo-core/src/test/java/org/apache/tajo/engine/query/TestJoinBroadcast.java 1581372 
>   tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java 0ec7de0 
>   tajo-core/src/test/resources/dataset/TestTablePartitions/customer_large/customer.tbl PRE-CREATION 
>   tajo-core/src/test/resources/dataset/TestTablePartitions/lineitem_large/lineitem.tbl PRE-CREATION 
>   tajo-core/src/test/resources/queries/TestJoinBroadcast/testBroadcastSubquery3.sql PRE-CREATION 
>   tajo-core/src/test/resources/queries/TestTablePartitions/create_customer_large_ddl.sql PRE-CREATION 
>   tajo-core/src/test/resources/queries/TestTablePartitions/create_lineitem_large_ddl.sql PRE-CREATION 
>   tajo-core/src/test/resources/results/TestJoinBroadcast/testBroadcastSubquery3.result PRE-CREATION 
>   tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/PullServerAuxService.java b8fda29 
>   tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java cc3cb2e 
> 
> Diff: https://reviews.apache.org/r/22374/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install
> 
> 
> Thanks,
> 
> Jung JaeHwa
> 
>


Re: Review Request 22374: TAJO-673: Assign proper number of tasks when inserting into partitioned table.

Posted by Hyunsik Choi <hy...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22374/#review47760
-----------------------------------------------------------

Ship it!


+1
The patch looks good to me. Could you further improve the description of the scattered hash shuffle before committing?

- Hyunsik Choi


On July 15, 2014, 2:27 a.m., Jung JaeHwa wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22374/
> -----------------------------------------------------------
> 
> (Updated July 15, 2014, 2:27 a.m.)
> 
> 
> Review request for Tajo.
> 
> 
> Bugs: TAJO-673
>     https://issues.apache.org/jira/browse/TAJO-673
> 
> 
> Repository: tajo
> 
> 
> Description
> -------
> 
> When inserting into a partitioned table, query execution is too slow if the number of partitions is smaller than the cluster's concurrency capacity.
> 
> 
> Diffs
> -----
> 
>   tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java dd5327d 
>   tajo-core/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java cf02ecd 
>   tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java 4e27574 
>   tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/HashShuffleFileWriteExec.java 678b745 
>   tajo-core/src/main/java/org/apache/tajo/master/querymaster/QueryUnit.java 6cada07 
>   tajo-core/src/main/java/org/apache/tajo/master/querymaster/QueryUnitAttempt.java 361f88f 
>   tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java ce2194e 
>   tajo-core/src/main/java/org/apache/tajo/worker/Task.java ee3c40d 
>   tajo-core/src/main/java/org/apache/tajo/worker/TaskAttemptContext.java e073652 
>   tajo-core/src/main/proto/TajoWorkerProtocol.proto 3bf6e13 
>   tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java c34c3f4 
>   tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/PullServerAuxService.java b8fda29 
>   tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java 373642b 
> 
> Diff: https://reviews.apache.org/r/22374/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install
> 
> 
> Thanks,
> 
> Jung JaeHwa
> 
>


Re: Review Request 22374: TAJO-673: Assign proper number of tasks when inserting into partitioned table.

Posted by Jung JaeHwa <bl...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22374/
-----------------------------------------------------------

(Updated July 14, 2014, 5:27 p.m.)


Review request for Tajo.


Changes
-------

I updated unit test cases.


Bugs: TAJO-673
    https://issues.apache.org/jira/browse/TAJO-673


Repository: tajo


Description
-------

When inserting into a partitioned table, query execution is too slow if the number of partitions is smaller than the cluster's concurrency capacity.


Diffs (updated)
-----

  tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java dd5327d 
  tajo-core/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java cf02ecd 
  tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java 4e27574 
  tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/HashShuffleFileWriteExec.java 678b745 
  tajo-core/src/main/java/org/apache/tajo/master/querymaster/QueryUnit.java 6cada07 
  tajo-core/src/main/java/org/apache/tajo/master/querymaster/QueryUnitAttempt.java 361f88f 
  tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java ce2194e 
  tajo-core/src/main/java/org/apache/tajo/worker/Task.java ee3c40d 
  tajo-core/src/main/java/org/apache/tajo/worker/TaskAttemptContext.java e073652 
  tajo-core/src/main/proto/TajoWorkerProtocol.proto 3bf6e13 
  tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java c34c3f4 
  tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/PullServerAuxService.java b8fda29 
  tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java 373642b 

Diff: https://reviews.apache.org/r/22374/diff/


Testing
-------

mvn clean install


Thanks,

Jung JaeHwa


Re: Review Request 22374: TAJO-673: Assign proper number of tasks when inserting into partitioned table.

Posted by Jung JaeHwa <bl...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22374/
-----------------------------------------------------------

(Updated July 14, 2014, 9:53 a.m.)


Review request for Tajo.


Changes
-------

I updated the patch as follows:
- TajoConf variable name
- Repartitioner comments


Bugs: TAJO-673
    https://issues.apache.org/jira/browse/TAJO-673


Repository: tajo


Description
-------

When inserting into a partitioned table, query execution is too slow if the number of partitions is smaller than the cluster's concurrency capacity.


Diffs (updated)
-----

  tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java dd5327d 
  tajo-core/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java cf02ecd 
  tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java 4e27574 
  tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/HashShuffleFileWriteExec.java 678b745 
  tajo-core/src/main/java/org/apache/tajo/master/querymaster/QueryUnit.java 6cada07 
  tajo-core/src/main/java/org/apache/tajo/master/querymaster/QueryUnitAttempt.java 361f88f 
  tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java ce2194e 
  tajo-core/src/main/java/org/apache/tajo/worker/Task.java ee3c40d 
  tajo-core/src/main/java/org/apache/tajo/worker/TaskAttemptContext.java e073652 
  tajo-core/src/main/proto/TajoWorkerProtocol.proto 3bf6e13 
  tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java c34c3f4 
  tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/PullServerAuxService.java b8fda29 
  tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java 373642b 

Diff: https://reviews.apache.org/r/22374/diff/


Testing
-------

mvn clean install


Thanks,

Jung JaeHwa


Re: Review Request 22374: TAJO-673: Assign proper number of tasks when inserting into partitioned table.

Posted by Hyunsik Choi <hy...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22374/#review47486
-----------------------------------------------------------



tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java
<https://reviews.apache.org/r/22374/#comment83342>

    According to my understanding, the parameter determines the input volume that each task processes for table partition.
    
    So, the config should belong to 'Distributed Query Execution Parameters'. Please take a look at the section 'Distributed Query Execution Parameters' in TajoConf.
    
    In addition, the config key is nested too deeply. Following our convention, I'd recommend 'tajo.dist-query.table-partition.task-volume-mb'.



tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java
<https://reviews.apache.org/r/22374/#comment83363>

    This comment explains the problem that arises when hash shuffle is used for table partitioning. I think it is enough to explain what scattered hash shuffle is.



tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java
<https://reviews.apache.org/r/22374/#comment83362>

    It works well because each query has only one query. But, it is not intuitive because a loop seems to overwrite the variable multiple times.
    
    Why don't you traverse the MasterPlan via the graph visitor to find the subquery you are interested in?
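
    To illustrate the suggestion above, here is a minimal sketch of a depth-first search over a plan tree that returns the first block matching a condition. It does not use Tajo's actual MasterPlan or visitor API; the Block class, the findFirst method, and the predicate-based matching are all hypothetical stand-ins chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

public class PlanSearchSketch {
    /** Hypothetical stand-in for one execution block (subquery) in a plan. */
    public static final class Block {
        private final String name;
        private final List<Block> children = new ArrayList<>();

        public Block(String name) {
            this.name = name;
        }

        public Block addChild(Block child) {
            children.add(child);
            return this;
        }

        public String getName() {
            return name;
        }
    }

    /** Depth-first search: returns the first block that satisfies the matcher. */
    public static Optional<Block> findFirst(Block root, Predicate<Block> matcher) {
        if (matcher.test(root)) {
            return Optional.of(root);
        }
        for (Block child : root.children) {
            Optional<Block> found = findFirst(child, matcher);
            if (found.isPresent()) {
                return found;
            }
        }
        return Optional.empty();
    }
}
```

    A test written this way states which block it is looking for, instead of relying on a loop that overwrites a variable until the last iteration.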


- Hyunsik Choi


On July 4, 2014, 6:40 p.m., Jung JaeHwa wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22374/
> -----------------------------------------------------------
> 
> (Updated July 4, 2014, 6:40 p.m.)
> 
> 
> Review request for Tajo.
> 
> 
> Bugs: TAJO-673
>     https://issues.apache.org/jira/browse/TAJO-673
> 
> 
> Repository: tajo
> 
> 
> Description
> -------
> 
> When inserting into a partitioned table, query execution is too slow if the number of partitions is smaller than the cluster's concurrency capacity.
> 
> 
> Diffs
> -----
> 
>   tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java 6298d27 
>   tajo-core/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java f41d61d 
>   tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java edd5674 
>   tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/HashShuffleFileWriteExec.java 678b745 
>   tajo-core/src/main/java/org/apache/tajo/master/querymaster/QueryUnit.java 6cada07 
>   tajo-core/src/main/java/org/apache/tajo/master/querymaster/QueryUnitAttempt.java 361f88f 
>   tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java 80274e2 
>   tajo-core/src/main/java/org/apache/tajo/worker/Task.java c6e2b73 
>   tajo-core/src/main/java/org/apache/tajo/worker/TaskAttemptContext.java b1246ec 
>   tajo-core/src/main/proto/TajoWorkerProtocol.proto 3bf6e13 
>   tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java 8c989b5 
>   tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/PullServerAuxService.java b8fda29 
>   tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java cc3cb2e 
> 
> Diff: https://reviews.apache.org/r/22374/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install
> 
> 
> Thanks,
> 
> Jung JaeHwa
> 
>


Re: Review Request 22374: TAJO-673: Assign proper number of tasks when inserting into partitioned table.

Posted by Jung JaeHwa <bl...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22374/
-----------------------------------------------------------

(Updated July 4, 2014, 9:40 a.m.)


Review request for Tajo.


Changes
-------

I updated the patch as follows:

- Divide fetch URIs into the proper number of tasks by IntermediateData output volume. The output volume defaults to 256MB, but you can change it in the Tajo configuration file. The property name is tajo.scattered.hash.shuffle.split.volume.
- Add the shuffle output volume to TajoWorkerProtocol. When a task completes, Task::getTaskCompletionReport sets this property.
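
The volume-based division described above can be sketched roughly as follows. This is not the code in Repartitioner; the class name, the method splitByVolume, and the greedy grouping strategy are assumptions made for illustration. Each intermediate output is represented only by its size in bytes, and a new task starts whenever adding the next output would exceed the target volume.

```java
import java.util.ArrayList;
import java.util.List;

public class ScatteredShuffleSplitSketch {
    /**
     * Greedily groups intermediate output sizes (in bytes) into tasks so that
     * each task stays at or below maxTaskVolumeBytes. A single output larger
     * than the limit gets a task of its own.
     */
    public static List<List<Long>> splitByVolume(List<Long> entrySizes, long maxTaskVolumeBytes) {
        List<List<Long>> tasks = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentVolume = 0;
        for (long size : entrySizes) {
            // Close the current task if this entry would push it past the limit.
            if (!current.isEmpty() && currentVolume + size > maxTaskVolumeBytes) {
                tasks.add(current);
                current = new ArrayList<>();
                currentVolume = 0;
            }
            current.add(size);
            currentVolume += size;
        }
        if (!current.isEmpty()) {
            tasks.add(current);
        }
        return tasks;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Five outputs of 100, 100, 100, 300 and 50 MB with a 256 MB target.
        List<Long> sizes = List.of(100 * mb, 100 * mb, 100 * mb, 300 * mb, 50 * mb);
        System.out.println(splitByVolume(sizes, 256 * mb).size()); // prints 4
    }
}
```

With this approach the number of tasks follows the data volume rather than the number of partitions, which is the point of the change.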

For reference, I tested many cases on a TPC-H benchmarking cluster, and they all ran successfully.


Bugs: TAJO-673
    https://issues.apache.org/jira/browse/TAJO-673


Repository: tajo


Description
-------

When inserting into a partitioned table, query execution is too slow if the number of partitions is smaller than the cluster's concurrency capacity.


Diffs (updated)
-----

  tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java 6298d27 
  tajo-core/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java f41d61d 
  tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java edd5674 
  tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/HashShuffleFileWriteExec.java 678b745 
  tajo-core/src/main/java/org/apache/tajo/master/querymaster/QueryUnit.java 6cada07 
  tajo-core/src/main/java/org/apache/tajo/master/querymaster/QueryUnitAttempt.java 361f88f 
  tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java 80274e2 
  tajo-core/src/main/java/org/apache/tajo/worker/Task.java c6e2b73 
  tajo-core/src/main/java/org/apache/tajo/worker/TaskAttemptContext.java b1246ec 
  tajo-core/src/main/proto/TajoWorkerProtocol.proto 3bf6e13 
  tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java 8c989b5 
  tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/PullServerAuxService.java b8fda29 
  tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java cc3cb2e 

Diff: https://reviews.apache.org/r/22374/diff/


Testing
-------

mvn clean install


Thanks,

Jung JaeHwa


Re: Review Request 22374: TAJO-673: Assign proper number of tasks when inserting into partitioned table.

Posted by Jung JaeHwa <bl...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22374/
-----------------------------------------------------------

(Updated July 3, 2014, 6:14 p.m.)


Review request for Tajo.


Changes
-------

I updated the patch as follows:
- Apply IntermediateEntry total size to task size in scattered hash shuffle
- Remove unnecessary configurations
- Simplify unit test cases for inserting into a partitioned table


Bugs: TAJO-673
    https://issues.apache.org/jira/browse/TAJO-673


Repository: tajo


Description
-------

When inserting into a partitioned table, query execution is too slow if the number of partitions is smaller than the cluster's concurrency capacity.


Diffs (updated)
-----

  tajo-core/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java f41d61d 
  tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java edd5674 
  tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java 80274e2 
  tajo-core/src/main/proto/TajoWorkerProtocol.proto 3bf6e13 
  tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java 8c989b5 
  tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/PullServerAuxService.java b8fda29 
  tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java cc3cb2e 

Diff: https://reviews.apache.org/r/22374/diff/


Testing
-------

mvn clean install


Thanks,

Jung JaeHwa


Re: Review Request 22374: TAJO-673: Assign proper number of tasks when inserting into partitioned table.

Posted by Jung JaeHwa <bl...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22374/
-----------------------------------------------------------

(Updated July 3, 2014, 12:45 p.m.)


Review request for Tajo.


Bugs: TAJO-673
    https://issues.apache.org/jira/browse/TAJO-673


Repository: tajo


Description
-------

When inserting into a partitioned table, query execution is too slow if the number of partitions is smaller than the cluster's concurrency capacity.


Diffs (updated)
-----

  tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java 6298d27 
  tajo-core/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java f41d61d 
  tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java edd5674 
  tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java 80274e2 
  tajo-core/src/main/java/org/apache/tajo/master/querymaster/SubQuery.java d4c94e8 
  tajo-core/src/main/proto/TajoWorkerProtocol.proto 3bf6e13 
  tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java 8c989b5 
  tajo-core/src/test/resources/dataset/TestTablePartitions/customer_large/customer.tbl PRE-CREATION 
  tajo-core/src/test/resources/dataset/TestTablePartitions/lineitem_large/lineitem.tbl PRE-CREATION 
  tajo-core/src/test/resources/queries/TestJoinBroadcast/testBroadcastSubquery3.sql PRE-CREATION 
  tajo-core/src/test/resources/queries/TestTablePartitions/create_customer_large_ddl.sql PRE-CREATION 
  tajo-core/src/test/resources/queries/TestTablePartitions/create_lineitem_large_ddl.sql PRE-CREATION 
  tajo-core/src/test/resources/results/TestJoinBroadcast/testBroadcastSubquery3.result PRE-CREATION 
  tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/PullServerAuxService.java b8fda29 
  tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java cc3cb2e 

Diff: https://reviews.apache.org/r/22374/diff/


Testing
-------

mvn clean install


Thanks,

Jung JaeHwa


Re: Review Request 22374: TAJO-673: Assign proper number of tasks when inserting into partitioned table.

Posted by Jung JaeHwa <bl...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22374/
-----------------------------------------------------------

(Updated June 23, 2014, 3:05 a.m.)


Review request for Tajo.


Bugs: TAJO-673
    https://issues.apache.org/jira/browse/TAJO-673


Repository: tajo


Description
-------

When inserting into a partitioned table, query execution is too slow if the number of partitions is smaller than the cluster's concurrency capacity.


Diffs (updated)
-----

  tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java 3f2b16f 
  tajo-core/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java f41d61d 
  tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java 536dbd8 
  tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java 6c000a1 
  tajo-core/src/main/java/org/apache/tajo/master/querymaster/SubQuery.java 22817bd 
  tajo-core/src/main/proto/TajoWorkerProtocol.proto 3bf6e13 
  tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java 0ec7de0 
  tajo-core/src/test/resources/dataset/TestTablePartitions/customer_large/customer.tbl PRE-CREATION 
  tajo-core/src/test/resources/dataset/TestTablePartitions/lineitem_large/lineitem.tbl PRE-CREATION 
  tajo-core/src/test/resources/queries/TestJoinBroadcast/testBroadcastSubquery3.sql PRE-CREATION 
  tajo-core/src/test/resources/queries/TestTablePartitions/create_customer_large_ddl.sql PRE-CREATION 
  tajo-core/src/test/resources/queries/TestTablePartitions/create_lineitem_large_ddl.sql PRE-CREATION 
  tajo-core/src/test/resources/results/TestJoinBroadcast/testBroadcastSubquery3.result PRE-CREATION 
  tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/PullServerAuxService.java b8fda29 
  tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java cc3cb2e 

Diff: https://reviews.apache.org/r/22374/diff/


Testing
-------

mvn clean install


Thanks,

Jung JaeHwa


Re: Review Request 22374: TAJO-673: Assign proper number of tasks when inserting into partitioned table.

Posted by Jung JaeHwa <bl...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22374/
-----------------------------------------------------------

(Updated June 11, 2014, 7:09 p.m.)


Review request for Tajo.


Changes
-------

I modified the patch as follows:

- Renamed new shuffle type to scattered hash shuffle.
- Set TajoConf:SHUFFLE_TASK_NUM_VOLUME to 512MB


Bugs: TAJO-673
    https://issues.apache.org/jira/browse/TAJO-673


Repository: tajo


Description
-------

When inserting into a partitioned table, query execution is too slow if the number of partitions is smaller than the cluster's concurrency capacity.


Diffs (updated)
-----

  tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java 3f2b16f 
  tajo-core/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java e508d2c 
  tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java 536dbd8 
  tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java 3a2e79f 
  tajo-core/src/main/java/org/apache/tajo/master/querymaster/SubQuery.java 22817bd 
  tajo-core/src/main/proto/TajoWorkerProtocol.proto 3bf6e13 
  tajo-core/src/test/java/org/apache/tajo/engine/query/TestJoinBroadcast.java 1581372 
  tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java 0ec7de0 
  tajo-core/src/test/resources/dataset/TestTablePartitions/customer_large/customer.tbl PRE-CREATION 
  tajo-core/src/test/resources/dataset/TestTablePartitions/lineitem_large/lineitem.tbl PRE-CREATION 
  tajo-core/src/test/resources/queries/TestJoinBroadcast/testBroadcastSubquery3.sql PRE-CREATION 
  tajo-core/src/test/resources/queries/TestTablePartitions/create_customer_large_ddl.sql PRE-CREATION 
  tajo-core/src/test/resources/queries/TestTablePartitions/create_lineitem_large_ddl.sql PRE-CREATION 
  tajo-core/src/test/resources/results/TestJoinBroadcast/testBroadcastSubquery3.result PRE-CREATION 
  tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/PullServerAuxService.java b8fda29 
  tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java cc3cb2e 

Diff: https://reviews.apache.org/r/22374/diff/


Testing
-------

mvn clean install


Thanks,

Jung JaeHwa


Re: Review Request 22374: TAJO-673: Assign proper number of tasks when inserting into partitioned table.

Posted by Jung JaeHwa <bl...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22374/
-----------------------------------------------------------

(Updated June 10, 2014, 12:11 a.m.)


Review request for Tajo.


Changes
-------

I added a new shuffle type for inserting into partitioned tables because it needs to be scheduled differently from a hash shuffle and a range shuffle.


Bugs: TAJO-673
    https://issues.apache.org/jira/browse/TAJO-673


Repository: tajo


Description
-------

When inserting into a partitioned table, query execution is too slow if the number of partitions is smaller than the cluster's concurrency capacity.


Diffs (updated)
-----

  tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java 3f2b16f 
  tajo-core/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java e508d2c 
  tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java 536dbd8 
  tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java 3a2e79f 
  tajo-core/src/main/java/org/apache/tajo/master/querymaster/SubQuery.java 22817bd 
  tajo-core/src/main/proto/TajoWorkerProtocol.proto 3bf6e13 
  tajo-core/src/test/java/org/apache/tajo/engine/query/TestJoinBroadcast.java 1581372 
  tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java 0ec7de0 
  tajo-core/src/test/resources/dataset/TestTablePartitions/customer_large/customer.tbl PRE-CREATION 
  tajo-core/src/test/resources/dataset/TestTablePartitions/lineitem_large/lineitem.tbl PRE-CREATION 
  tajo-core/src/test/resources/queries/TestJoinBroadcast/testBroadcastSubquery3.sql PRE-CREATION 
  tajo-core/src/test/resources/queries/TestTablePartitions/create_customer_large_ddl.sql PRE-CREATION 
  tajo-core/src/test/resources/queries/TestTablePartitions/create_lineitem_large_ddl.sql PRE-CREATION 
  tajo-core/src/test/resources/results/TestJoinBroadcast/testBroadcastSubquery3.result PRE-CREATION 
  tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/PullServerAuxService.java b8fda29 
  tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java cc3cb2e 

Diff: https://reviews.apache.org/r/22374/diff/


Testing
-------

mvn clean install


Thanks,

Jung JaeHwa