Posted to issues@tajo.apache.org by Hyunsik Choi <hy...@apache.org> on 2014/02/01 05:02:59 UTC

Review Request 17633: TAJO-574: Add a sort-based physical executor for column partition store

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17633/
-----------------------------------------------------------

Review request for Tajo.


Bugs: TAJO-574
    https://issues.apache.org/jira/browse/TAJO-574


Repository: tajo


Description
-------

ColumnPartitionStoreExec keeps numerous files open while it stores data. In addition, its random writes put a burden on the HDFS NameNode.


Diffs
-----

  CHANGES.txt f038f979d1e8732fc380fe878217be90df8d0153 
  tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/statistics/StatisticsUtil.java b83681840d387adcb0859a76c4d81a3a1c28b063 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java 202c59db1740145e84df1925b4650ff2a1de8627 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/enforce/Enforcer.java d7c3ba411536275634c9f1092dc6df5ad06400d8 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/BasicPhysicalExecutorVisitor.java 67d6baa9192b5a4cd442071e27b415e4867cda58 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/ColPartitionedStoreExec.java PRE-CREATION 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/ColumnPartitionedTableStoreExec.java cee9bbafbe3585a7ac0c566669931f8c42336493 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/PhysicalExecutorVisitor.java 9ede15d3228550cfec5ddaff9ec387b2ee79f56c 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/SortBasedColPartitionStoreExec.java PRE-CREATION 
  tajo-core/tajo-core-backend/src/main/proto/TajoWorkerProtocol.proto 9aa6d865e696404257df627354a94b77cf7eb9e1 
  tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/planner/physical/TestPhysicalPlanner.java ab27a458633416cb8e28fadbf08685c085fdfa08 

Diff: https://reviews.apache.org/r/17633/diff/


Testing
-------


Thanks,

Hyunsik Choi


Re: Review Request 17633: TAJO-574: Add a sort-based physical executor for column partition store

Posted by Hyunsik Choi <hy...@apache.org>.

> On Feb. 1, 2014, 7:38 p.m., Jihoon Son wrote:
> > tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/ColPartitionedStoreExec.java, line 36
> > <https://reviews.apache.org/r/17633/diff/1/?file=462622#file462622line36>
> >
> >     Would you change the name to ColPartitionStoreExec to be consistent with HashBasedColPartitionStoreExec and SortBasedColPartitionStoreExec?

I missed one. I've renamed the class. Thank you for the review.


- Hyunsik


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17633/#review33385
-----------------------------------------------------------


On Feb. 1, 2014, 1:04 p.m., Hyunsik Choi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17633/
> -----------------------------------------------------------
> 
> (Updated Feb. 1, 2014, 1:04 p.m.)
> 
> 
> Review request for Tajo.
> 
> 
> Bugs: TAJO-574
>     https://issues.apache.org/jira/browse/TAJO-574
> 
> 
> Repository: tajo
> 
> 
> Description
> -------
> 
> ColumnPartitionStoreExec keeps numerous files open while it stores data. In addition, its random writes put a burden on the HDFS NameNode.
> 
> To solve this problem, I would like to propose a sort-based physical executor for the column partition store. It assumes that input tuples are sorted in ascending or descending order of the partition keys, which means that it needs an extra sort operation. In return, it keeps only one file open at a time and writes all data sequentially. In many cases, it would be the best choice for the column partition store.
> 
> 
> Diffs
> -----
> 
>   CHANGES.txt f038f979d1e8732fc380fe878217be90df8d0153 
>   tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/statistics/StatisticsUtil.java b83681840d387adcb0859a76c4d81a3a1c28b063 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java 202c59db1740145e84df1925b4650ff2a1de8627 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/enforce/Enforcer.java d7c3ba411536275634c9f1092dc6df5ad06400d8 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/BasicPhysicalExecutorVisitor.java 67d6baa9192b5a4cd442071e27b415e4867cda58 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/ColPartitionedStoreExec.java PRE-CREATION 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/ColumnPartitionedTableStoreExec.java cee9bbafbe3585a7ac0c566669931f8c42336493 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/PhysicalExecutorVisitor.java 9ede15d3228550cfec5ddaff9ec387b2ee79f56c 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/SortBasedColPartitionStoreExec.java PRE-CREATION 
>   tajo-core/tajo-core-backend/src/main/proto/TajoWorkerProtocol.proto 9aa6d865e696404257df627354a94b77cf7eb9e1 
>   tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/planner/physical/TestPhysicalPlanner.java ab27a458633416cb8e28fadbf08685c085fdfa08 
> 
> Diff: https://reviews.apache.org/r/17633/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install
> 
> 
> Thanks,
> 
> Hyunsik Choi
> 
>


Re: Review Request 17633: TAJO-574: Add a sort-based physical executor for column partition store

Posted by Jihoon Son <ji...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17633/#review33385
-----------------------------------------------------------

Ship it!


+1. I left a trivial comment. 


tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/ColPartitionedStoreExec.java
<https://reviews.apache.org/r/17633/#comment62844>

    Would you change the name to ColPartitionStoreExec to be consistent with HashBasedColPartitionStoreExec and SortBasedColPartitionStoreExec?


- Jihoon Son


On Feb. 1, 2014, 4:04 a.m., Hyunsik Choi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17633/
> -----------------------------------------------------------
> 
> (Updated Feb. 1, 2014, 4:04 a.m.)
> 
> 
> Review request for Tajo.
> 
> 
> Bugs: TAJO-574
>     https://issues.apache.org/jira/browse/TAJO-574
> 
> 
> Repository: tajo
> 
> 
> Description
> -------
> 
> ColumnPartitionStoreExec keeps numerous files open while it stores data. In addition, its random writes put a burden on the HDFS NameNode.
> 
> To solve this problem, I would like to propose a sort-based physical executor for the column partition store. It assumes that input tuples are sorted in ascending or descending order of the partition keys, which means that it needs an extra sort operation. In return, it keeps only one file open at a time and writes all data sequentially. In many cases, it would be the best choice for the column partition store.
> 
> 
> Diffs
> -----
> 
>   CHANGES.txt f038f979d1e8732fc380fe878217be90df8d0153 
>   tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/statistics/StatisticsUtil.java b83681840d387adcb0859a76c4d81a3a1c28b063 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java 202c59db1740145e84df1925b4650ff2a1de8627 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/enforce/Enforcer.java d7c3ba411536275634c9f1092dc6df5ad06400d8 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/BasicPhysicalExecutorVisitor.java 67d6baa9192b5a4cd442071e27b415e4867cda58 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/ColPartitionedStoreExec.java PRE-CREATION 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/ColumnPartitionedTableStoreExec.java cee9bbafbe3585a7ac0c566669931f8c42336493 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/PhysicalExecutorVisitor.java 9ede15d3228550cfec5ddaff9ec387b2ee79f56c 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/SortBasedColPartitionStoreExec.java PRE-CREATION 
>   tajo-core/tajo-core-backend/src/main/proto/TajoWorkerProtocol.proto 9aa6d865e696404257df627354a94b77cf7eb9e1 
>   tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/planner/physical/TestPhysicalPlanner.java ab27a458633416cb8e28fadbf08685c085fdfa08 
> 
> Diff: https://reviews.apache.org/r/17633/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install
> 
> 
> Thanks,
> 
> Hyunsik Choi
> 
>


Re: Review Request 17633: TAJO-574: Add a sort-based physical executor for column partition store

Posted by Hyunsik Choi <hy...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17633/
-----------------------------------------------------------

(Updated Feb. 1, 2014, 1:04 p.m.)


Review request for Tajo.


Bugs: TAJO-574
    https://issues.apache.org/jira/browse/TAJO-574


Repository: tajo


Description (updated)
-------

ColumnPartitionStoreExec keeps numerous files open while it stores data. In addition, its random writes put a burden on the HDFS NameNode.

To solve this problem, I would like to propose a sort-based physical executor for the column partition store. It assumes that input tuples are sorted in ascending or descending order of the partition keys, which means that it needs an extra sort operation. In return, it keeps only one file open at a time and writes all data sequentially. In many cases, it would be the best choice for the column partition store.
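
The idea can be illustrated with a minimal Java sketch. It is not the code in this patch and does not use Tajo's executor or storage APIs. The class name SortedPartitionWriterSketch, the key=<value> directory layout, and the part-0 file name are assumptions made only for this example. Rows arrive already sorted by partition key, so the writer closes the current file whenever the key changes and never holds more than one file open.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of sort-based partition writing; not Tajo code. */
public class SortedPartitionWriterSketch {

  /**
   * Writes rows of the form {partitionKey, value} under baseDir.
   * The rows must already be sorted (ascending or descending) by partition key.
   * Returns the number of partition files written.
   */
  public static int write(Path baseDir, List<String[]> sortedRows) throws IOException {
    String currentKey = null;
    BufferedWriter out = null; // the only file open at any moment
    int filesWritten = 0;
    try {
      for (String[] row : sortedRows) {
        String key = row[0];
        if (!key.equals(currentKey)) {
          // Partition boundary: close the previous file before opening the next.
          if (out != null) {
            out.close();
          }
          Path dir = baseDir.resolve("key=" + key); // assumed directory layout
          Files.createDirectories(dir);
          // CREATE_NEW fails if the file already exists, which happens when
          // a key reappears, i.e. when the input was not sorted.
          out = Files.newBufferedWriter(dir.resolve("part-0"),
              StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
          currentKey = key;
          filesWritten++;
        }
        out.write(row[1]);
        out.newLine();
      }
    } finally {
      if (out != null) {
        out.close();
      }
    }
    return filesWritten;
  }

  public static void main(String[] args) throws IOException {
    Path base = Files.createTempDirectory("col-partition-sketch");
    List<String[]> rows = Arrays.asList(
        new String[] {"2014-01", "a"},
        new String[] {"2014-01", "b"},
        new String[] {"2014-02", "c"});
    System.out.println("partition files written: " + write(base, rows));
  }
}
```

A hash-based approach would instead keep one writer per distinct key open until the input is exhausted. The sketch trades that for the requirement that the input be sorted, and with unsorted input it fails rather than silently overwriting a partition file.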


Diffs
-----

  CHANGES.txt f038f979d1e8732fc380fe878217be90df8d0153 
  tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/statistics/StatisticsUtil.java b83681840d387adcb0859a76c4d81a3a1c28b063 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java 202c59db1740145e84df1925b4650ff2a1de8627 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/enforce/Enforcer.java d7c3ba411536275634c9f1092dc6df5ad06400d8 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/BasicPhysicalExecutorVisitor.java 67d6baa9192b5a4cd442071e27b415e4867cda58 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/ColPartitionedStoreExec.java PRE-CREATION 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/ColumnPartitionedTableStoreExec.java cee9bbafbe3585a7ac0c566669931f8c42336493 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/PhysicalExecutorVisitor.java 9ede15d3228550cfec5ddaff9ec387b2ee79f56c 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/SortBasedColPartitionStoreExec.java PRE-CREATION 
  tajo-core/tajo-core-backend/src/main/proto/TajoWorkerProtocol.proto 9aa6d865e696404257df627354a94b77cf7eb9e1 
  tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/planner/physical/TestPhysicalPlanner.java ab27a458633416cb8e28fadbf08685c085fdfa08 

Diff: https://reviews.apache.org/r/17633/diff/


Testing (updated)
-------

mvn clean install


Thanks,

Hyunsik Choi