Posted to issues@tajo.apache.org by blrunner <gi...@git.apache.org> on 2016/04/04 18:48:46 UTC

[GitHub] tajo pull request: TAJO-2111: Optimize Partition Table Split Compu...

GitHub user blrunner opened a pull request:

    https://github.com/apache/tajo/pull/994

    TAJO-2111: Optimize Partition Table Split Computation for Amazon S3

    It depends on https://github.com/apache/tajo/pull/846 and https://github.com/apache/tajo/pull/953

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/blrunner/tajo s3-split-improvement

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/994.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #994
    
----
commit 0d2a634d2353efdeecced4729be9f585789acdb1
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-10-28T08:22:40Z

    Implement PartitionedFileFragment

commit 4d7e73b7b20d1e6721b0f6b2ee53c4d04eb278d4
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-10-28T09:10:53Z

    Add unit test cases for PartitionedFileFragment

commit 6fab5adadb303e690f7377547f842f84eb1f9286
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-10-29T07:25:47Z

    Add PartitionedTableUtil for finding filtered partition directories.

commit b3bbcd188b0afc3b977f85005c0dffa20a8312dc
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-02T06:57:39Z

    Remove the array of partition directories of rewrite rule and apply PartitionedFileFragment.

commit 25163d0cdade5f45e7e524db4ceac4250b7ea805
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-02T07:01:56Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1952
    
    Conflicts:
    	tajo-core-tests/src/test/java/org/apache/tajo/engine/planner/physical/TestPhysicalPlanner.java

commit 4f711fa2ff7a18979198d80a70f283f73b91edf9
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-02T07:14:49Z

    Remove unnecessary method

commit 33dc1407a3d1417a81895e5e36d528f64c88bbbe
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-02T07:22:24Z

    Update comments

commit dede3e2957a2cee7bccd235a3f873aac0ab40377
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-02T07:40:15Z

    Remove unnecessary constructor parameter of PhysicalPlannerImpl

commit ccc4f6cb2e12bd642d00be08f393f6754e74db7f
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-02T07:55:16Z

    Remove unnecessary parameter of PartitionedTableUtil::buildTupleFromPartitionName

commit d5f563a1d6764f21f80e91a2540a9de5330a38cf
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-02T07:59:32Z

    Update wrong indent

commit 086b02beb700e125a6ba37cbe275965150a89183
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-02T07:59:57Z

    Remove unused package

commit 22731ec4a13f1ad0e75d7987966c17715afbeb52
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-02T08:20:01Z

    Update wrong comparison operator

commit 437f5ecdc7fad8b056bb638ea0897cd6e455b9b8
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-02T08:24:05Z

    Update log message

commit d76f41aac39e7536f4acac265559fb136aa05b71
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-03T00:33:16Z

    When rewriting PartitionedTableScanNode, set partition paths and table volume.

commit 126f5e06de3aa88563281fd0c382d03f4afab5bf
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-03T00:36:27Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1952

commit 9112ceb61547667020423bf4fbe18f99c07c2539
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-03T01:47:32Z

    Update the result message of partition pruning

commit 71d65a5dec1571852448f0b349e121a9a0268a5e
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-05T08:57:54Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1952

commit c53dfab62079707d1e106e81b1361bc5bc21d0ad
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-06T07:19:33Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1952

commit a3af9ad23f55f43e0af8e0f4f84159f8448c9795
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-10T00:43:47Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1952

commit 548bd4293fb3de0ab725bc2372d514ffd5a70a96
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-19T02:54:44Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1952
    
    Conflicts:
    	tajo-plan/src/main/java/org/apache/tajo/plan/logical/PartitionedTableScanNode.java

commit 66c1c496b9fb55ee2d872e843ffe2c2481adbd60
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-20T02:01:54Z

    Remove unused member variable.

commit 25e23666ce826c5e0f2a64726f8e4e73ab204c2e
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-20T02:06:54Z

    Remove unused method

commit c7f89f7b90cc65b8bdd294ab40426cebb73c99d0
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-20T02:14:53Z

    Separate partition processing logic from existing split method.

commit e670f25eab4965bd3d5bcfbaf0540194a7ed37d9
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-20T02:22:30Z

    Rename partitionName to partitionKeys in PartitionedFileFragmentProto

commit 83065d626fb2cc58e64fa17c3280191a44ddd471
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-21T15:16:13Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1952

commit c21d06538a7b99afd4d43ec73a8832d99754c598
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-21T15:26:50Z

    Rename PartitionedFileFragment to PartitionFileFragment

commit 9d92e540c1d65b073508e5f783dd6d8358663408
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-21T15:33:00Z

    Recover partition paths in LogicalNode

commit f9fcd273abb8960ff1663ffd181410d26a2a6681
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-21T15:35:43Z

    Add PartitionedTableWriter::buildTupleFromPartitionName

commit 344384b5e7708be4125ba7a3bf578f13030516b4
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-23T03:19:16Z

    PartitionedTableRewriter should set PartitionContent

commit b327993ba1c4749621f41fcdd67339551b64ef4e
Author: JaeHwa Jung <bl...@apache.org>
Date:   2015-11-23T03:22:21Z

    Remove unused packages

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-2111: Optimize Partition Table Split Compu...

Posted by blrunner <gi...@git.apache.org>.
Github user blrunner closed the pull request at:

    https://github.com/apache/tajo/pull/994



[GitHub] tajo pull request: TAJO-2111: Optimize Partition Table Split Compu...

Posted by blrunner <gi...@git.apache.org>.
Github user blrunner commented on the pull request:

    https://github.com/apache/tajo/pull/994#issuecomment-219369733
  
    I'll reopen this PR after finishing https://github.com/apache/tajo/pull/1020 and https://github.com/apache/tajo/pull/1024.



[GitHub] tajo pull request: TAJO-2111: Optimize Partition Table Split Compu...

Posted by blrunner <gi...@git.apache.org>.
Github user blrunner commented on the pull request:

    https://github.com/apache/tajo/pull/994#issuecomment-205623205
  
    Here are my benchmark results.
    
    # Configuration
    
    * EC2 instance type : c3.xlarge
    * Cluster: 1 master, 3 workers
    * Dataset: TPC-H (scale factor = 1)
    * Partition table schema
    ```
    CREATE EXTERNAL TABLE lineitem_p (l_orderkey INT8, l_partkey INT8, l_suppkey INT8, l_linenumber INT8, l_quantity FLOAT8, l_extendedprice FLOAT8, l_discount FLOAT8, l_tax FLOAT8, l_returnflag TEXT, l_linestatus TEXT, l_commitdate TEXT, l_receiptdate TEXT, l_shipinstruct TEXT, l_shipmode TEXT, l_comment TEXT)
    USING TEXT WITH ('text.delimiter'='|')
    PARTITION BY COLUMN(l_shipdate TEXT)
    LOCATION 's3://Xyz';
    ```
    * Number of partitions in the ``lineitem`` table: 2526 (each partition contains just one file)
    
    # Queries
    * Q1: `` select * from lineitem_p limit 5; ``
    * Q2: `` select count(*) from lineitem_p; ``
    * Q3: `` select count(*) from lineitem_p where l_shipdate > '1994-09-25' and l_shipdate < '1994-10-10'; ``
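
Q3 is the only query whose predicate restricts the partition column, so only a small range of partitions needs to be listed for it. The sketch below illustrates that kind of pruning. It is not Tajo's implementation: it assumes partition directories are named `l_shipdate=<value>`, the directory listing is generated rather than read from S3, and `partition_value` and `prune` are hypothetical helper names. Values are compared as strings because `l_shipdate` is declared as text.

```python
from datetime import date, timedelta


def partition_value(dir_name, column):
    """Extract the value from a 'column=value' directory name (assumed layout)."""
    key, sep, value = dir_name.partition("=")
    if sep != "=" or key != column:
        raise ValueError(f"not a partition directory for {column}: {dir_name}")
    return value


def prune(dirs, column, lower, upper):
    """Keep directories whose value lies strictly between lower and upper.

    Plain string comparison is used, which matches date order here only
    because the values are zero-padded ISO dates.
    """
    return [d for d in dirs if lower < partition_value(d, column) < upper]


# Hypothetical listing: one directory per day, starting 1994-09-01.
start = date(1994, 9, 1)
dirs = [f"l_shipdate={start + timedelta(days=i)}" for i in range(60)]

# Same bounds as Q3: l_shipdate > '1994-09-25' and l_shipdate < '1994-10-10'
selected = prune(dirs, "l_shipdate", "1994-09-25", "1994-10-10")
print(len(selected), selected[0], selected[-1])
```

With daily partitions the Q3 predicate keeps 14 directories, which is consistent with Q3 being far cheaper than Q1 and Q2 even without the optimization.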
    
    # Query Execution Time
    
    Query | Not Optimized | Optimized | Improvement
    -------------------|----------------------|--------------------------|-------------------
    Q1 | 573.425 sec | 4.228 sec | 135.6x
    Q2 | 653.175 sec | 33.444 sec | 19.5x
    Q3 | 4.099 sec | 2.429 sec | 1.6x
    
    
    # Split Computation Time
    
    Query | Not Optimized | Optimized | Improvement
    -------------------|----------------------|--------------------------|-------------------
    Q1 | 572921 ms | 2233 ms | 256.5x
    Q2 | 599437 ms | 701 ms | 855.1x
    Q3 | 2537 ms | 388 ms | 6.5x
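
The Improvement columns above can be reproduced as the ratio of the unoptimized time to the optimized time. The sketch below recomputes them from the reported figures. The helper name `improvement` is hypothetical, and truncating to one decimal place is an assumption that happens to match every reported value (ordinary rounding would give 1.7x for Q3 query time and 256.6x for Q1 split time).

```python
import math


def improvement(baseline, optimized):
    """Speedup factor baseline / optimized, truncated to one decimal place."""
    return math.floor(baseline / optimized * 10) / 10


# (not optimized, optimized) pairs copied from the two tables above
query_time_sec = {
    "Q1": (573.425, 4.228),
    "Q2": (653.175, 33.444),
    "Q3": (4.099, 2.429),
}
split_time_ms = {
    "Q1": (572921, 2233),
    "Q2": (599437, 701),
    "Q3": (2537, 388),
}

for label, table in (("query", query_time_sec), ("split", split_time_ms)):
    for name, (baseline, optimized) in table.items():
        print(f"{label} {name}: {improvement(baseline, optimized)}x")
```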

