You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tajo.apache.org by jihoonson <gi...@git.apache.org> on 2015/07/17 12:04:38 UTC

[GitHub] tajo pull request: TAJO-1343: Improve the memory usage of physical...

GitHub user jihoonson opened a pull request:

    https://github.com/apache/tajo/pull/634

    TAJO-1343: Improve the memory usage of physical executors

    Here are some highlights.
    * Cleanup duplicated tuple creation.
    * To maintain tuple creation easily, I've added three new data structures, called TupleMap, TupleList, and TupleSet. They automatically clone tuples whenever a new item is added.
    * Fix a bug in WindowAggregation and two wrong tests.
    * Fix a bug in timezone.
    
    I will conduct some tests and share the results.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jihoonson/tajo-2 TAJO-1343_2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/634.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #634
    
----
commit 6b8ccc2f36712bd6c8d4d5610cbfdac70d1afba0
Author: Jihoon Son <ji...@apache.org>
Date:   2015-07-13T06:56:49Z

    TAJO-1343_2

commit b9bec79372d010f8645067237c9656cc4c0e60d4
Author: Jihoon Son <ji...@apache.org>
Date:   2015-07-13T07:42:08Z

    TAJO-1343_2

commit e9a40da63ee7376cd60ba537d1f5ab06f636bca6
Author: Jihoon Son <ji...@apache.org>
Date:   2015-07-13T08:49:23Z

    TAJO-1343_2

commit 877c20250d196954edf49ccfb70fd20a520b2985
Author: Jihoon Son <ji...@apache.org>
Date:   2015-07-14T01:34:57Z

    Fix a bug in window function

commit f5fa81d82901304833ae7b65c7f3c82044c40acb
Author: Jihoon Son <ji...@apache.org>
Date:   2015-07-14T13:55:44Z

    Remaining OOM

commit f6193e6881eb15d1e2be8d7594206e2ea2071f9c
Author: Jihoon Son <ji...@apache.org>
Date:   2015-07-15T10:10:13Z

    Fix OOM

commit dd91fdd31ec0c7ca561f29f2debf402d8dcf2be3
Author: Jihoon Son <ji...@apache.org>
Date:   2015-07-15T13:18:14Z

    Fix test failures

commit 482072fd9eaa4eccc40c12c9f32c0c822d67d3a8
Author: Jihoon Son <ji...@apache.org>
Date:   2015-07-16T08:07:35Z

    Fixed RangeShuffleWriteExec bug

commit d3793e4c5bb8234bf31bd99c55590b7065d27386
Author: Jihoon Son <ji...@apache.org>
Date:   2015-07-16T16:46:54Z

    Passed all tests

commit 230d57033791dd3dd78e560f7db36a9bbc0451d5
Author: Jihoon Son <ji...@apache.org>
Date:   2015-07-17T09:53:58Z

    Fix every test failure and add comments

commit 5ebe05dc9f7464af040050e0ec17bf2ca0cf4fdc
Author: Jihoon Son <ji...@apache.org>
Date:   2015-07-17T09:55:11Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1343_2
    
    Conflicts:
    	tajo-core/src/test/java/org/apache/tajo/engine/planner/TestLogicalPlanner.java

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1343: Improve the memory usage of physical...

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on the pull request:

    https://github.com/apache/tajo/pull/634#issuecomment-122884134
  
    I've added a configuration ```tajo.executor.memory-tuple-slot-num``` to set the memory tuple slot.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1343: Improve the memory usage of physical...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/634#discussion_r35181083
  
    --- Diff: tajo-common/src/main/java/org/apache/tajo/storage/VTuple.java ---
    @@ -82,6 +83,7 @@ public void clear() {
       // Setter
       //////////////////////////////////////////////////////
       public void put(int fieldId, Datum value) {
    +    this.offset = -1;
    --- End diff --
    
    Does every put needs the offset initialization?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1343: Improve the memory usage of physical...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/tajo/pull/634


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1343: Improve the memory usage of physical...

Posted by jinossy <gi...@git.apache.org>.
Github user jinossy commented on the pull request:

    https://github.com/apache/tajo/pull/634#issuecomment-123561701
  
    +1 Looks great to me!!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1343: Improve the memory usage of physical...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/634#issuecomment-123562379
  
    +1
    The patch looks good to me. I leaved one trivial issue. It's up to you.
    
    In addition, you added the removal code in QueryContext.
    https://github.com/apache/tajo/pull/634/files#diff-e0e47538675376e2e0b96ce307a9676bR49
    
    I think that it would be handled in other places. But, the current workaround looks good to me. I'll solve it in another jira.
    
    Thanks!



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1343: Improve the memory usage of physical...

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on the pull request:

    https://github.com/apache/tajo/pull/634#issuecomment-123630807
  
    Thanks guys for your review.
    I've just committed to master after removing implicit clear of offset per put() call.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1343: Improve the memory usage of physical...

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on the pull request:

    https://github.com/apache/tajo/pull/634#issuecomment-122641665
  
    I've tested this patch on my laptop using YourKit.
    Here are some highlights of the results.
    
    ### Query and data
    I used the following query on TPC-H data set with the scale factor of 1.
    ```
    default> select count(distinct l_partkey) from lineitem;
    ```
    
    ### 1. Tuple creation
    The following numbers were randomly captured during executing the second phase of distinct aggregation. So, it would not be the exact comparison, but is valuable I think.
    
    #### Master
    | Class        | Objects    | Shallow Size  | Retained Size |
    | ------------- |:-------------:| ------------------:| -------------------:|
    |org.apache.tajo.storage.VTuple | 3,899,275 | 93,582,600 | 112,627,520 |
    |org.apache.tajo.datum.Datum[] | 3,899,275 | 87,335,008 | 96,857,392 |
    
    #### Tajo-1343
    | Class        | Objects    | Shallow Size  | Retained Size |
    | ------------- |:-------------:| ------------------:| -------------------:|
    |org.apache.tajo.engine.planner.physical.KeyTuple | 2,573,412 | 82,349,184 | 95,913,472 |
    |org.apache.tajo.storage.VTuple | 398,023 | 9,552,552 | 9,552,984 |
    |org.apache.tajo.datum.Datum[] | 2,971,439 | 71,314,440 | 78,096,672 |
    
    As can be seen in the above result, the total size of generated tuples in TAJO-1343 is less than that in master. The difference is 7,161,064.
    
    ### 2. Memory usage
    The following graphs show the changes of memory usage during query execution.
    
    ##### < Memory usage of with master >
    ![memory usage of with master](https://issues.apache.org/jira/secure/attachment/12745995/master-memory.png)
    ##### < Memory usage of with TAJO-1343 >
    ![memory usage of with TAJO-1343](https://issues.apache.org/jira/secure/attachment/12745994/1343-memory.png)
    
    The gray part in each graph represents the change of memory usage during query execution.
    As can be seen in the graphs, memory usage change with TAJO-1343 is more gracefully than that with master.
    
    The following numbers were captured when the query is finished because it's the time when the numbers become the maximum. 
    
    | branch name | Allocated all pools        | Used PS Eden Space    | Used PS Survivor Space  | Used PS Old Gen | # of GCs |
    | ------------- | ------------- |:-------------:| ------------------:| -------------------:| -------------------:|
    | master | 1006 | 634 | 22 | 63 | 4 |
    | TAJO-1343 | 1002 | 518 | 43 | 71 | 3 |
    
    The amount of ```allocated all pools``` with TAJO-1343 is similar to that of master. However, ```used PS eden space``` with TAJO-1343 is about 100 MB less than that with master. This means that the less objects are newly created during query execution. 
    
    In this test, the numbers of GCs with both branches are similar, so it is of less significance. However, I think that the difference in number of GCs will be increased with more complex queries, thereby more significantly affecting to the query performance.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1343: Improve the memory usage of physical...

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on the pull request:

    https://github.com/apache/tajo/pull/634#issuecomment-122896826
  
    I've reverted the last changes because it makes unexpected errors.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1343: Improve the memory usage of physical...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/634#issuecomment-123576450
  
    +1 ship it!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1343: Improve the memory usage of physical...

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on the pull request:

    https://github.com/apache/tajo/pull/634#issuecomment-122748423
  
    @jinossy thanks for your review. 
    I've added ```clear()``` in ```put()``` method because I was considering unexpected errors when the datum array of different length is put.
    As you think, this may accompany large overhead. I will remove it after some test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---