Posted to issues@tajo.apache.org by "Hyunsik Choi (JIRA)" <ji...@apache.org> on 2014/02/01 05:04:09 UTC

[jira] [Commented] (TAJO-574) Add a sort-based physical executor for column partition store

    [ https://issues.apache.org/jira/browse/TAJO-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13888442#comment-13888442 ] 

Hyunsik Choi commented on TAJO-574:
-----------------------------------

Created a review request against the master branch on Review Board:
https://reviews.apache.org/r/17633/


> Add a sort-based physical executor for column partition store
> -------------------------------------------------------------
>
>                 Key: TAJO-574
>                 URL: https://issues.apache.org/jira/browse/TAJO-574
>             Project: Tajo
>          Issue Type: New Feature
>          Components: physical operator
>            Reporter: Hyunsik Choi
>            Assignee: Hyunsik Choi
>             Fix For: 0.8-incubating
>
>         Attachments: TAJO-574.patch
>
>
> ColumnPartitionStoreExec keeps numerous files open while it stores data. In addition, its random writes place a burden on the HDFS NameNode.
> To solve this problem, I would like to propose a sort-based physical executor for the column partition store. It assumes that input tuples are sorted in ascending or descending order of the partition keys, which means an extra sort operation is needed. In return, it keeps only one file open at a time and writes all data sequentially. In many cases, it would be the best choice for the column partition store.
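
The idea in the description can be illustrated with a minimal sketch. It is not the code in TAJO-574.patch and does not use Tajo's API: it is written in Python for brevity, and the function name write_sorted_partitions, the "key=<value>" directory layout, and the "part-0" file name are all assumptions made for illustration. It only shows why sorted input lets the writer keep a single file open: a change in partition key means the previous partition is complete.

```python
import os
import tempfile
from typing import Iterable, Tuple


def write_sorted_partitions(rows: Iterable[Tuple[str, str]], base_dir: str) -> int:
    """Write (partition_key, line) rows, already sorted by key, one file per key.

    Hypothetical helper for illustration only. Returns the maximum number of
    output files that were open at the same time (0 for empty input, else 1).
    """
    current_key = None
    out = None
    seen = set()   # keys already finished; used only to detect unsorted input
    max_open = 0
    try:
        for key, line in rows:
            if key != current_key:
                # A key that reappears after being closed means the input
                # was not grouped by partition key (ascending or descending).
                if key in seen:
                    raise ValueError(f"input not sorted by partition key: {key!r}")
                if out is not None:
                    out.close()  # previous partition is complete
                part_dir = os.path.join(base_dir, f"key={key}")  # assumed layout
                os.makedirs(part_dir, exist_ok=True)
                out = open(os.path.join(part_dir, "part-0"), "w")
                seen.add(key)
                current_key = key
                max_open = 1
            out.write(line + "\n")  # sequential append to the single open file
    finally:
        if out is not None:
            out.close()
    return max_open


if __name__ == "__main__":
    rows = sorted([("2014-02", "b"), ("2014-01", "a"), ("2014-01", "c")])
    print(write_sorted_partitions(rows, tempfile.mkdtemp()))
```

The sketch trades the cost of the upstream sort for one open file handle and purely sequential writes, which is the trade-off the description argues for. The seen set exists only to reject unsorted input in this toy version; a real executor could rely on the planner's sort guarantee instead.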



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)