Posted to issues@tajo.apache.org by "Hyunsik Choi (JIRA)" <ji...@apache.org> on 2014/07/10 13:32:04 UTC

[jira] [Created] (TAJO-931) Output file can be punctuated depending on the file size.

Hyunsik Choi created TAJO-931:
---------------------------------

             Summary: Output file can be punctuated depending on the file size.
                 Key: TAJO-931
                 URL: https://issues.apache.org/jira/browse/TAJO-931
             Project: Tajo
          Issue Type: Bug
          Components: physical operator
            Reporter: Hyunsik Choi
             Fix For: 0.9.0


Some file formats (e.g., Parquet) are not splittable. A very large file in such a format spans multiple HDFS blocks, yet it still has to be read as a single unit. This causes remote HDFS access and limits the degree of parallelism, resulting in significant performance degradation.

We can solve this problem if StoreTableExec or {Col|SortBased}PartitionStoreExec can punctuate the final output according to the written size, that is, close the current output file and start a new one once the written size reaches a limit.

In addition, we need to support a session variable that determines the per-file size of the final output files. So, TAJO-928 is a blocker of this issue.
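
As a rough illustration of the idea above, here is a minimal sketch of size-based punctuation. It is not Tajo code: the class RollingOutput, its methods, and the maxBytesPerFile parameter (standing in for the proposed session variable) are all hypothetical, and in-memory buffers stand in for HDFS output files.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of punctuating output by written size.
 * This is NOT Tajo's StoreTableExec; names and behavior are assumptions.
 */
class RollingOutput {
    // Assumed per-file size limit; stands in for the proposed session variable.
    private final long maxBytesPerFile;
    // In-memory stand-ins for the output files (a real operator would open HDFS files).
    private final List<ByteArrayOutputStream> files = new ArrayList<>();
    // Bytes written to the current (last) file so far.
    private long writtenInCurrent = 0;

    RollingOutput(long maxBytesPerFile) {
        if (maxBytesPerFile <= 0) {
            throw new IllegalArgumentException("maxBytesPerFile must be positive");
        }
        this.maxBytesPerFile = maxBytesPerFile;
    }

    /** Appends one record, starting a new file first if the limit would be exceeded. */
    void write(String record) {
        byte[] bytes = (record + "\n").getBytes(StandardCharsets.UTF_8);
        // Open the first file lazily. Roll over only when the current file is
        // non-empty, so a single oversized record still lands in one file.
        boolean wouldExceed = writtenInCurrent > 0
                && writtenInCurrent + bytes.length > maxBytesPerFile;
        if (files.isEmpty() || wouldExceed) {
            files.add(new ByteArrayOutputStream());
            writtenInCurrent = 0;
        }
        files.get(files.size() - 1).write(bytes, 0, bytes.length);
        writtenInCurrent += bytes.length;
    }

    /** Number of output files produced so far. */
    int fileCount() {
        return files.size();
    }

    /** Size in bytes of the file at the given index. */
    int fileSize(int index) {
        return files.get(index).size();
    }
}
```

With a limit close to the HDFS block size, each output file would fit in roughly one block, which is the effect this issue is after. The sketch rolls over on record boundaries only; a format such as Parquet would presumably need to roll over at its own unit boundaries instead.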



--
This message was sent by Atlassian JIRA
(v6.2#6252)