Posted to issues@tajo.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/09/01 16:42:45 UTC

[jira] [Commented] (TAJO-1340) Change the default output file format.

    [ https://issues.apache.org/jira/browse/TAJO-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725492#comment-14725492 ] 

ASF GitHub Bot commented on TAJO-1340:
--------------------------------------

Github user jinossy commented on the pull request:

    https://github.com/apache/tajo/pull/671#issuecomment-136745852
  
    I've tested JDBC performance on a laptop because my test network environment is only 1Gbps.
    TPC-H scale-3 lineitem
    Query: select * from lineitem where 1=1
    
    Before
    ```
    Serialize per row + Text
    107 sec //avg 30MByte/sec
    ```
    After
    ```
    Serialize Row-Block + Text
    80 sec 
    Serialize Row-Block + DRAW
    30 sec
    ```
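The timings can be turned into speedups and approximate throughput with a quick back-of-the-envelope calculation. The sketch below assumes that every run transferred the same number of bytes. That size is estimated from the first run (107 sec at about 30 MByte/sec, roughly 3,210 MB), because the transfer size of the other runs was not reported, so the derived throughput numbers are estimates only. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JdbcBenchmarkMath {
    // Baseline run from the comment: "Serialize per row + Text", 107 sec at ~30 MByte/sec.
    static final double BASELINE_SECONDS = 107.0;
    static final double BASELINE_MB_PER_SEC = 30.0;

    // Assumption: all runs move the same payload, estimated from the baseline run.
    static double estimatedPayloadMb() {
        return BASELINE_SECONDS * BASELINE_MB_PER_SEC;
    }

    // Speedup relative to the per-row + Text baseline.
    static double speedup(double seconds) {
        return BASELINE_SECONDS / seconds;
    }

    // Approximate throughput under the same-payload assumption.
    static double throughputMbPerSec(double seconds) {
        return estimatedPayloadMb() / seconds;
    }

    public static void main(String[] args) {
        Map<String, Double> runs = new LinkedHashMap<>();
        runs.put("Serialize per row + Text", 107.0);
        runs.put("Serialize Row-Block + Text", 80.0);
        runs.put("Serialize Row-Block + DRAW", 30.0);
        for (Map.Entry<String, Double> run : runs.entrySet()) {
            double s = run.getValue();
            System.out.printf("%-28s %5.0f sec  %.2fx  ~%.0f MByte/sec%n",
                    run.getKey(), s, speedup(s), throughputMbPerSec(s));
        }
    }
}
```

Under that assumption, Row-Block + Text comes out about 1.34x faster than the baseline (~40 MByte/sec), and Row-Block + DRAW about 3.57x faster (~107 MByte/sec).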


> Change the default output file format.
> --------------------------------------
>
>                 Key: TAJO-1340
>                 URL: https://issues.apache.org/jira/browse/TAJO-1340
>             Project: Tajo
>          Issue Type: Improvement
>            Reporter: Hyunsik Choi
>            Assignee: Jinho Kim
>             Fix For: 0.11.0
>
>
> Currently, the default output file format is CSV. By its nature, CSV has three main problems:
>  * Its line or field delimiter may also occur as a character within the result data.
>  * A plain text file is likely to be larger than the same data stored in other file formats.
>  * Its read and write performance is slow.
> We need to change the default output file format to another file format. We also need to investigate which file format is best for it.
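A few lines of code are enough to reproduce the first CSV problem. The Java sketch below is a hypothetical illustration and is not Tajo's writer. It joins fields with `|` without escaping, as a naive delimited-text writer would, and then splits them back. It compares this with a generic length-prefixed binary encoding that never scans for a delimiter. The length-prefixed layout is an assumption chosen for illustration and does not describe the actual Row-Block or DRAW format. All class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class DelimiterCollision {
    // Naive text encoding: join fields with a delimiter and do no escaping.
    static String encodeText(List<String> fields, char delimiter) {
        return String.join(String.valueOf(delimiter), fields);
    }

    // Split on the delimiter; limit -1 keeps trailing empty fields.
    static List<String> decodeText(String line, char delimiter) {
        return List.of(line.split(Pattern.quote(String.valueOf(delimiter)), -1));
    }

    // Length-prefixed binary encoding: field count, then (byte length, bytes) per field,
    // so field content never has to be escaped.
    static byte[] encodeBinary(List<String> fields) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(fields.size());
        for (String field : fields) {
            byte[] b = field.getBytes(StandardCharsets.UTF_8);
            out.writeInt(b.length);
            out.write(b);
        }
        out.flush();
        return bytes.toByteArray();
    }

    static List<String> decodeBinary(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int count = in.readInt();
        List<String> fields = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] b = new byte[in.readInt()];
            in.readFully(b);
            fields.add(new String(b, StandardCharsets.UTF_8));
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        // A 3-field row whose middle value happens to contain the delimiter.
        List<String> row = List.of("1", "deliver in person|rush", "0.05");
        List<String> viaText = decodeText(encodeText(row, '|'), '|');
        List<String> viaBinary = decodeBinary(encodeBinary(row));
        System.out.println("text:   " + viaText.size() + " fields " + viaText);
        System.out.println("binary: " + viaBinary.size() + " fields " + viaBinary);
    }
}
```

When `main` runs, the text path returns 4 fields for the 3-field row. The binary path gets back the original 3 fields.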



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)