Posted to issues@tajo.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/09/01 16:42:45 UTC
[jira] [Commented] (TAJO-1340) Change the default output file format.
[ https://issues.apache.org/jira/browse/TAJO-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725492#comment-14725492 ]
ASF GitHub Bot commented on TAJO-1340:
--------------------------------------
Github user jinossy commented on the pull request:
https://github.com/apache/tajo/pull/671#issuecomment-136745852
I tested JDBC performance on my laptop, because my test network environment is only 1 Gbps.
TPC-H scale-3 lineitem
Query: select * from lineitem where 1=1
Before
```
Serialize per row + Text
107 sec //avg 30MByte/sec
```
After
```
Serialize Row-Block + Text
80 sec
Serialize Row-Block + DRAW
30 sec
```
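To put the timings above side by side, here is a rough back-of-the-envelope sketch. It assumes all three runs transfer the same amount of data, and it estimates that amount from the one throughput figure reported (107 sec at an average of 30 MByte/sec). The variable and function names are only illustrative.

```python
# Figures taken from the measurements above.
BASELINE_SEC = 107          # Serialize per row + Text
BASELINE_MB_PER_SEC = 30    # reported average throughput for the baseline

# Assumption: every run moves the same amount of data as the baseline.
approx_data_mb = BASELINE_SEC * BASELINE_MB_PER_SEC

runs = {
    "Serialize per row + Text": 107,
    "Serialize Row-Block + Text": 80,
    "Serialize Row-Block + DRAW": 30,
}


def summarize(seconds):
    """Return (estimated throughput in MB/s, speedup over the baseline)."""
    return approx_data_mb / seconds, BASELINE_SEC / seconds


for name, seconds in runs.items():
    throughput, speedup = summarize(seconds)
    print(f"{name}: ~{throughput:.0f} MB/s, {speedup:.1f}x")
```

Under that assumption, Row-Block + Text works out to roughly 40 MB/s (about 1.3x) and Row-Block + DRAW to roughly 107 MB/s (about 3.6x). The latter is close to the 125 MB/s theoretical ceiling of a 1 Gbps link.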
> Change the default output file format.
> --------------------------------------
>
> Key: TAJO-1340
> URL: https://issues.apache.org/jira/browse/TAJO-1340
> Project: Tajo
> Issue Type: Improvement
> Reporter: Hyunsik Choi
> Assignee: Jinho Kim
> Fix For: 0.11.0
>
>
> Currently, the default output file format is CSV. By its nature, CSV has three main problems:
> * Its line or field delimiter can collide with characters that appear in the result data.
> * A plain text file is likely to be larger than files in other formats.
> * Its read and write performance is slow.
> We need to change the default output file format to another file format. We also need to investigate which file format is best suited for this.
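A minimal sketch of the first problem in the description, the delimiter collision. This is not Tajo's writer: the row is hypothetical, and Python's standard `csv` module stands in only to show the difference between unquoted and quoted output.

```python
import csv
import io

# Hypothetical row: the second field contains the field delimiter itself.
row = ["1", "deliver, quickly", "0.04"]

# Naive text output: join fields with the delimiter, no quoting or escaping.
naive_line = ",".join(row)
naive_fields = naive_line.split(",")

# For comparison, a writer that quotes fields containing the delimiter.
buf = io.StringIO()
csv.writer(buf).writerow(row)
quoted_fields = next(csv.reader(io.StringIO(buf.getvalue())))

# Field counts: original, naive round trip, quoted round trip.
print(len(row), len(naive_fields), len(quoted_fields))  # prints "3 4 3"
```

The naive round trip yields four fields instead of three, so the row can no longer be read back correctly. Quoting avoids that at the cost of extra parsing work, which is one reason a binary or self-describing format may be a better default.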
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)