Posted to issues@tajo.apache.org by "Hyunsik Choi (JIRA)" <ji...@apache.org> on 2014/10/01 00:30:33 UTC

[jira] [Commented] (TAJO-1081) Non-forwarded (simple) query shows wrong rows.

    [ https://issues.apache.org/jira/browse/TAJO-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153890#comment-14153890 ] 

Hyunsik Choi commented on TAJO-1081:
------------------------------------

Thank you for sharing your investigation.

I also investigated the problem because I could not figure out the main cause.

Your investigation is partially right. We use the number of rows stored in the catalog, which can be viewed by executing "\d". Sometimes it is not available, especially when users register external tables. As you mentioned, in that case we need {{select count(*) from..}} queries.

However, the Tajo client uses the number of rows only as supplementary information for the LIMIT clause or for display. Even when the number is not available, the Tajo client works correctly because it can read rows until the scanner reaches the end of the tuples.
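
To illustrate the point above, here is a minimal sketch of a fetch loop that treats the row count as a hint only. The names RowScanner, fetchAll, and rowCountHint are hypothetical and are not Tajo's client API; the sketch only assumes a scanner that signals the end of tuples by returning null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the client-side fetch logic, not Tajo's actual classes.
class ReadUntilEndSketch {

  // Assumed minimal scanner contract: next() returns null once all tuples are consumed.
  interface RowScanner {
    String next();
  }

  // Reads every row from the scanner. The row count from the catalog is used
  // only to presize the list, so a missing or zero count does not change the result.
  static List<String> fetchAll(RowScanner scanner, long rowCountHint) {
    int capacity = (rowCountHint > 0 && rowCountHint < 1024) ? (int) rowCountHint : 16;
    List<String> rows = new ArrayList<>(capacity);
    String row;
    while ((row = scanner.next()) != null) {
      rows.add(row);
    }
    return rows;
  }

  public static void main(String[] args) {
    Iterator<String> source = Arrays.asList("0|AFRICA", "1|AMERICA", "2|ASIA").iterator();
    RowScanner scanner = () -> source.hasNext() ? source.next() : null;
    // The hint is 0, as in the "\d region" output, yet all rows are still read.
    System.out.println(fetchAll(scanner, 0).size());
  }
}
```

With this shape, an unavailable row count affects only presizing (or display), never which rows are returned.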

I also found a bug in the PlannerUtil::getNonZeroLengthDataFiles() method. This method should have used AbstractStorageManager.hiddenFileFilter to skip hidden files, which have the prefix {{.}}. The current implementation reads all files, even those that are not valid data files.
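
A minimal sketch of the filtering described above, written against java.nio.file so that it is self-contained. It is not the actual PlannerUtil or AbstractStorageManager code (which works on Hadoop file system paths); the class and method names here are hypothetical, and the only rule assumed is the one stated above: skip files whose names start with a dot.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not Tajo's PlannerUtil or AbstractStorageManager.
class HiddenFileFilterSketch {

  // True for file names that start with '.', e.g. ".hidden-file".
  static boolean isHidden(Path file) {
    Path name = file.getFileName();
    return name != null && name.toString().startsWith(".");
  }

  // Lists the regular files directly under a table directory that are
  // both visible (not hidden) and non-empty.
  static List<Path> nonZeroLengthDataFiles(Path tableDir) throws IOException {
    List<Path> result = new ArrayList<>();
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(tableDir)) {
      for (Path entry : entries) {
        if (Files.isRegularFile(entry) && !isHidden(entry) && Files.size(entry) > 0) {
          result.add(entry);
        }
      }
    }
    return result;
  }
}
```

Without the isHidden check, a hidden file in the table directory would be returned alongside the data files and scanned as if it contained rows.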

If you want, you can keep working on this issue. Otherwise, I can take it.

Best regards,
Hyunsik

> Non-forwarded (simple) query shows wrong rows.
> ----------------------------------------------
>
>                 Key: TAJO-1081
>                 URL: https://issues.apache.org/jira/browse/TAJO-1081
>             Project: Tajo
>          Issue Type: Bug
>          Components: client, tajo master
>            Reporter: Hyunsik Choi
>            Assignee: Mai Hai Thanh
>            Priority: Blocker
>             Fix For: 0.9.0
>
>
> Non-forward queries show wrong rows. This is a very urgent and critical bug that must be resolved before the 0.9.0 release.
> {code}
> default> \d region
> table name: default.region
> table path: file:/Users/hyunsik/tpch/region
> store type: CSV
> number of rows: 0
> volume: 494 B
> Options: 
> 	'csvfile.delimiter'='|'
> schema: 
> r_regionkey	INT8
> r_name	TEXT
> r_comment	TEXT
> default> 
> default> select * from region;
> r_regionkey,  r_name,  r_comment
> -------------------------------
> ,  ,  
> 2,  ,  
> ,  ,  
> ,  ,  
> ,  ,  
> ,  ,  
> ,  ,  
> ,  ,  
> ,  ,  
> ,  "
>    � (0,  
> 0,  AFRICA,  lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to 
> 1,  AMERICA,  hs use ironic, even requests. s
> 2,  ASIA,  ges. thinly even pinto beans ca
> 3,  EUROPE,  ly final courts cajole furiously final excuse
> 4,  MIDDLE EAST,  uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl
> (15 rows, 0.03 sec, 494 B selected)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)