Posted to issues@kudu.apache.org by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org> on 2016/05/06 14:40:13 UTC

[jira] [Commented] (KUDU-1440) Wrong result ordering for scanning a table with millions of rows

    [ https://issues.apache.org/jira/browse/KUDU-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274136#comment-15274136 ] 

Jean-Daniel Cryans commented on KUDU-1440:
------------------------------------------

Hi Martin,

Results are not guaranteed to be returned in order. They are instead returned per DiskRowSet. Each DiskRowSet is internally sorted, and if you stop inserting, the RowSet compactions will eventually sort the DiskRowSets relative to one another.
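
To illustrate the effect, here is a minimal sketch in plain Java (no Kudu API involved). It models each DiskRowSet as a sorted list of keys and the scan as a simple concatenation of those lists. The class and method names are hypothetical, and the real scanner is more involved, but the sketch shows why successive rows are almost always in order while the overall result is not:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model: each inner list stands in for one internally sorted DiskRowSet.
final class SortedRunsDemo {

    // Returns the rows in the order a run-by-run scan would emit them.
    static List<Long> concatenate(List<List<Long>> runs) {
        List<Long> out = new ArrayList<>();
        for (List<Long> run : runs) {
            out.addAll(run);
        }
        return out;
    }

    // Counts positions where a row is smaller than the row just before it.
    static int countAdjacentDescents(List<Long> rows) {
        int descents = 0;
        for (int i = 1; i < rows.size(); i++) {
            if (rows.get(i) < rows.get(i - 1)) {
                descents++;
            }
        }
        return descents;
    }

    public static void main(String[] args) {
        List<List<Long>> runs = Arrays.asList(
                Arrays.asList(1L, 4L, 7L),
                Arrays.asList(2L, 3L, 9L),
                Arrays.asList(5L, 6L, 8L));
        List<Long> scanned = concatenate(runs);
        // Order breaks only at the boundaries between runs.
        System.out.println(scanned + " descents=" + countAdjacentDescents(scanned));
    }
}
```

With only a few large runs, nearly every adjacent pair is in order, which would be consistent with the "99.9% correct" observation in the report.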

See this related patch that's up for review to add a way to get rows in order (per tablet) in the Java client: http://gerrit.cloudera.org:8080/#/c/2951/

But the main problem here is that, with hash partitioning, we still scan one tablet at a time, so even with that patch a full table scan would not return rows in total order.
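
If total order is needed on the client side, one possible workaround is to merge the per-tablet streams yourself. The sketch below is plain Java and does not use any Kudu API. It assumes each tablet's rows are already available as a sorted iterator of keys (which is what per-tablet ordered scanning would provide), and the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical k-way merge over per-tablet streams that are each already sorted.
final class TabletMergeSketch {

    // The next pending key of one stream, plus the stream it came from.
    private static final class Head {
        final long key;
        final Iterator<Long> source;

        Head(long key, Iterator<Long> source) {
            this.key = key;
            this.source = source;
        }
    }

    // Merges the sorted streams into one list in ascending key order.
    static List<Long> mergeSorted(List<Iterator<Long>> streams) {
        PriorityQueue<Head> heap =
                new PriorityQueue<>((a, b) -> Long.compare(a.key, b.key));
        for (Iterator<Long> stream : streams) {
            if (stream.hasNext()) {
                heap.add(new Head(stream.next(), stream));
            }
        }
        List<Long> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head smallest = heap.poll();
            out.add(smallest.key);
            // Refill the heap from the stream that just supplied a row.
            if (smallest.source.hasNext()) {
                heap.add(new Head(smallest.source.next(), smallest.source));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Iterator<Long>> tablets = new ArrayList<>();
        tablets.add(Arrays.asList(1L, 5L, 9L).iterator());
        tablets.add(Arrays.asList(2L, 3L, 10L).iterator());
        System.out.println(mergeSorted(tablets));
    }
}
```

The heap holds only one pending row per tablet, so memory use grows with the number of tablets rather than the number of rows. The trade-off is that all tablets have to be scanned concurrently instead of one at a time.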

> Wrong result ordering for scanning a table with millions of rows
> ----------------------------------------------------------------
>
>                 Key: KUDU-1440
>                 URL: https://issues.apache.org/jira/browse/KUDU-1440
>             Project: Kudu
>          Issue Type: Bug
>          Components: client, master, tablet
>    Affects Versions: 0.8.0
>         Environment: CentOS 7
>            Reporter: Martin Weindel
>            Priority: Critical
>         Attachments: CreateTableTimeSeriesBug.java
>
>
> I have the following simple table with two columns:
> {code}
> time TIMESTAMP,
> value FLOAT
> {code}
> The time column is used as the range partition key.
> If I have understood the architecture of Kudu correctly, the rows should then be returned in ascending order of the time column.
> This works as long as no more than about 600000 rows are inserted.
> If the number of inserted rows is above 1 million, the order is messed up globally. At the micro level it is still correct 99.9% of the time if you look at successive rows.
> My setup is a single master and a single tablet server on a Linux server. The table is created, filled and read with the Kudu Java client version 0.8.0.
> See the attached Java code to reproduce the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)