
Posted to issues@kudu.apache.org by "Andy Grove (JIRA)" <ji...@apache.org> on 2016/06/24 14:41:16 UTC

[jira] [Comment Edited] (KUDU-1493) Spark read fails if key columns are not leading columns

    [ https://issues.apache.org/jira/browse/KUDU-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348352#comment-15348352 ] 

Andy Grove edited comment on KUDU-1493 at 6/24/16 2:41 PM:
-----------------------------------------------------------

A single application could write many DataFrames with different column orderings to the same table. A read should always return the columns in the order specified in the projection. If no projection is provided, I would expect the columns to be returned in the order in which they are defined in the Kudu schema. As far as I know, this is the current behavior, and in my opinion it is correct.

If you rely on column ordering, you should apply a projection to the RDD that you read from Kudu, e.g. "SELECT c, b, a FROM kudu_table" if using Spark SQL, rather than "SELECT * FROM kudu_table".

Databases usually make no guarantees about row or column ordering unless you are explicit in your query.
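
To illustrate the difference between the two queries, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for Kudu and Spark SQL, so it only demonstrates the general SQL behavior, not the Kudu connector itself. The column types and the inserted row are made up for the example; only the table name and the column names a, b, c come from the queries above.

```python
import sqlite3

# Assumption: SQLite stands in for Kudu + Spark SQL, purely for illustration.
conn = sqlite3.connect(":memory:")

# The schema defines the columns in the order a, b, c (types are hypothetical).
conn.execute("CREATE TABLE kudu_table (a INTEGER, b TEXT, c REAL)")

# The writer lists columns in a different order; the stored schema is unaffected.
conn.execute("INSERT INTO kudu_table (c, a, b) VALUES (3.0, 1, 'two')")


def column_names(query):
    """Return the column names of a query result, in result order."""
    cursor = conn.execute(query)
    return [description[0] for description in cursor.description]


# Without a projection, columns come back in schema order.
star_columns = column_names("SELECT * FROM kudu_table")

# With an explicit projection, columns come back in the requested order.
projected_columns = column_names("SELECT c, b, a FROM kudu_table")
projected_row = conn.execute("SELECT c, b, a FROM kudu_table").fetchone()

print(star_columns)       # ['a', 'b', 'c']
print(projected_columns)  # ['c', 'b', 'a']
print(projected_row)      # (3.0, 'two', 1)
```

The point of the sketch is that code depending on a particular column order should name the columns itself rather than rely on whatever order "SELECT *" happens to produce.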




was (Author: andygrove):
One application could write many DataFrames with different column ordering to the same table. The read operation should always return the columns in the order that you specify in your projection. If you don't provide a projection then I would expect the columns to be returned in the order they are defined in the kudu schema. As far as I know, this is the current behavior and is correct, in my opinion.

If you rely on ordering you should apply a projection onto the RDD that you read from Kudu e.g. "SELECT c, b, a FROM kudu_table" if using SparkSQL, rather than "SELECT * FROM kudu_table".

SQL databases usually make no guarantees about row or column ordering unless you are explicit in your query.



> Spark read fails if key columns are not leading columns
> -------------------------------------------------------
>
>                 Key: KUDU-1493
>                 URL: https://issues.apache.org/jira/browse/KUDU-1493
>             Project: Kudu
>          Issue Type: Bug
>          Components: spark
>    Affects Versions: 0.9.0
>            Reporter: Tom White
>            Assignee: Andy Grove
>
> If the Spark dataframe schema is (A, B, C) then reading will fail if the Kudu keys are (A, C). Keys (A, B) work fine.


