Posted to issues-all@impala.apache.org by "Csaba Ringhofer (Jira)" <ji...@apache.org> on 2022/06/15 19:16:00 UTC

[jira] [Assigned] (IMPALA-886) Always display HBase cols in same order as CREATE TABLE statement

     [ https://issues.apache.org/jira/browse/IMPALA-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Csaba Ringhofer reassigned IMPALA-886:
--------------------------------------

    Assignee: Csaba Ringhofer

> Always display HBase cols in same order as CREATE TABLE statement
> -----------------------------------------------------------------
>
>                 Key: IMPALA-886
>                 URL: https://issues.apache.org/jira/browse/IMPALA-886
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>    Affects Versions: Impala 1.3
>            Reporter: John Russell
>            Assignee: Csaba Ringhofer
>            Priority: Minor
>              Labels: catalog-server, hbase, usability
>
> I noticed a discrepancy with Hive in how Impala handles column order for HBase tables.
> I think it would be preferable to use the same behavior as Hive, otherwise life becomes
> more complicated for anyone doing INSERT or SELECT * with an HBase table through Impala.
> (And I have to add caveats and usage notes in the docs.)
> Repro:
> In the HBase shell, create a table with a single column family. I think most Impala tests use one column family per column, in which case you won't notice this behavior.
> hbase(main):008:0> create 'sample_data_fast','cols'
> 0 row(s) in 71.8750 seconds
> In the Hive shell, create a mapping table. Notice how DESCRIBE repeats back the columns in the same order as in CREATE TABLE.
> hive> create external table sample_data_fast (id string, val int, zfill string, name string, assertion boolean)
>     > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>     > WITH SERDEPROPERTIES (
>     > "hbase.columns.mapping" =
>     > ":key,cols:val,cols:zfill,cols:name,cols:assertion")
>     > TBLPROPERTIES("hbase.table.name" = "sample_data_fast")
>     > ;
> OK
> Time taken: 1.7 seconds
> hive> desc sample_data_fast;
> OK
> id          string   from deserializer
> val         int      from deserializer
> zfill       string   from deserializer
> name        string   from deserializer
> assertion   boolean  from deserializer
> Time taken: 0.302 seconds
> Now try the same DESCRIBE in impala-shell. The key column (id) is listed first; the remaining columns, which all belong to the same column family, follow in alphabetical order rather than in CREATE TABLE order:
> [localhost:21000] > desc sample_data_fast;
> Query: describe sample_data_fast
> +-----------+---------+---------+
> | name      | type    | comment |
> +-----------+---------+---------+
> | id        | string  |         |
> | assertion | boolean |         |
> | name      | string  |         |
> | val       | int     |         |
> | zfill     | string  |         |
> +-----------+---------+---------+
> Returned 5 row(s) in 0.02s
> Thus, if you already had Hive code doing SELECT * from an HBase table like this, you would get a different result set (different column order) in Impala.
> If you tried to copy from an HDFS table via 'INSERT INTO hbase_table SELECT * FROM hdfs_table', you would get an error because the columns don't match. If you made a separate column family for each column, the discrepancy is masked, because the alphabetical ordering only shows up when a column family contains more than one column.
> Since Hive preserves the column order, the relevant information must already be in the metastore.
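The following Python sketch models the two column orders from the repro and the positional pairing that INSERT ... SELECT * relies on. It is a hypothetical illustration, not Impala or Hive source code, and rests on stated assumptions: that Impala's order is the :key column followed by the other columns sorted by (family, qualifier), which matches the DESCRIBE output above but is inferred rather than taken from Impala's code; that Hive pairs the i-th column with the i-th hbase.columns.mapping entry; and that hdfs_table has the same schema as the Hive CREATE TABLE. The function names are invented for illustration.

```python
# Hypothetical model of the behavior reported in IMPALA-886; not Impala's or
# Hive's actual code. Column names, types, and mapping come from the repro.

COLUMNS = [("id", "string"), ("val", "int"), ("zfill", "string"),
           ("name", "string"), ("assertion", "boolean")]
MAPPING = ":key,cols:val,cols:zfill,cols:name,cols:assertion"


def hive_order(columns, mapping):
    """Assumed Hive behavior: the i-th column maps to the i-th mapping entry,
    and DESCRIBE keeps the CREATE TABLE order."""
    entries = [e.strip() for e in mapping.split(",")]
    if len(entries) != len(columns):
        raise ValueError("hbase.columns.mapping needs one entry per column")
    return list(columns)


def impala_observed_order(columns, mapping):
    """Assumed model of the order in Impala's DESCRIBE above: the :key column
    first, then the remaining columns sorted by (family, qualifier)."""
    entries = [e.strip() for e in mapping.split(",")]
    pairs = list(zip(columns, entries))
    key_cols = [col for col, entry in pairs if entry == ":key"]
    others = sorted(
        (tuple(entry.split(":", 1)), col) for col, entry in pairs if entry != ":key"
    )
    return key_cols + [col for _, col in others]


def positional_type_mismatches(source, target):
    """INSERT ... SELECT * pairs columns by position. Report (position, source
    column, target column) where types differ. Simplified: any type difference
    counts, ignoring implicit casts the real engine might allow."""
    return [
        (i, s_name, t_name)
        for i, ((s_name, s_type), (t_name, t_type)) in enumerate(zip(source, target))
        if s_type != t_type
    ]


hive = hive_order(COLUMNS, MAPPING)
impala = impala_observed_order(COLUMNS, MAPPING)
print([name for name, _ in hive])    # ['id', 'val', 'zfill', 'name', 'assertion']
print([name for name, _ in impala])  # ['id', 'assertion', 'name', 'val', 'zfill']
# Source schema (hypothetical hdfs_table) vs. the target order Impala reports.
print(positional_type_mismatches(COLUMNS, impala))
# [(1, 'val', 'assertion'), (3, 'name', 'val'), (4, 'assertion', 'zfill')]
```

Under this model, position 2 pairs two string columns (zfill with name), so a type check alone would not flag it even though the values would land in the wrong column. Keeping the metastore's declaration order, as hive_order does, makes the list of mismatches empty.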



--
This message was sent by Atlassian Jira
(v8.20.7#820007)