Posted to issues-all@impala.apache.org by "WangSheng (Jira)" <ji...@apache.org> on 2020/04/08 03:18:00 UTC

[jira] [Comment Edited] (IMPALA-9621) Support iceberg on hdfs

    [ https://issues.apache.org/jira/browse/IMPALA-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077775#comment-17077775 ] 

WangSheng edited comment on IMPALA-9621 at 4/8/20, 3:17 AM:
------------------------------------------------------------

Here are some of my thoughts:
* Following the implementation of the Kudu-related operations, we need to implement IcebergTable.java, IcebergColumn.java, IcebergScanNode.java, iceberg-scan-node.cc, and so on;
* Use _ICEBERG_ as a new THdfsFileFormat and ICEBERG_TABLE as a new TTableType;
* Iceberg supports two kinds of API for creating tables: HiveCatalog and HadoopTables. If we use HiveCatalog, we can call the Iceberg API directly to create the table, and Impala doesn't need to create the HMS table separately. If we use HadoopTables, Impala should create the HMS table first and then call HadoopTables to create the Iceberg table, just as when creating a Kudu table;
* Iceberg currently supports only the Parquet file format but may support ORC in the future, so I'm not sure whether it is necessary to implement an IcebergFileFormat like HdfsFileFormat, or to just use 'STORED AS ICEBERG' at the beginning. The second approach is significantly simpler to implement.
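
To make the two table-creation paths above concrete, here is a minimal sketch of the ordering only. It does not call Iceberg or Impala; CatalogKind and plan_create_table are hypothetical names made up for illustration, and the step strings are just descriptions.

```python
from enum import Enum
from typing import List


class CatalogKind(Enum):
    """Hypothetical tag for the two Iceberg table-creation APIs named above."""
    HIVE_CATALOG = "HiveCatalog"
    HADOOP_TABLES = "HadoopTables"


def plan_create_table(kind: CatalogKind, table: str, location: str) -> List[str]:
    """Return the ordered steps for CREATE TABLE, as described in the comment.

    This is an illustration of the proposed flow, not Impala code.
    """
    if kind is CatalogKind.HIVE_CATALOG:
        # HiveCatalog path: a single Iceberg API call creates the table,
        # so Impala does not create the HMS table separately.
        return ["iceberg: HiveCatalog creates table %s" % table]
    if kind is CatalogKind.HADOOP_TABLES:
        # HadoopTables path: Impala creates the HMS table first, then
        # Iceberg creates the table (similar to the Kudu flow).
        return [
            "impala: create HMS table %s" % table,
            "iceberg: HadoopTables creates table at %s" % location,
        ]
    raise ValueError("unknown catalog kind: %r" % (kind,))


if __name__ == "__main__":
    for kind in CatalogKind:
        print(kind.value)
        for step in plan_create_table(kind, "db.t", "hdfs:///warehouse/db/t"):
            print("  " + step)
```

The point of the sketch is only that the HadoopTables path needs two coordinated steps on the Impala side, while the HiveCatalog path needs one.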

I would also split this task into several sub-tasks:
* Implement metadata-related changes, such as create/drop/alter and so on;
* Support querying and inserting into Iceberg tables;
* Do some query optimization to improve query performance;
* Other related work.

These are just some initial ideas; more details still need to be worked out. I hope you can give me some advice [~stigahuang] [~tarmstrong]. Others are very welcome to offer suggestions as well. Thanks a lot!



> Support iceberg on hdfs
> -----------------------
>
>                 Key: IMPALA-9621
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9621
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: WangSheng
>            Assignee: WangSheng
>            Priority: Major
>
> We have recently been investigating Iceberg and are preparing to implement querying Iceberg data from Impala. Our production environment uses HDFS, so we will try to support Iceberg on HDFS.


