Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/10/14 22:49:00 UTC
[jira] [Commented] (IMPALA-9967) Scan orc failed when table contains timestamp column
[ https://issues.apache.org/jira/browse/IMPALA-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17214324#comment-17214324 ]
ASF subversion and git services commented on IMPALA-9967:
---------------------------------------------------------
Commit 0c0985a825fba8d9702639e3e679d2e1b9070fe1 in impala's branch refs/heads/master from skyyws
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=0c0985a ]
IMPALA-10159: Supporting ORC file format for Iceberg table
This patch adds support for querying Iceberg tables stored in the
ORC file format. Such a table can be created with the following
SQL:
CREATE TABLE default.iceberg_test (
  level string,
  event_time timestamp,
  message string
)
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg.file_format'='orc', 'iceberg.catalog'='hadoop.tables');
Note that scanning ORC files that contain Timestamp columns still
has problems; see IMPALA-9967 for details. Tests covering the
Timestamp type may be added once that JIRA is fixed.
Testing:
- Create table tests in functional_schema_template.sql
- Iceberg table create test in test_iceberg.py
- Iceberg table query test in test_scanners.py
Change-Id: Ib579461aa57348c9893a6d26a003a0d812346c4d
Reviewed-on: http://gerrit.cloudera.org:8080/16568
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
> Scan orc failed when table contains timestamp column
> ----------------------------------------------------
>
> Key: IMPALA-9967
> URL: https://issues.apache.org/jira/browse/IMPALA-9967
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.0
> Reporter: WangSheng
> Priority: Minor
> Labels: impala-iceberg
> Attachments: 00031-31-26ff2064-c8f2-467f-ab7e-1949cb30d151-00000.orc, 00031-31-334beaba-ef4b-4d13-b338-e715cdf0ef85-00000.orc
>
>
> Recently, while testing Impala queries against an ORC table, I found that scanning fails when the table contains a timestamp column. Here is the exception:
> {code:java}
> I0717 08:31:47.179124 78759 status.cc:129] 68436a6e0883be84:53877f7200000002] Encountered parse error in tail of ORC file hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-00000.orc: Unknown type kind
> @ 0x1c9f753 impala::Status::Status()
> @ 0x27aa049 impala::HdfsOrcScanner::ProcessFileTail()
> @ 0x27a7fb3 impala::HdfsOrcScanner::Open()
> @ 0x27365fe impala::HdfsScanNodeBase::CreateAndOpenScannerHelper()
> @ 0x28cb379 impala::HdfsScanNode::ProcessSplit()
> @ 0x28caa7d impala::HdfsScanNode::ScannerThread()
> @ 0x28c9de5 _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
> @ 0x28cc19e _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
> @ 0x2053333 boost::function0<>::operator()()
> @ 0x2675d93 impala::Thread::SuperviseThread()
> @ 0x267dd30 boost::_bi::list5<>::operator()<>()
> @ 0x267dc54 boost::_bi::bind_t<>::operator()()
> @ 0x267dc15 boost::detail::thread_data<>::run()
> @ 0x3e3c3c1 thread_proxy
> @ 0x7f32360336b9 start_thread
> @ 0x7f3232bfe41c clone
> I0717 08:31:47.325670 78759 hdfs-scan-node.cc:490] 68436a6e0883be84:53877f7200000002] Error preparing scanner for scan range hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-00000.orc(0:582). Encountered parse error in tail of ORC file hdfs://localhost:20500/test-warehouse/orc_scanner_test/00031-31-ac3cccf1-3ce7-40c6-933c-4fbd7bd57550-00000.orc: Unknown type kind
> {code}
> When I remove the timestamp column from the table and regenerate the test data, the query succeeds. By the way, my test data is generated by Spark.
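Editorial sketch (not part of the original report): when triaging this kind of failure, it can help to list which ORC files hit the tail parse error. The snippet below is a minimal sketch that assumes the log wording matches the excerpt quoted above; the pattern and the function name find_orc_tail_errors are hypothetical and are not part of Impala.

```python
import re

# Pattern for the scanner error text quoted in the report above. The wording is
# copied from that log excerpt; other Impala versions may phrase it differently.
ORC_TAIL_ERROR = re.compile(
    r"Encountered parse error in tail of ORC file (?P<path>\S+?): (?P<reason>.+)$"
)


def find_orc_tail_errors(log_lines):
    """Return sorted, de-duplicated (path, reason) pairs found in the log lines.

    Hypothetical helper for illustration only.
    """
    hits = set()
    for line in log_lines:
        match = ORC_TAIL_ERROR.search(line)
        if match:
            hits.add((match.group("path"), match.group("reason").strip()))
    return sorted(hits)
```

Passing the lines of an impalad log to find_orc_tail_errors yields one entry per affected file, even when the same error is logged once by status.cc and again by hdfs-scan-node.cc.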