Posted to issues-all@impala.apache.org by "Daniel Becker (Jira)" <ji...@apache.org> on 2022/07/22 13:16:00 UTC
[jira] [Assigned] (IMPALA-11345) Query failed when creating equal conjunction map for Parquet bloom filter

     [ https://issues.apache.org/jira/browse/IMPALA-11345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Becker reassigned IMPALA-11345:
--------------------------------------

    Assignee: Daniel Becker

> Query failed when creating equal conjunction map for Parquet bloom filter
> -------------------------------------------------------------------------
>
>                 Key: IMPALA-11345
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11345
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend, Distributed Exec
>    Affects Versions: Impala 4.1.0
>         Environment: CentOS-7, Impala-4.1
>            Reporter: Yuchen Fan
>            Assignee: Daniel Becker
>            Priority: Critical
>
> When querying a Hive table to which columns were added without using 'cascade', Impala fails with an error like "Unable to find SchemaNode for path 'db.table.column' in the schema of file 'hdfs://xxx/path/to/parquet_file_before_add_column'." I checked the Parquet file named in the error log and found that its schema is not compatible with the table metadata. The call stack is attached below; paths and table names are masked:
> {code:java}
> I0609 18:04:25.970052 115413 status.cc:129] c94d0ab3fdf8f943:3203006100000002] Unable to find SchemaNode for path 'xxx_db.xxx_table.xxx_column' in the schema of file 'hdfs://xxx_nn/xxx_table_path/000000_0'.
>     @           0xea543b  impala::Status::Status()
>     @          0x1e3225c  impala::HdfsParquetScanner::CreateColIdx2EqConjunctMap()
>     @          0x1e363ea  impala::HdfsParquetScanner::Open()
>     @          0x19b40d0  impala::HdfsScanNodeBase::CreateAndOpenScannerHelper()
>     @          0x1b5cbae  impala::HdfsScanNode::ProcessSplit()
>     @          0x1b5e12a  impala::HdfsScanNode::ScannerThread()
>     @          0x1b5e9c6  _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
>     @          0x18eafa9  impala::Thread::SuperviseThread()
>     @          0x18ee11a  boost::detail::thread_data<>::run()
>     @          0x2385510  thread_proxy
>     @     0x7fb5b0745162  start_thread
>     @     0x7fb5ad21df6c  __clone
> {code}
> The error may be related to [IMPALA-10640|https://issues.apache.org/jira/browse/IMPALA-10640]. The bloom filter requires the right-hand values of equality conjuncts to match the schema of the file currently being scanned, so the filter cannot be used when the column does not exist in all of the Parquet files scanned. I think we can disable the Parquet bloom filter for the affected query or scan node when this situation is detected.
> How to reproduce (using impala-shell):
>  # create table parquet_test (id INT) stored as parquet;
>  # insert into parquet_test values (1),(2),(3);
>  # alter table parquet_test add columns (name STRING);
>  # insert into parquet_test values (4, "James");
>  # select * from parquet_test where name in ("Lily");
>  # The error occurs.
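
The suggestion in the description (skip the bloom filter when a predicate column is absent from a file, instead of failing the query) can be modeled in a few lines. The following is a minimal Python sketch, not Impala's C++ code: `EqConjunct` and `build_eq_conjunct_map` are hypothetical names introduced for illustration, a file schema is reduced to a list of column names, and the fallback is applied per file, which is a narrower variant of the per-query or per-scan-node fallback proposed above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class EqConjunct:
    """Hypothetical stand-in for an equality predicate such as name = 'Lily'."""
    column: str
    value: object


def build_eq_conjunct_map(
    conjuncts: List[EqConjunct], file_columns: List[str]
) -> Optional[Dict[int, List[EqConjunct]]]:
    """Map each column index in the file to the equality conjuncts on it.

    Returns None when a referenced column is missing from the file schema.
    The caller is then expected to skip bloom filtering for this file
    (assumption: skipping is safe because the filter is only an optimization).
    """
    index_of = {name: idx for idx, name in enumerate(file_columns)}
    result: Dict[int, List[EqConjunct]] = {}
    for conjunct in conjuncts:
        idx = index_of.get(conjunct.column)
        if idx is None:
            # Column was added to the table after this file was written.
            return None
        result.setdefault(idx, []).append(conjunct)
    return result


# Mirrors the reproduction steps: one file predates the added column.
old_file = ["id"]
new_file = ["id", "name"]
conjuncts = [EqConjunct("name", "Lily")]

print(build_eq_conjunct_map(conjuncts, old_file))
print(build_eq_conjunct_map(conjuncts, new_file))
```

In this model the file written before `alter table ... add columns` yields no map, so the scan would proceed without bloom filtering, while the newer file still gets a usable map.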



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org