Posted to issues-all@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2020/10/12 16:50:00 UTC

[jira] [Commented] (IMPALA-10233) Hit DCHECK in DmlExecState::AddPartition when inserting to a partitioned table with zorder

    [ https://issues.apache.org/jira/browse/IMPALA-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212500#comment-17212500 ] 

Tim Armstrong commented on IMPALA-10233:
----------------------------------------

[~luksan] [~boroknagyz] this seems bad.

> Hit DCHECK in DmlExecState::AddPartition when inserting to a partitioned table with zorder
> ------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-10233
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10233
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Blocker
>              Labels: crash
>
> Hit the DCHECK when inserting into a partitioned Parquet table with zorder. I'm on the master branch (commit=b8a2b75).
> {code:java}
> F1012 15:04:27.726274  3868 dml-exec-state.cc:432] a6479cc4725101fd:b86db2a100000003] Check failed: per_partition_status_.find(name) == per_partition_status_.end() 
> *** Check failure stack trace: *** 
>     @          0x51ff3cc  google::LogMessage::Fail()
>     @          0x5200cbc  google::LogMessage::SendToLog()
>     @          0x51fed2a  google::LogMessage::Flush()
>     @          0x5202928  google::LogMessageFatal::~LogMessageFatal()
>     @          0x234ba18  impala::DmlExecState::AddPartition()
>     @          0x2817786  impala::HdfsTableSink::GetOutputPartition()
>     @          0x2813151  impala::HdfsTableSink::WriteClusteredRowBatch()
>     @          0x28156c4  impala::HdfsTableSink::Send()
>     @          0x23139dd  impala::FragmentInstanceState::ExecInternal()
>     @          0x230fe10  impala::FragmentInstanceState::Exec()
>     @          0x227bb79  impala::QueryState::ExecFInstance()
>     @          0x2279f7b  _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv
>     @          0x227e2c2  _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE
>     @          0x2137699  boost::function0<>::operator()()
>     @          0x2715d7d  impala::Thread::SuperviseThread()
>     @          0x271dd1a  boost::_bi::list5<>::operator()<>()
>     @          0x271dc3e  boost::_bi::bind_t<>::operator()()
>     @          0x271dbff  boost::detail::thread_data<>::run()
>     @          0x3f05f01  thread_proxy
>     @     0x7fb18bebb6b9  start_thread
>     @     0x7fb188a474dc  clone {code}
> It seems the zorder sort node doesn't keep the rows sorted by the partition keys. This violates the assumption of HdfsTableSink::WriteClusteredRowBatch() that the input must be ordered by the partition key expressions. As a result, a partition key was deleted from and then inserted again into the {{partition_keys_to_output_partitions_}} map.
> {code:c++}
>   /// Maps all rows in 'batch' to partitions and appends them to their temporary Hdfs
>   /// files. The input must be ordered by the partition key expressions.
>   Status WriteClusteredRowBatch(RuntimeState* state, RowBatch* batch) WARN_UNUSED_RESULT;
> {code}
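> To make that contract concrete: "ordered by the partition key expressions" means each partition key forms a single contiguous run of rows. A minimal standalone sketch of that check (hypothetical helper, not Impala code), assuming string partition keys:
> {code:c++}
> // Hypothetical helper, not Impala code: verifies that every partition key
> // forms exactly one contiguous run, which is what WriteClusteredRowBatch()
> // assumes about its input.
> #include <iostream>
> #include <set>
> #include <string>
> #include <vector>
>
> bool IsClusteredByPartitionKey(const std::vector<std::string>& keys) {
>   std::set<std::string> seen;
>   std::string current;
>   for (const std::string& key : keys) {
>     if (key == current) continue;       // still inside the current run
>     if (seen.count(key)) return false;  // key reappeared after its run ended
>     seen.insert(key);
>     current = key;
>   }
>   return true;
> }
>
> int main() {
>   // A sort on the partition key expressions produces clustered input...
>   std::cout << IsClusteredByPartitionKey({"p1", "p1", "p2", "p2"}) << "\n";  // 1
>   // ...but a Z-order sort can interleave keys, so the same key reappears.
>   std::cout << IsClusteredByPartitionKey({"p1", "p2", "p1", "p2"}) << "\n";  // 0
>   return 0;
> }
> {code}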
> The key was removed when processing a new partition key, here: https://github.com/apache/impala/blob/b8a2b754669eb7f8d164e8112e594ac413e436ef/be/src/exec/hdfs-table-sink.cc#L334
> It was then reinserted here, which triggered the DCHECK: https://github.com/apache/impala/blob/b8a2b754669eb7f8d164e8112e594ac413e436ef/be/src/exec/hdfs-table-sink.cc#L590
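> That sequence can be modeled with a minimal standalone sketch (hypothetical names, not the actual HdfsTableSink/DmlExecState code): the clustered path keeps only the current partition in its key-to-partition map, erases it when a new key arrives, and re-creates it when the key reappears, so the DML exec state sees the same partition name twice:
> {code:c++}
> // Hypothetical, simplified model of the erase-then-reinsert sequence
> // described above; the assert stands in for the DCHECK in
> // DmlExecState::AddPartition().
> #include <cassert>
> #include <map>
> #include <set>
> #include <string>
> #include <vector>
>
> struct OutputPartition {};  // placeholder for the real per-partition state
>
> // Stand-in for per_partition_status_ in the DML exec state.
> std::set<std::string> per_partition_status;
>
> void AddPartition(const std::string& name) {
>   // Mirrors: Check failed: per_partition_status_.find(name) == per_partition_status_.end()
>   assert(per_partition_status.count(name) == 0 && "partition reported twice");
>   per_partition_status.insert(name);
> }
>
> int main() {
>   // Stand-in for partition_keys_to_output_partitions_.
>   std::map<std::string, OutputPartition> open_partitions;
>
>   // Partition keys in arrival order; a Z-order sort leaves them unclustered.
>   std::vector<std::string> keys = {"p1", "p2", "p1"};
>
>   for (const std::string& key : keys) {
>     if (open_partitions.count(key) == 0) {
>       // New key: the clustered path finalizes and erases the previous
>       // partition, assuming its key will never appear again.
>       open_partitions.clear();
>       open_partitions.emplace(key, OutputPartition{});
>       AddPartition(key);  // the second "p1" trips the assert here
>     }
>   }
>   return 0;
> }
> {code}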



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org