Posted to issues-all@impala.apache.org by "guojingfeng (Jira)" <ji...@apache.org> on 2020/11/18 06:25:00 UTC
[jira] [Closed] (IMPALA-10310) Couldn't skip rows in parquet file
[ https://issues.apache.org/jira/browse/IMPALA-10310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
guojingfeng closed IMPALA-10310.
--------------------------------
Resolution: Fixed
> Couldn't skip rows in parquet file
> ----------------------------------
>
> Key: IMPALA-10310
> URL: https://issues.apache.org/jira/browse/IMPALA-10310
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 3.4.0
> Reporter: guojingfeng
> Assignee: guojingfeng
> Priority: Critical
>
> When an hdfs-parquet-scanner thread is assigned a ScanRange that contains multiple RowGroups,
> the row-skipping logic that uses the PageIndex fails.
> The error log is below:
> {code:java}
> I1028 17:59:16.694046 1414911 status.cc:68] 1447f227b73a4d78:92d9a82600000fd1] Could not read definition level, even though metadata states there are 0 values remaining in data page. file=hdfs://path/to/file
> @ 0xbf4286
> @ 0x17bc0eb
> @ 0x17737f7
> @ 0x1773a0e
> @ 0x1773d8a
> @ 0x1774028
> @ 0x17b9517
> @ 0x174a22b
> @ 0x17526fe
> @ 0x140a78a
> @ 0x1525908
> @ 0x1526a03
> @ 0x10e6169
> @ 0x10e84c9
> @ 0x10c7a86
> @ 0x13750ba
> @ 0x1375f89
> @ 0x1b49679
> @ 0x7ffb2eee1e24
> @ 0x7ffb2bad935c
> I1028 17:59:16.694074 1414911 status.cc:126] 1447f227b73a4d78:92d9a82600000fd1] Couldn't skip rows in file hdfs://path/to/file
> @ 0xbf5259
> @ 0x1773a8a
> @ 0x1773d8a
> @ 0x1774028
> @ 0x17b9517
> @ 0x174a22b
> @ 0x17526fe
> @ 0x140a78a
> @ 0x1525908
> @ 0x1526a03
> @ 0x10e6169
> @ 0x10e84c9
> @ 0x10c7a86
> @ 0x13750ba
> @ 0x1375f89
> @ 0x1b49679
> @ 0x7ffb2eee1e24
> @ 0x7ffb2bad935c
> I1028 17:59:16.694101 1414911 runtime-state.cc:207] 1447f227b73a4d78:92d9a82600000fd1] Error from query 1447f227b73a4d78:92d9a82600000000: Couldn't skip rows in file hdfs://path/to/file.
> {code}
> On a debug build, the error log is:
> {code:java}
> F1030 14:06:38.700459 3148733 parquet-column-readers.cc:1258] 994968c01171b0bc:eab92b3f0000000a] Check failed: num_buffered_values_ >= num_rows (20000 vs. 40000)
> *** Check failure stack trace: ***
> @ 0x4e9322c google::LogMessage::Fail()
> @ 0x4e94ad1 google::LogMessage::SendToLog()
> @ 0x4e92c06 google::LogMessage::Flush()
> @ 0x4e961cd google::LogMessageFatal::~LogMessageFatal()
> @ 0x2bfa2c3 impala::BaseScalarColumnReader::SkipTopLevelRows()
> @ 0x2bf9fcc impala::BaseScalarColumnReader::StartPageFiltering()
> @ 0x2bf99b4 impala::BaseScalarColumnReader::ReadDataPage()
> @ 0x2bfbad8 impala::BaseScalarColumnReader::NextPage()
> @ 0x2c5bc8c impala::ScalarColumnReader<>::ReadValueBatch<>()
> @ 0x2c1a67a impala::ScalarColumnReader<>::ReadNonRepeatedValueBatch()
> @ 0x2bae010 impala::HdfsParquetScanner::AssembleRows()
> @ 0x2ba8934 impala::HdfsParquetScanner::GetNextInternal()
> @ 0x2ba68ac impala::HdfsParquetScanner::ProcessSplit()
> @ 0x27d8d0b impala::HdfsScanNode::ProcessSplit()
> @ 0x27d7ee0 impala::HdfsScanNode::ScannerThread()
> @ 0x27d723d _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
> @ 0x27d9831 _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
> @ 0x1fc4d9b boost::function0<>::operator()()
> @ 0x258590e impala::Thread::SuperviseThread()
> @ 0x258db92 boost::_bi::list5<>::operator()<>()
> @ 0x258dab6 boost::_bi::bind_t<>::operator()()
> @ 0x258da79 boost::detail::thread_data<>::run()
> @ 0x3db95c9 thread_proxy
> @ 0x7febc66e6e24 start_thread
> @ 0x7febc313135c __clone
> Picked up JAVA_TOOL_OPTIONS: -Xms34359738368 -Xmx34359738368 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/28ecfee554b03954bac9e77a73f4ce0c_pid2802027.hprof
> Wrote minidump to /path/to/minidumps/74dae046-c19d-4ad5-ea2603ae-ff139f7e.dmp
> {code}
>
> All Parquet files were generated by Spark with the default configuration, which uses a 128 MB row group size.
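
The failed check in the debug log (num_buffered_values_ >= num_rows, 20000 vs. 40000) says that a skip of 40000 rows was attempted against a page that buffered only 20000 values. The following is a minimal Python sketch of that invariant, not Impala's code or the actual fix. The names Page, skip_rows_in_page, and skip_rows_across_pages are hypothetical, and the sketch assumes a flat (non-nested) column, so one value corresponds to one row.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Page:
    """Hypothetical stand-in for one data page of a column chunk."""
    num_values: int


def skip_rows_in_page(num_buffered_values: int, num_rows: int) -> int:
    """Skip num_rows inside the current page and return the values left.

    Mirrors the invariant from the debug log: the page must still buffer
    at least as many values as the caller asks to skip.
    """
    if num_buffered_values < num_rows:
        raise ValueError(
            "num_buffered_values >= num_rows violated "
            f"({num_buffered_values} vs. {num_rows})"
        )
    return num_buffered_values - num_rows


def skip_rows_across_pages(pages: List[Page], num_rows: int) -> Tuple[int, int]:
    """Skip num_rows starting at the first page, crossing page boundaries.

    Returns (page_index, values_left_in_that_page) for the position
    reached after the skip. Assumes one value per row (flat column).
    """
    remaining = num_rows
    for index, page in enumerate(pages):
        if remaining <= page.num_values:
            # The rest of the skip fits inside this page.
            return index, skip_rows_in_page(page.num_values, remaining)
        # Consume the whole page and carry the rest to the next one.
        remaining -= page.num_values
    available = num_rows - remaining
    raise ValueError(f"cannot skip {num_rows} rows: only {available} available")
```

With three hypothetical pages of 20000 values each, skip_rows_in_page(20000, 40000) raises with the same numbers as the log, while skip_rows_across_pages([Page(20000)] * 3, 40000) lands at the end of the second page. The point is only that a per-page skip count must be bounded by what the page still holds; how the scanner computes that count per RowGroup is outside this sketch.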
--
This message was sent by Atlassian Jira
(v8.3.4#803005)