Posted to issues-all@impala.apache.org by "Sahil Takiar (JIRA)" <ji...@apache.org> on 2019/01/08 15:34:00 UTC

[jira] [Commented] (IMPALA-7816) Race condition in HdfsScanNodeBase::StopAndFinalizeCounters

    [ https://issues.apache.org/jira/browse/IMPALA-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16737237#comment-16737237 ] 

Sahil Takiar commented on IMPALA-7816:
--------------------------------------

Would it make more sense to just wait for all the scanners to call {{HdfsParquetScanner::Close}} before calling {{HdfsScanNodeBase::StopAndFinalizeCounters}}? It seems odd that a scan node can be closed before its corresponding scanners are closed. Waiting would make the code easier to understand. I would guess most devs assume this ordering already holds, which is probably how the bug was introduced in the first place. There might be other race conditions in the code due to this behavior as well, although I haven't been able to produce any others.
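
To make the suggestion concrete, here is a rough sketch of the "wait for every scanner to close before finalizing" ordering. It is a toy model, not Impala code: {{ScanNodeModel}}, {{ScannerClosed}}, {{RunScan}} and the {{open_scanners_}} counter are hypothetical names invented for illustration, and only {{StopAndFinalizeCounters}} and {{file_type_counts_}} echo names from the issue.

```cpp
#include <cassert>
#include <condition_variable>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a scan node; this is NOT Impala's actual class.
class ScanNodeModel {
 public:
  explicit ScanNodeModel(int num_scanners) : open_scanners_(num_scanners) {}

  // Called by each scanner as it closes; records the file format it scanned
  // and wakes up any waiter once the last scanner is done.
  void ScannerClosed(const std::string& file_format) {
    std::lock_guard<std::mutex> l(lock_);
    assert(open_scanners_ > 0);
    ++file_type_counts_[file_format];
    if (--open_scanners_ == 0) all_closed_.notify_all();
  }

  // Blocks until every scanner has closed, then returns a snapshot of the
  // counts, so the snapshot can never miss a late scanner update.
  std::map<std::string, int> StopAndFinalizeCounters() {
    std::unique_lock<std::mutex> l(lock_);
    all_closed_.wait(l, [this] { return open_scanners_ == 0; });
    return file_type_counts_;
  }

 private:
  std::mutex lock_;
  std::condition_variable all_closed_;
  int open_scanners_;
  std::map<std::string, int> file_type_counts_;
};

// Starts num_scanners scanner threads and finalizes while they may still be
// running; the wait inside StopAndFinalizeCounters enforces the ordering.
std::map<std::string, int> RunScan(int num_scanners) {
  ScanNodeModel node(num_scanners);
  std::vector<std::thread> scanners;
  for (int i = 0; i < num_scanners; ++i) {
    scanners.emplace_back([&node] { node.ScannerClosed("PARQUET/NONE"); });
  }
  std::map<std::string, int> snapshot = node.StopAndFinalizeCounters();
  for (std::thread& t : scanners) t.join();
  return snapshot;
}
```

In this model, {{RunScan(2)}} always reports two {{PARQUET/NONE}} ranges regardless of thread scheduling. Whether a blocking wait is acceptable on the real {{GetNext}} path, where the limit has already been reached, is a separate question that this sketch does not answer.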

> Race condition in HdfsScanNodeBase::StopAndFinalizeCounters
> -----------------------------------------------------------
>
>                 Key: IMPALA-7816
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7816
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.1.0
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>              Labels: parquet
>
> While working on IMPALA-6964, I noticed that sometimes the runtime profile for a {{HDFS_SCAN_NODE}} will include {{File Formats: PARQUET/NONE:2}} and sometimes it won't (depending on the query). However, looking at the code, any scan of Parquet files should include this line.
> I debugged the code and there seems to be a race condition where {{HdfsScanNodeBase::StopAndFinalizeCounters}} can be called before {{HdfsParquetScanner::Close}} has been called for all the scan ranges. This causes the {{File Formats}} issue above: {{HdfsParquetScanner::Close}} calls {{HdfsScanNodeBase::RangeComplete}}, which updates the shared object {{file_type_counts_}}, and {{StopAndFinalizeCounters}} reads that object. As a result, {{StopAndFinalizeCounters}} can write out the contents of {{file_type_counts_}} before all the scanners have updated it.
> {{StopAndFinalizeCounters}} can be called in two places: {{HdfsScanNodeBase::Close}} and {{HdfsScanNode::GetNext}}. It is called in {{GetNext}} when {{GetNextInternal}} reads enough rows to cross the query-defined limit. So {{GetNext}} will call {{StopAndFinalizeCounters}} once the limit is reached, which is not necessarily after all the scanners have been closed.
> I'm able to reproduce this locally using the following queries:
> {code:java}
>  select * from functional_parquet.lineitem_sixblocks limit 10 {code}
> With this query, the runtime profile does not include {{File Formats}}.
> {code:java}
>  select * from functional_parquet.lineitem_sixblocks order by l_orderkey limit 10 {code}
> With this query, the runtime profile does include {{File Formats}}.
> I tried simply removing the call to {{StopAndFinalizeCounters}} from {{GetNext}}, but that doesn't seem to work. It actually caused several other runtime profile messages to disappear (not entirely sure why).
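
The ordering problem described in the quoted issue can be modeled without threads by running the two call orders explicitly. This is a simplified sketch under stated assumptions: {{ProfileModel}}, {{ScannerClose}}, {{RunQuery}} and {{file_formats_line}} are hypothetical names, and the model assumes that a second call to {{StopAndFinalizeCounters}} is a no-op. It is not a description of the real Impala code path.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of the profile snapshot; not Impala's actual classes.
struct ProfileModel {
  std::map<std::string, int> file_type_counts;  // updated when a scanner closes
  std::string file_formats_line;                // written once, at finalization
  bool finalized = false;

  // Plays the role of a scanner's Close updating the shared counts.
  void ScannerClose(const std::string& format) { ++file_type_counts[format]; }

  // Writes out whatever counts exist at the moment of the first call.
  void StopAndFinalizeCounters() {
    if (finalized) return;  // assumption: later calls do nothing
    finalized = true;
    for (const auto& entry : file_type_counts) {
      file_formats_line +=
          entry.first + ":" + std::to_string(entry.second) + " ";
    }
  }
};

// limit_hit_first == true models finalization from GetNext before the
// scanners close; false models finalization only from Close.
std::string RunQuery(bool limit_hit_first) {
  ProfileModel profile;
  if (limit_hit_first) profile.StopAndFinalizeCounters();
  profile.ScannerClose("PARQUET/NONE");
  profile.ScannerClose("PARQUET/NONE");
  profile.StopAndFinalizeCounters();  // the call made on the Close path
  return profile.file_formats_line;
}
```

In this model, {{RunQuery(true)}} yields an empty {{File Formats}} line because the snapshot is taken before either scanner updates the counts, while {{RunQuery(false)}} yields {{PARQUET/NONE:2}}. The second ordering is presumably the one the {{order by}} query follows, though the issue text does not confirm why.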



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)