Posted to issues-all@impala.apache.org by "Sahil Takiar (JIRA)" <ji...@apache.org> on 2018/11/06 15:41:00 UTC

[jira] [Updated] (IMPALA-7816) Race condition in HdfsScanNodeBase::StopAndFinalizeCounters

     [ https://issues.apache.org/jira/browse/IMPALA-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar updated IMPALA-7816:
---------------------------------
    Description: 
While working on IMPALA-6964, I noticed that sometimes the runtime profile for a {{HDFS_SCAN_NODE}} will include {{File Formats: PARQUET/NONE:2}} and sometimes it won't (depending on the query). However, looking at the code, any scan of Parquet files should include this line.

I debugged the code and there seems to be a race condition where {{HdfsScanNodeBase::StopAndFinalizeCounters}} can be called before {{HdfsParquetScanner::Close}} has been called for all the scan ranges. This causes the {{File Formats}} issue above: {{HdfsParquetScanner::Close}} calls {{HdfsScanNodeBase::RangeComplete}}, which updates the shared object {{file_type_counts_}}, and {{StopAndFinalizeCounters}} reads that object. As a result, {{StopAndFinalizeCounters}} can write out the contents of {{file_type_counts_}} before all scanners have updated it.
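
To make the ordering problem concrete, here is a minimal, self-contained sketch. It is not Impala's code: {{ScanNodeModel}} and {{RunModel}} are hypothetical stand-ins, and only the method names and the {{file_type_counts_}} member mirror the ones mentioned above. It assumes finalization takes a one-time snapshot of the counts, so any update that arrives afterwards never reaches the profile.

```cpp
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for the scan node; not Impala's actual class.
class ScanNodeModel {
 public:
  // Called by a scanner when it closes a scan range.
  void RangeComplete(const std::string& file_type) {
    std::lock_guard<std::mutex> l(lock_);
    ++file_type_counts_[file_type];
  }

  // Snapshots the counts into the profile text. Updates that arrive
  // after this call are not reflected in the profile.
  void StopAndFinalizeCounters() {
    std::lock_guard<std::mutex> l(lock_);
    if (file_type_counts_.empty()) return;  // nothing recorded yet
    profile_ = "File Formats:";
    for (const auto& entry : file_type_counts_) {
      profile_ += " " + entry.first + ":" + std::to_string(entry.second);
    }
  }

  const std::string& profile() const { return profile_; }

 private:
  std::mutex lock_;
  std::map<std::string, int> file_type_counts_;
  std::string profile_;
};

// Replays the two possible orderings deterministically and returns the
// resulting profile text.
std::string RunModel(bool finalize_before_scanners_close) {
  ScanNodeModel node;
  if (finalize_before_scanners_close) node.StopAndFinalizeCounters();
  node.RangeComplete("PARQUET/NONE");  // first scanner closes
  node.RangeComplete("PARQUET/NONE");  // second scanner closes
  if (!finalize_before_scanners_close) node.StopAndFinalizeCounters();
  return node.profile();
}
```

In this model, {{RunModel(true)}} returns an empty profile, while {{RunModel(false)}} returns {{File Formats: PARQUET/NONE:2}}. The mutex keeps each individual update safe, but it does nothing to enforce the order of the calls, which is the actual problem.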

{{StopAndFinalizeCounters}} can be called in two places: {{HdfsScanNodeBase::Close}} and {{HdfsScanNode::GetNext}}. It is called in {{GetNext}} when {{GetNextInternal}} reads enough rows to reach the query-defined limit. So {{GetNext}} will call {{StopAndFinalizeCounters}} as soon as the limit is reached, which is not necessarily after all the scanners have been closed.
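
The two call sites can be sketched as an event log. This is a hypothetical model, not Impala's control flow: {{LimitScanModel}}, {{CloseScanner}} and {{RunQuery}} are invented for illustration, the one-shot guard in {{StopAndFinalizeCounters}} is an assumption, and so is the premise that scanners are closed before the node's {{Close}} when no limit applies.

```cpp
#include <string>
#include <vector>

// Hypothetical model that records the order of events only.
struct LimitScanModel {
  explicit LimitScanModel(int limit_in) : limit(limit_in) {}

  int limit;  // negative means "no limit on this node"
  int rows_returned = 0;
  bool counters_finalized = false;
  std::vector<std::string> events;

  void StopAndFinalizeCounters() {
    if (counters_finalized) return;  // assumed one-shot guard
    counters_finalized = true;
    events.push_back("finalize");
  }

  // Returns true once the limit has been reached.
  bool GetNext(int batch_rows) {
    rows_returned += batch_rows;
    if (limit >= 0 && rows_returned >= limit) {
      StopAndFinalizeCounters();  // call site 1: limit reached
      return true;
    }
    return false;
  }

  // Stands in for a scanner closing its scan range.
  void CloseScanner() { events.push_back("scanner_close"); }

  void Close() {
    StopAndFinalizeCounters();  // call site 2: node close
  }
};

// Replays one query: a single batch of 10 rows, then the scanner
// winds down, then the node is closed.
std::vector<std::string> RunQuery(int limit) {
  LimitScanModel node(limit);
  node.GetNext(10);
  node.CloseScanner();
  node.Close();
  return node.events;
}
```

Under these assumptions, {{RunQuery(10)}} records {{finalize}} before {{scanner_close}}, whereas {{RunQuery(-1)}} records {{scanner_close}} first and finalizes only in {{Close}}.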

I'm able to reproduce this locally using the following queries:
{code:sql}
select * from functional_parquet.lineitem_sixblocks limit 10
{code}
The runtime profile does not include {{File Formats}}.
{code:sql}
select * from functional_parquet.lineitem_sixblocks order by l_orderkey limit 10
{code}
The runtime profile does include {{File Formats}}.

I tried simply removing the call to {{StopAndFinalizeCounters}} from {{GetNext}}, but that doesn't seem to work. It actually caused several other runtime profile messages to disappear (not entirely sure why).



> Race condition in HdfsScanNodeBase::StopAndFinalizeCounters
> -----------------------------------------------------------
>
>                 Key: IMPALA-7816
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7816
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.1.0
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major


