Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2023/11/03 00:37:00 UTC

[jira] [Commented] (IMPALA-12376) DataSourceScanNode drop some returned rows if FLAGS_data_source_batch_size is greater than default value

    [ https://issues.apache.org/jira/browse/IMPALA-12376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782363#comment-17782363 ] 

ASF subversion and git services commented on IMPALA-12376:
----------------------------------------------------------

Commit a5a99adcd27caf9906a4836a79a3547ce2229905 in impala's branch refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a5a99adcd ]

IMPALA-12376: DataSourceScanNode drop some returned rows

DataSourceScanNode does not handle eos properly in
DataSourceScanNode::GetNext(). Rows returned from the external
data source could be dropped if data_source_batch_size is set to
a value greater than the default of 1024.

Testing:
 - Added an end-to-end test with data_source_batch_size set to 2048.
   The test failed without the fix and passed with it.
   Also added a test with data_source_batch_size set to 512.
 - Passed core tests.

Change-Id: I978d0a65faa63a47ec86a0127c0bee8dfb79530b
Reviewed-on: http://gerrit.cloudera.org:8080/20636
Reviewed-by: Abhishek Rawat <ar...@cloudera.com>
Tested-by: Wenzhe Zhou <wz...@cloudera.com>


> DataSourceScanNode drop some returned rows if FLAGS_data_source_batch_size is greater than default value
> --------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-12376
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12376
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>            Reporter: Wenzhe Zhou
>            Assignee: Wenzhe Zhou
>            Priority: Major
>             Fix For: Impala 4.4.0
>
>
> The backend DataSourceScanNode (be/src/exec/data-source-scan-node.cc) does not handle eos properly in DataSourceScanNode::GetNext(). Rows returned from the external data source could be dropped if FLAGS_data_source_batch_size is set to a value greater than the default of 1024.
> In the following code:
>       if (row_batch->AtCapacity() || input_batch_->eos || ReachedLimit()) {
>         *eos = input_batch_->eos || ReachedLimit();
> eos can be set to true while some rows in the input batch are still unprocessed, if row_batch->AtCapacity() returns true.
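
The failure mode described above can be reproduced with a small standalone model. This is only a sketch, not the Impala source: InputBatch, ScanAll and the `fixed` flag are hypothetical names, ReachedLimit() is left out, and the "fixed" condition is one plausible way to avoid the problem, not necessarily what the actual commit does. It assumes a non-empty source whose last batch has eos set and an output row batch capacity greater than zero.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one batch returned by the external data source.
struct InputBatch {
  std::size_t num_rows;  // rows contained in this input batch
  bool eos;              // true if the source has no further batches
};

// Simplified model of repeated GetNext() calls. Returns the total number of
// rows delivered to the caller before eos is reported.
// Assumes: source is non-empty, its last batch has eos == true, and
// row_batch_capacity > 0.
// fixed == false: eos mirrors the condition quoted in the issue.
// fixed == true:  eos is reported only once the input batch is fully consumed
//                 (an assumed correction, not the actual patch).
std::size_t ScanAll(const std::vector<InputBatch>& source,
                    std::size_t row_batch_capacity, bool fixed) {
  std::size_t delivered = 0;
  std::size_t batch_idx = 0;  // current input batch
  std::size_t next_row = 0;   // next unprocessed row in the current input batch
  bool eos = false;

  while (!eos) {  // each iteration models one GetNext() call
    std::size_t rows_in_row_batch = 0;
    while (true) {
      const InputBatch& in = source[batch_idx];
      // Copy rows until the output batch is full or the input is exhausted.
      const std::size_t n = std::min(in.num_rows - next_row,
                                     row_batch_capacity - rows_in_row_batch);
      rows_in_row_batch += n;
      next_row += n;

      const bool at_capacity = rows_in_row_batch == row_batch_capacity;
      const bool input_consumed = next_row == in.num_rows;
      if (at_capacity || in.eos) {
        // Buggy variant: eos is reported even if rows remain in the input.
        eos = fixed ? (in.eos && input_consumed) : in.eos;
        break;
      }
      // Input batch fully consumed and more batches follow: fetch the next one.
      ++batch_idx;
      next_row = 0;
    }
    delivered += rows_in_row_batch;
  }
  return delivered;
}
```

In this model, a single 2048-row input batch with eos set and an output capacity of 1024 yields only 1024 rows under the quoted condition and all 2048 rows under the assumed correction. Four 512-row input batches yield all 2048 rows either way, which is consistent with the bug appearing only when the data source batch size exceeds the output batch capacity.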



--
This message was sent by Atlassian Jira
(v8.20.10#820010)