Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2019/02/26 04:11:00 UTC

[jira] [Assigned] (DRILL-5826) UnorderedReceiverBatch fails to detect a schema change within a map

     [ https://issues.apache.org/jira/browse/DRILL-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers reassigned DRILL-5826:
----------------------------------

    Assignee:     (was: Paul Rogers)

> UnorderedReceiverBatch fails to detect a schema change within a map
> -------------------------------------------------------------------
>
>                 Key: DRILL-5826
>                 URL: https://issues.apache.org/jira/browse/DRILL-5826
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.11.0
>            Reporter: Paul Rogers
>            Priority: Major
>
> Run the following HBase query:
> {code}
> select * from `hbase`.browser_action2 a
> {code}
> Table is defined as:
> {code}
> > create 'browser_action2', 'v', {SPLITS => ['0','1','2','3','4','5','6','7','8','9']}
> ...
> > scan 'browser_action2'
> ROW                                   COLUMN+CELL
>  1                                    column=v:e0, timestamp=1506560555979, value=abc1
>  2                                    column=v:e0, timestamp=1506560564807, value=abc2
> {code}
> Step through the {{UnorderedReceiverBatch}} with a parallelization of 1. Observe the following (behavior is random):
> * The first batch has schema (row_key, v), where v is an empty map (corresponding to a column family), but contains no data (zero rows).
> * Because the first batch has columns, it is sent downstream with {{OK_NEW_SCHEMA}}.
> * The second batch has schema (row_key, v{e0}), where v is a map with column e0 (corresponding to a column family with one column) and one row.
> * The code loads the batch, asking the batch itself if it has a new schema.
> * The batch reports that it does not have a new schema, so the call returns false.
> * The {{UnorderedReceiverBatch}} returns {{OK}}, indicating to the downstream operator that the second batch has the same schema as the first (which, in this case, is not true).
> Code in question:
> {code}
>       final boolean schemaChanged = batchLoader.load(rbd, batch.getBody());
> {code}
> In point of fact, each sender has no visibility into the schemas of other senders, and the order in which batches are received is undefined. Therefore, an input batch has no way of knowing whether it has the same schema as the previous output batch.
> The obvious, correct logic is to compare the incoming batch schema with the current receiver schema, and send {{OK}} or {{OK_NEW_SCHEMA}} based on the result of that comparison.
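
The comparison proposed above could look roughly like the following. This is only a minimal sketch, not Drill code: Field, Outcome and ReceiverSchemaTracker are hypothetical stand-ins invented for illustration (they are not MaterializedField, IterOutcome or any existing Drill class), and the sketch compares column names and map nesting only, ignoring data types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical stand-in for a column definition; not a Drill class. */
final class Field {
    final String name;
    final List<Field> children; // non-empty only for maps with known members

    Field(String name, Field... children) {
        this.name = name;
        this.children = new ArrayList<>();
        for (Field child : children) {
            this.children.add(child);
        }
    }

    // Deep comparison: two maps are equal only if their members also match.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Field)) {
            return false;
        }
        Field other = (Field) o;
        return name.equals(other.name) && children.equals(other.children);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, children);
    }
}

/** Hypothetical stand-in for the two outcomes discussed in the issue. */
enum Outcome { OK, OK_NEW_SCHEMA }

/** Tracks the schema last sent downstream by the receiver (assumed design). */
final class ReceiverSchemaTracker {
    private List<Field> current; // null until the first batch arrives

    /** Compares the incoming schema with the receiver's own current schema. */
    Outcome accept(List<Field> incoming) {
        if (current != null && current.equals(incoming)) {
            return Outcome.OK;
        }
        current = new ArrayList<>(incoming);
        return Outcome.OK_NEW_SCHEMA;
    }
}
```

With the two batches described in the issue, the first schema (row_key, v) would yield OK_NEW_SCHEMA, and the second schema (row_key, v{e0}) would also yield OK_NEW_SCHEMA because the nested member e0 makes the map differ, regardless of which sender produced either batch.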



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)