You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@drill.apache.org by "Jinfeng Ni (JIRA)" <ji...@apache.org> on 2017/05/05 23:18:04 UTC

[jira] [Created] (DRILL-5480) Empty batch returning from HBase may cause SchemChangeException or incorrect query result

Jinfeng Ni created DRILL-5480:
---------------------------------

             Summary: Empty batch returning from HBase may cause SchemChangeException or incorrect query result
                 Key: DRILL-5480
                 URL: https://issues.apache.org/jira/browse/DRILL-5480
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Jinfeng Ni


The following repo was provided by [~haozhu].

1. Create a Hbase table with 4 regions
{code}
create 'myhbase', 'cf1','cf2', {SPLITS => ['a', 'b', 'c']}
put 'myhbase','a','cf1:col1','somedata'
put 'myhbase','b','cf1:col2','somedata'
put 'myhbase','c','cf2:col1','somedata'
{code}

One region has cf1.col1.  One region has column family 'cf1', but does not have 'col1' under 'cf1'. One region has only column family 'cf2'. And last region is complete empty.

2. Prepare a csv file.
{code}
select * from dfs.tmp.`joinhbase.csv`;
+-------------------+
|      columns      |
+-------------------+
| ["1","somedata"]  |
| ["2","somedata"]  |
| ["3","somedata"]  |
{code}

Now run the following query on drill 1.11.0-SNAPSHOT:

{code}
select cast(H.row_key as varchar(10)) as keyCol, CONVERT_FROM(H.cf1.col1, 'UTF8') as col1
from 
hbase.myhbase H JOIN dfs.tmp.`joinhbase.csv` C
ON CONVERT_FROM(H.cf1.col1, 'UTF8')= C.columns[1]
;
{code}

The correct query result show be:
{code}
+---------+-----------+
| keyCol  |   col1    |
+---------+-----------+
| a       | somedata  |
| a       | somedata  |
| a       | somedata  |
+---------+-----------+
{code}

Turn off broadcast join, then we will see SchemaChangeException, or incorrect result randomly.

{code}
alter session set `planner.enable_broadcast_join`=false;
{code}

{code}

select cast(H.row_key as varchar(10)) as keyCol, CONVERT_FROM(H.cf1.col1, 'UTF8') as col1
. . . . . . . . . . . . . . . . . .> from
. . . . . . . . . . . . . . . . . .> hbase.myhbase H JOIN dfs.tmp.`joinhbase.csv` C
. . . . . . . . . . . . . . . . . .> ON CONVERT_FROM(H.cf1.col1, 'UTF8')= C.columns[1]
. . . . . . . . . . . . . . . . . .> ;
Error: SYSTEM ERROR: SchemaChangeException: Hash join does not support schema changes
{code}

{code}

+---------+-------+
| keyCol  | col1  |
+---------+-------+
+---------+-------+
No rows selected (0.302 seconds)
{code}




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)