You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by "Maryann Xue (JIRA)" <ji...@apache.org> on 2015/12/02 21:54:11 UTC
[jira] [Updated] (PHOENIX-2480) SQL Query with multiple projection
selections over multiple tables having LEFT OUTER JOINS returns completely
null for random columns even when data is present
[ https://issues.apache.org/jira/browse/PHOENIX-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maryann Xue updated PHOENIX-2480:
---------------------------------
Attachment: PHOENIX-2480.patch
[~jamestaylor], Could you help review this patch?
The problem is basically about the ValueBitSet.isVarLength() being test to false vs. true between pre-join schema and post-join schema. And it happens only with simultaneous many-way joins, join between two tables is fine even without the fix.
Another change I've made in the patch is to move all index-irrelevant test cases from parameterized HashJoinIT and SortMergeJoinIT into non-parameterized HashJoinMoreIT and SortMergeJoinMoreIT, in order to avoid running the same test cases multiple times.
> SQL Query with multiple projection selections over multiple tables having LEFT OUTER JOINS returns completely null for random columns even when data is present
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: PHOENIX-2480
> URL: https://issues.apache.org/jira/browse/PHOENIX-2480
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.4.0
> Environment: Linux CentOS release 6.6/ Hadoop 2.2.0 / Hbase 0.98 / JDK 1.7.0_55 / Apache Phoenix 4.4.0
> Reporter: Vivek K T
> Assignee: Maryann Xue
> Priority: Blocker
> Attachments: PHOENIX-2480.patch
>
>
> Please do the following to reproduce the issue.
> ***********************************************************************************
> Create following test tables :
> ***********************************************************************************
> CREATE TABLE master_businessunit (
> code varchar(255) PRIMARY KEY,
> name varchar(255)
> );
> CREATE TABLE master_company (
> code varchar(255) PRIMARY KEY,
> name varchar(255)
> );
> CREATE TABLE master_costcenter (
> code varchar(255) PRIMARY KEY,
> name varchar(255)
> );
> CREATE TABLE master_location (
> code varchar(255) PRIMARY KEY,
> name varchar(255)
> );
> CREATE TABLE master_product (
> id int(11) PRIMARY KEY,
> product_name varchar(255)
> );
> CREATE TABLE master_purchaseorder (
> purchaseOrderNumber varchar(255),
> companyCode varchar(255) ,
> businessUnitCode varchar(255),
> locationCode varchar(255) ,
> purchaseOrderId varchar(255) PRIMARY KEY,
> releasedOn date ,
> name varchar(255)
> );
> CREATE TABLE trans_purchaseorderitem (
> purchaseOrderItemId varchar(255) PRIMARY KEY,
> purchaseOrderId varchar(255),
> lineNo varchar(255),
> name varchar(255)
> );
> CREATE TABLE trans_purchaseorderitem_costing (
> purchaseorderItem_costing_id varchar(255) primary key,
> purchaseorderItemId varchar(255) ,
> purchaseorderId varchar(255) ,
> costcenterCode varchar(255)
> );
> *************************************************************************************
> Upsert following test values :
> **************************************************************************************
> upsert into master_businessunit(code,name) values ('1','BU1');
> upsert into master_businessunit(code,name) values ('2','BU2');
> upsert into master_company(code,name) values ('1','Company1');
> upsert into master_company(code,name) values ('2','Company2');
> upsert into master_costcenter(code,name) values ('1','CC1');
> upsert into master_costcenter(code,name) values ('2','CC2');
> upsert into master_location(code,name) values ('1','Location1');
> upsert into master_location(code,name) values ('2','Location2');
> upsert into master_product(id,product_name) values (1,'ProductName1');
> upsert into master_product(id,product_name) values (2,'Product2');
> upsert into master_purchaseorder(purchaseOrderNumber,companyCode,businessUnitCode,locationCode,purchaseOrderId,releasedOn,name) values ('1','1','1','1','1','2015-12-01','1');
> upsert into master_purchaseorder(purchaseOrderNumber,companyCode,businessUnitCode,locationCode,purchaseOrderId,releasedOn,name) values ('2','2','2','2','2','2015-12-02','2');
> upsert into trans_purchaseorderitem(purchaseOrderItemId,purchaseOrderId,lineNo,name) values ('1','1','1','1');
> upsert into trans_purchaseorderitem(purchaseOrderItemId,purchaseOrderId,lineNo,name) values ('2','2','2','2');
> upsert into trans_purchaseorderitem_costing(purchaseorderItem_costing_id,purchaseorderItemId,purchaseorderId,costcenterCode) values ('1','1','1','1');
> upsert into trans_purchaseorderitem_costing(purchaseorderItem_costing_id,purchaseorderItemId,purchaseorderId,costcenterCode) values ('2','2','2','2');
> ********************************************************************************************
> Now execute the following query :
> SELECT
> DISTINCT
> COALESCE( a1.name, 'N.A.'),
> COALESCE( a2.name, 'N.A.'),
> COALESCE( a3.name, 'N.A.'),
> COALESCE( a4.purchaseOrderNumber, 'N.A.'),
> COALESCE( a1.name, 'N.A.'),
> COALESCE( a4.name, 'N.A.'),
> COALESCE( a5.lineNo, 'N.A.'),
> COALESCE( a5.name, 'N.A.'),
> COALESCE( a7.name,'N.A.')
> FROM
> (
> master_purchaseorder a4 LEFT OUTER
> JOIN master_company a1 ON a4.companyCode = a1.code LEFT OUTER
> JOIN master_businessunit a2 ON a4.businessUnitCode = a2.code LEFT OUTER
> JOIN master_location a3 ON a4.locationCode = a3.code LEFT OUTER
> JOIN trans_purchaseorderitem a5 ON a5.purchaseOrderId = a4.purchaseOrderId LEFT OUTER
> JOIN trans_purchaseorderitem_costing a6 ON a6.purchaseOrderItemId = a5.purchaseOrderItemId
> AND a6.purchaseOrderId = a5.purchaseOrderId LEFT OUTER
> JOIN master_costcenter a7 ON a6.costCenterCode = a7.code
> )
> ************************************************************************************
> The first three columns are displays 'N.A' in all the rows even though data is present.
> surprisingly they return the actual values when the last three projections are commented out in the select clause.
> For e.g. the following query (same as above, just reduced number of projections) returns proper data
> SELECT
> DISTINCT
> COALESCE( a1.name, 'N.A.'),
> COALESCE( a2.name, 'N.A.'),
> COALESCE( a3.name, 'N.A.'),
> COALESCE( a4.purchaseOrderNumber, 'N.A.'),
> COALESCE( a1.name, 'N.A.'),
> COALESCE( a4.name, 'N.A.')
> FROM
> (
> master_purchaseorder a4 LEFT OUTER
> JOIN master_company a1 ON a4.companyCode = a1.code LEFT OUTER
> JOIN master_businessunit a2 ON a4.businessUnitCode = a2.code LEFT OUTER
> JOIN master_location a3 ON a4.locationCode = a3.code LEFT OUTER
> JOIN trans_purchaseorderitem a5 ON a5.purchaseOrderId = a4.purchaseOrderId LEFT OUTER
> JOIN trans_purchaseorderitem_costing a6 ON a6.purchaseOrderItemId = a5.purchaseOrderItemId
> AND a6.purchaseOrderId = a5.purchaseOrderId LEFT OUTER
> JOIN master_costcenter a7 ON a6.costCenterCode = a7.code
> )
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)