Posted to issues@drill.apache.org by "James Turton (Jira)" <ji...@apache.org> on 2022/06/02 10:37:00 UTC

[jira] [Updated] (DRILL-8182) Excel format plugin sheet scan overwriting bug

     [ https://issues.apache.org/jira/browse/DRILL-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Turton updated DRILL-8182:
--------------------------------
    Description: 
When a query creates multiple scans against a workbook that target different sheets using TABLE functions, the resulting datasets appear to get mixed, with one overwriting the other. To reproduce, run the following query against the attachment and note that the value returned from the Products sheet is a name from the Customers sheet.

 
{code:sql}
with cust as (
    select * from TABLE(dfs.tmp.`/Products_Customers_Orders.xlsx` (type => 'excel', sheetName => 'Customers'))
),
prod as (
    select * from TABLE(dfs.tmp.`/Products_Customers_Orders.xlsx` (type => 'excel', sheetName => 'Products'))
)
select * from cust join prod on cust.Id = prod.Id;
{code}

  was:
When a query creates multiple scans against a workbook that target different sheets using TABLE functions, the resulting datasets appear to get mixed, with one overwriting the other. To reproduce, run the following query against the attachment and note that the value returned from the Products sheet is a name from the Customers sheet.

 
{code:sql}
SELECT
    c.Name AS Customer, p.Name AS Product, o.Quantity
FROM
    TABLE(dfs.tmp.`/Products_Customers_Orders.xlsx` (type => 'excel', sheetName => 'Orders')) o
INNER JOIN
    TABLE(dfs.tmp.`/Products_Customers_Orders.xlsx` (type => 'excel', sheetName => 'Customers')) c ON o.Customer_Id = c.Id
INNER JOIN
    TABLE(dfs.tmp.`/Products_Customers_Orders.xlsx` (type => 'excel', sheetName => 'Products')) p ON o.Product_Id = p.Id
WHERE
    o.Id = 5;
{code}


> Excel format plugin sheet scan overwriting bug
> ----------------------------------------------
>
>                 Key: DRILL-8182
>                 URL: https://issues.apache.org/jira/browse/DRILL-8182
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Other
>    Affects Versions: 1.20.0
>            Reporter: James Turton
>            Assignee: Charles Givre
>            Priority: Major
>             Fix For: 2.0.0
>
>         Attachments: Products_Customers_Orders.xlsx



--
This message was sent by Atlassian Jira
(v8.20.7#820007)