You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "James Turton (Jira)" <ji...@apache.org> on 2022/11/02 15:29:00 UTC

[jira] [Updated] (DRILL-8331) UNION ALL for multiple sheets in excel giving unexpected result when used with a subquery in not in

     [ https://issues.apache.org/jira/browse/DRILL-8331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Turton updated DRILL-8331:
--------------------------------
    Priority: Major  (was: Critical)

> UNION ALL for multiple sheets in excel giving unexpected result when used with a subquery in not in
> ---------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-8331
>                 URL: https://issues.apache.org/jira/browse/DRILL-8331
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Excel
>    Affects Versions: 1.20.2
>            Reporter: Anshuman Mishra
>            Priority: Major
>         Attachments: TestingMerge.xlsx
>
>
> Hi All,
> I am having a query regarding a query I am firing up in drill and not getting the expected result set. Basically what I am doing is merging the data from three sheets and eliminating only those records present in the 4th sheet.
> The query:
> {quote}{_}select * from (SELECT EmpId,Test FROM                         TABLE(dfs.{_}{{{}_[C://TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type => 'excel', sheetName        =>'Sheet1')) UNION ALL SELECT EmpId,Test FROM  TABLE(dfs.{_}{{{}_[C://TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type => 'excel', sheetName =>'Sheet2')) UNION ALL SELECT EmpId,Test FROM  TABLE(dfs.{_}{{{}_[C://TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type => 'excel', sheetName =>'Sheet3'))) where EmpID not in (select EmpID from TABLE(dfs.{_}{{{}_[C://TestingMerge.xlsx|file:///C://ATOP//all_files//XLS_XLSX//TestingMerge.xlsx]_{}}}{_}(type => 'excel', sheetName    =>'Sheet4'))){_}
> {quote}
> The query when we execute it, returns no data, when it should just eliminate the data present in sheet4 from the excel sheet. Strangely enough when we do a union All on two sheets this query works perfectly fine and returns the expected result. One more finding is, if we use static values in 'not in' instead of 'subquery', the query works fine with all the 3 sheets as union all. I even gave a null check in the subquery used in 'not in' as it might consider null values as well, still no luck.
> NOTE: Instead of UNION ALL if UNION is applied then it is giving the expected result set.
> I have attached a sample file I am using to recreate the issue. [^TestingMerge.xlsx]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)