Posted to github@arrow.apache.org by "berkaysynnada (via GitHub)" <gi...@apache.org> on 2023/06/05 10:10:15 UTC

[GitHub] [arrow-datafusion] berkaysynnada opened a new issue, #6543: Support columns having the same alias

berkaysynnada opened a new issue, #6543:
URL: https://github.com/apache/arrow-datafusion/issues/6543

   ### Describe the bug
   
   When we give the same alias to multiple columns (`SELECT ts as c1, inc_col as c1 FROM annotated_data_infinite`), the builder returns the following error: 
   `Plan("Projections require unique expression names but the expression \"annotated_data_infinite.ts AS c1\" at position 0 and \"annotated_data_infinite.inc_col AS c1\" at position 1 have the same name. Consider aliasing (\"AS\") one of them.")`
   
   PostgreSQL handles this query and returns a result with two columns that have the same name. I don't know whether this is intentional behaviour in DataFusion or a bug, but I would like to open an issue to track it.
   
   ### To Reproduce
   
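   ```rust
   // Assumes the `datafusion` and `tokio` (with the `macros` and
   // `rt-multi-thread` features) crates as dependencies.
   use datafusion::error::Result;
   use datafusion::prelude::*;

   #[tokio::main]
   async fn main() -> Result<()> {
       let ctx = SessionContext::new();
       // Register the test CSV; the path is relative to the arrow-datafusion repo root.
       ctx.sql(
           "CREATE EXTERNAL TABLE annotated_data_infinite (
             ts INTEGER,
             inc_col INTEGER,
             desc_col INTEGER,
           )
           STORED AS CSV
           WITH HEADER ROW
           WITH ORDER (ts ASC)
           LOCATION 'datafusion/core/tests/data/window_1.csv'",
       )
       .await?;

       // Two different columns given the same alias `c1`: planning fails
       // here with the "Projections require unique expression names" error.
       let sql = "SELECT ts as c1, inc_col as c1 FROM annotated_data_infinite";
       let dataframe = ctx.sql(sql).await?;
       dataframe.show().await?;
       Ok(())
   }
   ```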
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] comphead commented on issue #6543: Support columns having the same alias

Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead commented on issue #6543:
URL: https://github.com/apache/arrow-datafusion/issues/6543#issuecomment-1605667382

   That is a good idea, by the way. Currently we have a bunch of related issues:
   - Arrow schema uniqueness violation
   - DFSchema uniqueness violation

   This solution could potentially address both.


Re: [I] Support columns having the same alias [arrow-datafusion]

Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead commented on issue #6543:
URL: https://github.com/apache/arrow-datafusion/issues/6543#issuecomment-2040081825

   Yes, I was thinking the other day about allowing queries like that and checking uniqueness only in outer queries.


[GitHub] [arrow-datafusion] jackwener commented on issue #6543: Support columns having the same alias

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on issue #6543:
URL: https://github.com/apache/arrow-datafusion/issues/6543#issuecomment-1605836762

   This is a legacy issue.
   
   Generally, we won't raise an error for having columns with the same name unless an outer subquery references that column name.
   
   In terms of this issue itself, we should fix it in the planner.
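   A minimal, std-only Rust sketch of that rule, assuming a projection's output is just a list of column names: duplicates are accepted when the inner projection is built, and an error surfaces only when an outer query resolves a name that matches more than one column. `resolve` and `ResolveError` are hypothetical names for illustration, not DataFusion APIs.

   ```rust
   /// Errors an outer query can hit when looking up a column by name.
   #[derive(Debug, PartialEq)]
   enum ResolveError {
       NotFound(String),
       /// The name matches several output columns (positions listed).
       Ambiguous(String, Vec<usize>),
   }

   /// Hypothetical helper: resolve `name` against an inner query's output
   /// column names. Duplicated names are allowed in `outputs`; they only
   /// cause an error when they are actually referenced.
   fn resolve(outputs: &[&str], name: &str) -> Result<usize, ResolveError> {
       let matches: Vec<usize> = outputs
           .iter()
           .enumerate()
           .filter(|(_, n)| **n == name)
           .map(|(i, _)| i)
           .collect();
       match matches.len() {
           0 => Err(ResolveError::NotFound(name.to_string())),
           1 => Ok(matches[0]),
           _ => Err(ResolveError::Ambiguous(name.to_string(), matches)),
       }
   }

   fn main() {
       // Inner query: SELECT ts AS c1, inc_col AS c1 FROM annotated_data_infinite
       let inner = ["c1", "c1"];
       // Building the inner projection is fine; referencing `c1` from outside is not.
       println!("{:?}", resolve(&inner, "c1")); // Err(Ambiguous("c1", [0, 1]))
       // A name that is unique in the inner query resolves normally.
       println!("{:?}", resolve(&["c1", "c2"], "c2")); // Ok(1)
   }
   ```

   Under this rule the original `SELECT ts as c1, inc_col as c1 ...` would plan successfully, while `SELECT c1 FROM (SELECT ts as c1, inc_col as c1 ...)` would be rejected as ambiguous.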


Re: [I] Support columns having the same alias [arrow-datafusion]

Posted by "tv42 (via GitHub)" <gi...@apache.org>.
tv42 commented on issue #6543:
URL: https://github.com/apache/arrow-datafusion/issues/6543#issuecomment-2059925046

   @alamb I don't think you can directly run SQLite's test suite against DataFusion alone; there's a lot of `CREATE INDEX` etc. going on.
   
   I have an early-stage OLTP database project using DataFusion, and *that* survives a decent fraction of sqllogictest. This issue is one of the remaining big limitations, along with `AVG(DISTINCT)` (https://github.com/apache/arrow-datafusion/issues/2408). I've been filing bugs on test cases that failed due to DataFusion, and they've largely been fixed.


Re: [I] Support columns having the same alias [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6543:
URL: https://github.com/apache/arrow-datafusion/issues/6543#issuecomment-1938553985

   I wonder if someone has time to file a ticket proposing that we (re)use SQLite's sqllogictest suite?


[GitHub] [arrow-datafusion] alamb commented on issue #6543: Support columns having the same alias

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6543:
URL: https://github.com/apache/arrow-datafusion/issues/6543#issuecomment-1605444127

   I also filed https://github.com/apache/arrow-datafusion/issues/6758 to think about the problem with large column names.
   
   


[GitHub] [arrow-datafusion] alamb commented on issue #6543: Support columns having the same alias

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6543:
URL: https://github.com/apache/arrow-datafusion/issues/6543#issuecomment-1608132397

   🤯  I didn't realize this worked
   
   ```sql
   (arrow_dev) alamb@MacBook-Pro-8:~/Software/influxdb_iox2$ datafusion-cli
   DataFusion CLI v26.0.0
   ❯ create table foo (x int) as values (1), (2), (3);
   0 rows in set. Query took 0.003 seconds.
   ❯ select x as "my_col", x as "my col" from foo;
   +--------+--------+
   | my_col | my col |
   +--------+--------+
   | 1      | 1      |
   | 2      | 2      |
   | 3      | 3      |
   +--------+--------+
   3 rows in set. Query took 0.005 seconds.
   ❯ select x as "my_col", x+1 as "my col" from foo;
   +--------+--------+
   | my_col | my col |
   +--------+--------+
   | 1      | 2      |
   | 2      | 3      |
   | 3      | 4      |
   ```
   
   However, using `c1` as the alias for some reason fails:
   
   ```sql
   ❯ select x as c1, x as c1 from foo;
   Error during planning: Projections require unique expression names but the expression "foo.x AS c1" at position 0 and "foo.x AS c1" at position 1 have the same name. Consider aliasing ("AS") one of them.
   ```
   
   


[GitHub] [arrow-datafusion] alamb commented on issue #6543: Support columns having the same alias

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6543:
URL: https://github.com/apache/arrow-datafusion/issues/6543#issuecomment-1634897525

   > first test didn't fail because it has different aliases
   
   Oh man 🤦, I missed the difference between `_` and ` ` (underscore vs. space).


Re: [I] Support columns having the same alias [arrow-datafusion]

Posted by "tv42 (via GitHub)" <gi...@apache.org>.
tv42 commented on issue #6543:
URL: https://github.com/apache/arrow-datafusion/issues/6543#issuecomment-1937990765

   An extra motivation to get this right is that SQLite's sqllogictest suite has a *lot* of these queries. You'll get better test coverage if you support this.


[GitHub] [arrow-datafusion] berkaysynnada commented on issue #6543: Support columns having the same alias

Posted by "berkaysynnada (via GitHub)" <gi...@apache.org>.
berkaysynnada commented on issue #6543:
URL: https://github.com/apache/arrow-datafusion/issues/6543#issuecomment-1578083223

   > Thanks @berkaysynnada for raising this. It is quite an old problem: DF has a unique column name check in the planner. We planned to move this check one level up, so a query will fail only if an outer query references an inner query that contains duplicated aliases. Is your scenario a real-world one?
   
   Not really; it was a hypothetical trial. Your plan makes sense, and thanks for letting me know about it.


[GitHub] [arrow-datafusion] comphead commented on issue #6543: Support columns having the same alias

Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead commented on issue #6543:
URL: https://github.com/apache/arrow-datafusion/issues/6543#issuecomment-1633142650

   > 🤯 I didn't realize this worked
   > 
   > ```sql
   > (arrow_dev) alamb@MacBook-Pro-8:~/Software/influxdb_iox2$ datafusion-cli
   > DataFusion CLI v26.0.0
   > ❯ create table foo (x int) as values (1), (2), (3);
   > 0 rows in set. Query took 0.003 seconds.
   > ❯ select x as "my_col", x as "my col" from foo;
   > +--------+--------+
   > | my_col | my col |
   > +--------+--------+
   > | 1      | 1      |
   > | 2      | 2      |
   > | 3      | 3      |
   > +--------+--------+
   > 3 rows in set. Query took 0.005 seconds.
   > ❯ select x as "my_col", x+1 as "my col" from foo;
   > +--------+--------+
   > | my_col | my col |
   > +--------+--------+
   > | 1      | 2      |
   > | 2      | 3      |
   > | 3      | 4      |
   > ```
   > 
   > However, using `c1` as the alias for some reason fails:
   > 
   > ```sql
   > ❯ select x as c1, x as c1 from foo;
   > Error during planning: Projections require unique expression names but the expression "foo.x AS c1" at position 0 and "foo.x AS c1" at position 1 have the same name. Consider aliasing ("AS") one of them.
   > ```
   
   It failed because DF has a uniqueness check on projection column names.
   The first test didn't fail because its aliases are different (`my_col` vs `my col`).
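   The check described above can be modeled with a small, std-only Rust sketch. It assumes only that the check compares output names exactly; `check_unique_names` is a hypothetical name and its error text is simplified, so this is not DataFusion's actual code. Exact comparison is why `my_col` and `my col` pass while `c1` twice fails, and why two unaliased `1` literals (both named `Int64(1)`) fail too.

   ```rust
   use std::collections::HashMap;

   /// Hypothetical sketch of a projection-level uniqueness check: returns an
   /// error for the first output name that appears twice.
   fn check_unique_names(names: &[&str]) -> Result<(), String> {
       // Maps each output name to the position where it first appeared.
       let mut seen: HashMap<&str, usize> = HashMap::new();
       for (pos, &name) in names.iter().enumerate() {
           // `insert` returns the previous position if the name was already seen.
           if let Some(first) = seen.insert(name, pos) {
               return Err(format!(
                   "names at position {first} and {pos} are both {name:?}"
               ));
           }
       }
       Ok(())
   }

   fn main() {
       // Different aliases (underscore vs space): accepted.
       println!("{:?}", check_unique_names(&["my_col", "my col"])); // Ok(())
       // Same alias twice: rejected.
       println!("{:?}", check_unique_names(&["c1", "c1"]));
       // Unaliased literals share a generated name, as in `select 1, 1`.
       println!("{:?}", check_unique_names(&["Int64(1)", "Int64(1)"]));
   }
   ```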


[GitHub] [arrow-datafusion] comphead commented on issue #6543: Support columns having the same alias

Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead commented on issue #6543:
URL: https://github.com/apache/arrow-datafusion/issues/6543#issuecomment-1577867322

   Thanks @berkaysynnada for raising this. It is quite an old problem: DF has a unique column name check in the planner. We planned to move this check one level up, so a query will fail only if an outer query references an inner query that contains duplicated aliases. Is your scenario a real-world one?


[GitHub] [arrow-datafusion] alamb commented on issue #6543: Support columns having the same alias

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6543:
URL: https://github.com/apache/arrow-datafusion/issues/6543#issuecomment-1605444526

   I wonder if a potential solution for this issue would be to automatically append a string to make the column names unique in the Arrow schema?
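   One way to read this suggestion, as a minimal std-only Rust sketch: walk the output names and append a numeric suffix to any repeat, checking against every name already emitted so a generated name cannot clash with one that was written explicitly. `dedup_names` and the `:N` suffix format are hypothetical choices for illustration, not anything DataFusion or Arrow defines.

   ```rust
   use std::collections::{HashMap, HashSet};

   /// Hypothetical sketch: make output names unique (e.g. for an Arrow schema)
   /// by appending `:N` to repeated names. The suffix format is arbitrary.
   fn dedup_names(names: &[&str]) -> Vec<String> {
       let mut used: HashSet<String> = HashSet::new();
       // Next suffix number to try for each original name.
       let mut next: HashMap<&str, usize> = HashMap::new();
       let mut out = Vec::with_capacity(names.len());
       for &name in names {
           let mut candidate = name.to_string();
           // Bump the suffix until the candidate doesn't collide with any
           // name already emitted.
           while used.contains(&candidate) {
               let n = next.entry(name).or_insert(1);
               candidate = format!("{name}:{n}");
               *n += 1;
           }
           used.insert(candidate.clone());
           out.push(candidate);
       }
       out
   }

   fn main() {
       // `select 1, 1` would otherwise produce two fields named "Int64(1)".
       println!("{:?}", dedup_names(&["Int64(1)", "Int64(1)"])); // ["Int64(1)", "Int64(1):1"]
       println!("{:?}", dedup_names(&["c1", "c1", "c1:1"])); // ["c1", "c1:1", "c1:1:1"]
   }
   ```

   The trade-off is that users see names they did not write; PostgreSQL, as noted in the original report, instead returns both columns under the same name.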


Re: [I] Support columns having the same alias [arrow-datafusion]

Posted by "Jefffrey (via GitHub)" <gi...@apache.org>.
Jefffrey commented on issue #6543:
URL: https://github.com/apache/arrow-datafusion/issues/6543#issuecomment-2039347382

   Another case is selecting the same literal value twice:
   
   ```sql
   DataFusion CLI v37.0.0
   ❯ select 1, 1;
   Error during planning: Projections require unique expression names but the expression "Int64(1)" at position 0 and "Int64(1)" at position 1 have the same name. Consider aliasing ("AS") one of them.
   ❯
   ```

