
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/27 20:23:06 UTC

[GitHub] [arrow-datafusion] comphead opened a new issue, #3990: [DISCUSSION] Naming convention for non-aliased columns

comphead opened a new issue, #3990:
URL: https://github.com/apache/arrow-datafusion/issues/3990

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   Related to #3882 and #3722.
   Currently, non-aliased columns get a name generated from the function, its arguments, and other information, which produces user-unfriendly names like `btrim(test.a,Utf8(\"ab\"))`.
   
   
   **Describe the solution you'd like**
   Discuss the naming convention for non-aliased columns.
   As examples, we can consider:
   - Postgres based
   ```
   select trim('123')
   trim |
   -----+
   123  |
   ```
   
   - Trino based
   ```
   select t.*, 'col_again' from ( select 'col', 2 a, count(1) ) t
   
   _col0|a|_col2|_col3    |
   -----+-+-----+---------+
   col  |2|    1|col_again|
   ```
   - Or any other solution that contributors consider better
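
The Trino-style output above can be read as a positional rule: an unaliased expression takes the name `_col<i>`, where `i` is its zero-based position in the select list. The sketch below is only an illustration of that reading, not Trino's or DataFusion's implementation. The function name `positional_names` and the use of `None` for "no alias" are assumptions made for the example.

```python
from typing import List, Optional


def positional_names(aliases: List[Optional[str]]) -> List[str]:
    """Name each select-list item by its alias, or `_col<i>` if it has none.

    `aliases` holds one entry per output column: the explicit alias (or the
    name inherited from a subquery), or None for an unaliased expression.
    """
    return [
        alias if alias is not None else f"_col{i}"
        for i, alias in enumerate(aliases)
    ]


# Inner query: select 'col', 2 a, count(1)
inner = positional_names([None, "a", None])

# Outer query: select t.*, 'col_again' -- t.* keeps the inner names,
# and the new literal is unaliased at position 3.
outer = positional_names(inner + [None])
print(outer)
```

One property of this rule is that generated names are unique within a select list by construction, since each one embeds the column position.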
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] comphead commented on issue #3990: [DISCUSSION] Naming convention for non-aliased columns

Posted by GitBox <gi...@apache.org>.

comphead commented on issue #3990:
URL: https://github.com/apache/arrow-datafusion/issues/3990#issuecomment-1369341752

   Spark has another naming convention
   ```
   scala> spark.sql("select 1, 2, 1, 2, 1 + 1, (select 1), (select 2), sum(1), sum(1) over (), current_date(), current_date()").show(false)
   
   +---+---+---+---+-------+----------------+----------------+------+----------------------------------------------------------------------+--------------+--------------+
   |1  |2  |1  |2  |(1 + 1)|scalarsubquery()|scalarsubquery()|sum(1)|sum(1) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)|current_date()|current_date()|
   +---+---+---+---+-------+----------------+----------------+------+----------------------------------------------------------------------+--------------+--------------+
   |1  |2  |1  |2  |2      |1               |2               |1     |1                                                                     |2023-01-02    |2023-01-02    |
   +---+---+---+---+-------+----------------+----------------+------+----------------------------------------------------------------------+--------------+--------------+
   ```
   
   It uses the scalar values or the function names from the plan.
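
A rough way to picture the Spark-style names shown above is a pretty-printer over the expression tree. The sketch below reproduces only the simple cases from that output (literals, binary operators, function calls, scalar subqueries) and skips window expressions. The tuple encoding of expressions and the name `spark_style_name` are assumptions for illustration, not Spark's API.

```python
def spark_style_name(expr):
    """Derive a column name by printing the expression itself.

    Expressions are plain tuples (a hypothetical encoding):
      ("lit", value)
      ("binop", op, left, right)
      ("call", name, [args])
      ("subquery",)
    """
    kind = expr[0]
    if kind == "lit":
        return str(expr[1])
    if kind == "binop":
        _, op, left, right = expr
        return f"({spark_style_name(left)} {op} {spark_style_name(right)})"
    if kind == "call":
        _, name, args = expr
        rendered = ", ".join(spark_style_name(a) for a in args)
        return f"{name}({rendered})"
    if kind == "subquery":
        # The output above shows a fixed label, regardless of the subquery body.
        return "scalarsubquery()"
    raise ValueError(f"unsupported expression kind: {kind!r}")


one = ("lit", 1)
print(spark_style_name(("binop", "+", one, one)))
print(spark_style_name(("call", "sum", [one])))
```

As the output above shows, this style does not guarantee unique names either: `1` and `current_date()` each appear twice in the header.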



[GitHub] [arrow-datafusion] comphead commented on issue #3990: [DISCUSSION] Naming convention for non-aliased columns

Posted by GitBox <gi...@apache.org>.

comphead commented on issue #3990:
URL: https://github.com/apache/arrow-datafusion/issues/3990#issuecomment-1370146262

   > I tried making datafusion consistent with postgres in this regard a while back, and I ran into a problem. In the PG example above, the result of the query has two columns named "sum". DataFusion does not allow a result set to have two columns with the same name. Perhaps we need to allow this and only throw an error if attempting to reference a column that is ambiguous.
   
   Yes, DF checks column uniqueness in the logical plan builder, in `validate_unique_names`. I agree we can try to move the check to a later phase, when an outer query references an ambiguous column.
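
To make the "check later" idea concrete, here is a minimal sketch of a deferred check: duplicate output names are allowed, and an error is raised only when a reference by name matches more than one column. This is not DataFusion code. `resolve_column` and `AmbiguousColumnError` are hypothetical names used for illustration.

```python
from typing import List


class AmbiguousColumnError(Exception):
    """Raised when a name matches more than one output column."""


def resolve_column(output_names: List[str], wanted: str) -> int:
    """Return the index of `wanted`, failing only if the reference is ambiguous.

    Duplicates in `output_names` are tolerated as long as nobody refers to them.
    """
    matches = [i for i, name in enumerate(output_names) if name == wanted]
    if not matches:
        raise KeyError(f"no column named {wanted!r}")
    if len(matches) > 1:
        raise AmbiguousColumnError(
            f"column reference {wanted!r} is ambiguous "
            f"(matches positions {matches})"
        )
    return matches[0]


# A result set with two columns named "sum" is accepted as-is.
names = ["sum", "sum", "now"]
print(resolve_column(names, "now"))  # unique name, resolves fine
# resolve_column(names, "sum") would raise AmbiguousColumnError
```

With this shape, a top-level query returning duplicate names succeeds, while an outer query that selects `sum` from it fails at reference time.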



[GitHub] [arrow-datafusion] alamb commented on issue #3990: [DISCUSSION] Naming convention for non-aliased columns

Posted by GitBox <gi...@apache.org>.

alamb commented on issue #3990:
URL: https://github.com/apache/arrow-datafusion/issues/3990#issuecomment-1294993844

   Thanks @comphead  -- I don't have any strong opinions on this matter.



[GitHub] [arrow-datafusion] comphead commented on issue #3990: [DISCUSSION] Naming convention for non-aliased columns

Posted by GitBox <gi...@apache.org>.

comphead commented on issue #3990:
URL: https://github.com/apache/arrow-datafusion/issues/3990#issuecomment-1369336262

   **PG query**
   
       select 1, 
              2,
              1 + 1,
              sum(1), 
              sum(2), 
              cast(1 as numeric), 
              cast(1 as varchar), 
              now(), 
              current_time, 
              sum(3) over (),
              (select now()),
              (select 1+1);
   
   | ?column? | ?column? | ?column? | sum | sum | numeric | varchar | now                      | current_time       | sum | now                      | ?column? |
   | -------- | -------- | -------- | --- | --- | ------- | ------- | ------------------------ | ------------------ | --- | ------------------------ | -------- |
   | 1        | 2        | 2        | 1   | 2   | 1       | 1       | 2023-01-03T02:28:35.325Z | 02:28:35.325773+00 | 3   | 2023-01-03T02:28:35.325Z | 2        |
   
   ---
   
   @alamb here is an investigation of how PG names unaliased columns. We can follow this approach.
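
The table above suggests a small set of rules: a function call (including a window function) is named after the function, a cast is named after the target type, a scalar subquery takes the name of its single select item, and everything else falls back to `?column?`. The sketch below encodes only what that table shows. It is an assumption-laden illustration, not Postgres's actual logic, and the class and function names are made up for the example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Literal:
    value: object


@dataclass
class BinaryOp:
    op: str
    left: "Expr"
    right: "Expr"


@dataclass
class FuncCall:
    name: str            # also covers window calls such as sum(3) over ()
    args: tuple = ()


@dataclass
class Cast:
    expr: "Expr"
    type_name: str


@dataclass
class ScalarSubquery:
    select_item: "Expr"  # the single item in the subquery's select list


Expr = Union[Literal, BinaryOp, FuncCall, Cast, ScalarSubquery]


def pg_style_name(expr: Expr) -> str:
    """Name an unaliased select item following the pattern in the table above."""
    if isinstance(expr, FuncCall):
        return expr.name
    if isinstance(expr, Cast):
        return expr.type_name
    if isinstance(expr, ScalarSubquery):
        return pg_style_name(expr.select_item)
    return "?column?"  # literals, operators, anything without an obvious name


one = Literal(1)
query = [
    one, Literal(2), BinaryOp("+", one, one),
    FuncCall("sum", (one,)), FuncCall("sum", (Literal(2),)),
    Cast(one, "numeric"), Cast(one, "varchar"),
    FuncCall("now"), FuncCall("current_time"), FuncCall("sum", (Literal(3),)),
    ScalarSubquery(FuncCall("now")), ScalarSubquery(BinaryOp("+", one, one)),
]
print([pg_style_name(e) for e in query])
```

These rules produce duplicate names (`sum` three times, `?column?` four times), so adopting them depends on how duplicate output names are handled.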



[GitHub] [arrow-datafusion] andygrove commented on issue #3990: [DISCUSSION] Naming convention for non-aliased columns

Posted by GitBox <gi...@apache.org>.

andygrove commented on issue #3990:
URL: https://github.com/apache/arrow-datafusion/issues/3990#issuecomment-1370083470

   I am +1 for having simpler names.



[GitHub] [arrow-datafusion] comphead commented on issue #3990: [DISCUSSION] Naming convention for non-aliased columns

Posted by GitBox <gi...@apache.org>.

comphead commented on issue #3990:
URL: https://github.com/apache/arrow-datafusion/issues/3990#issuecomment-1294024492

   @andygrove @alamb @Dandandan I created a separate ticket for the naming convention discussion, as you are already familiar with this from #3882.



[GitHub] [arrow-datafusion] andygrove commented on issue #3990: [DISCUSSION] Naming convention for non-aliased columns

Posted by GitBox <gi...@apache.org>.

andygrove commented on issue #3990:
URL: https://github.com/apache/arrow-datafusion/issues/3990#issuecomment-1370083263

   I tried making DataFusion consistent with Postgres in this regard a while back, and I ran into a problem. In the PG example above, the result of the query has two columns named "sum". DataFusion does not allow a result set to have two columns with the same name. Perhaps we need to allow this and only throw an error when a query attempts to reference an ambiguous column.



[GitHub] [arrow-datafusion] comphead commented on issue #3990: [DISCUSSION] Naming convention for non-aliased columns

Posted by GitBox <gi...@apache.org>.

comphead commented on issue #3990:
URL: https://github.com/apache/arrow-datafusion/issues/3990#issuecomment-1295166118

   Thanks @alamb, so you are happy with either. Let's wait for @andygrove or @Dandandan to share their thoughts. This is a breaking change, so it would be good to decide once.

