Posted to github@arrow.apache.org by "osawyerr (via GitHub)" <gi...@apache.org> on 2023/05/31 21:29:21 UTC

[GitHub] [arrow-datafusion] osawyerr opened a new issue, #6508: octet_length(char) doesn't behave as expected.

osawyerr opened a new issue, #6508:
URL: https://github.com/apache/arrow-datafusion/issues/6508

   ### Describe the bug
   
   Hi there,
   
   ``select octet_length(char)`` doesn't behave as expected. 
   
   In the example below, Postgres (``text`` column) returns 25, whereas DataFusion (``StringArray`` column) just returns the original length of each value in ``n_name``.
   
   ```sql
   select octet_length(n_name::char(25)) from nation;
   ```
   
   I'm not sure whether this is because the cast to ``char(25)`` is not behaving as expected.
   
   ### To Reproduce
   
   Create a ``StringArray`` column in DataFusion, populate it with some values, and run the following SQL on it.
   
   ```sql
   select octet_length(n_name::char(25)) from nation;
   ```
   
   ### Expected behavior
   
   The query should return 25 for every row, but it returns the length of each value in the column.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #6508: octet_length(char) doesn't behave as expected.

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6508:
URL: https://github.com/apache/arrow-datafusion/issues/6508#issuecomment-1572597041

   Thanks for the report @osawyerr 
   
   TL;DR: DataFusion treats `VARCHAR` and `CHAR` the same, because both use the same underlying Arrow type.
   
   🤔 I am not quite sure what to do here. Postgres has the notion of a maximum-width character column (`CHAR`), but Arrow does not. DataFusion maps the SQL type `CHAR` to the Arrow type `Utf8`.
   
   Thus I am not sure we will be able to replicate the behavior of Postgres in this instance.
   
   For anyone following along, here is what Postgres does:
   
   ```sql
   postgres=# create table example(name varchar(20));
   CREATE TABLE
   postgres=# insert into example values ('foo'), ('barrr');
   INSERT 0 2
   postgres=# select octet_length(name) from example;
    octet_length
   --------------
               3
               5
   (2 rows)
   postgres=# select octet_length(name::char(25)) from example;
    octet_length
   --------------
              25
              25
   (2 rows)
   postgres=# select octet_length(name::varchar(25)) from example;
    octet_length
   --------------
               3
               5
   (2 rows)
   
   postgres=#
   ```
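
The difference shown in the session above can be emulated in a few lines of Python. This is only a sketch of the semantics, not DataFusion or Postgres code: `cast_to_char`, `cast_to_utf8`, and `octet_length` are hypothetical helpers, and it assumes that an explicit Postgres cast to `char(n)` truncates to `n` characters and then blank-pads, while the `Utf8` mapping leaves the value unchanged.

```python
def cast_to_char(value: str, n: int) -> str:
    """Hypothetical model of an explicit cast to char(n):
    truncate to n characters, then pad with spaces up to n."""
    return value[:n].ljust(n)


def cast_to_utf8(value: str, n: int) -> str:
    """Hypothetical model of the behavior described in this thread:
    CHAR maps to Utf8, so the declared width n is ignored (assumption)."""
    return value


def octet_length(value: str) -> int:
    """Number of bytes in the UTF-8 encoding of the string."""
    return len(value.encode("utf-8"))


names = ["foo", "barrr"]

# Padded values always occupy 25 bytes for ASCII input.
postgres_like = [octet_length(cast_to_char(v, 25)) for v in names]

# Unpadded values keep their original byte length.
datafusion_like = [octet_length(cast_to_utf8(v, 25)) for v in names]

print(postgres_like)    # [25, 25]
print(datafusion_like)  # [3, 5]
```

Note that the padding is counted in characters, so in this model a value containing multi-byte characters would report more than 25 octets.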




[GitHub] [arrow-datafusion] osawyerr commented on issue #6508: octet_length(char) doesn't behave as expected.

Posted by "osawyerr (via GitHub)" <gi...@apache.org>.
osawyerr commented on issue #6508:
URL: https://github.com/apache/arrow-datafusion/issues/6508#issuecomment-1572835492

   @alamb There is also a related point about how Postgres handles ``char`` versus ``varchar`` in joins and equality comparisons. In Postgres, if I recall correctly, trailing spaces are ignored when comparing ``char`` values but are taken into account when comparing ``varchar`` values.
   
   From your explanation, DataFusion will always take trailing spaces into account when a "``char``" is used in a join or equality comparison, i.e. it will behave like a Postgres ``varchar``.
   
   For example:
   ```sql
   -- note the trailing spaces below
   select * from example where name::char(25) = 'foo    '::char(25) 
   ```
   
   This returns no results in DataFusion, but in Postgres the matching row is returned.
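
The two comparison rules described in this comment can be sketched in Python as follows. This is an illustration of the semantics only, under the assumption stated above that `char` comparisons ignore trailing spaces and `varchar` comparisons do not; `char_equal` and `varchar_equal` are hypothetical names, not functions from either system.

```python
def char_equal(left: str, right: str) -> bool:
    """char(n)-style equality: trailing spaces are not significant."""
    return left.rstrip(" ") == right.rstrip(" ")


def varchar_equal(left: str, right: str) -> bool:
    """varchar-style equality: every character counts, including trailing spaces."""
    return left == right


rows = ["foo", "barrr"]
literal = "foo    "  # note the trailing spaces

# Postgres-like result for char(25) operands: the 'foo' row matches.
char_matches = [r for r in rows if char_equal(r, literal)]

# Result when char is treated like varchar: nothing matches.
varchar_matches = [r for r in rows if varchar_equal(r, literal)]

print(char_matches)     # ['foo']
print(varchar_matches)  # []
```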

