Posted to issues-all@impala.apache.org by "Csaba Ringhofer (Jira)" <ji...@apache.org> on 2020/03/30 20:52:00 UTC

[jira] [Updated] (IMPALA-9575) Add basic BINARY support

     [ https://issues.apache.org/jira/browse/IMPALA-9575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Csaba Ringhofer updated IMPALA-9575:
------------------------------------
    Description: 
An initial testable implementation of BINARY would contain the following:
- DDL support for BINARY, e.g. create table
- read support from text file (stored with base64 encoding)
- basic client support (hs2, beeswax)
- cast from/to STRING
- basic operators (=,<,>), all should work the same way as for STRING
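
To illustrate the read path and the comparison semantics listed above, here is a minimal sketch in Python (not Impala code). It assumes standard padded base64 and a comma as the field delimiter; the function name parse_text_row and the sample rows are hypothetical and used only for illustration.

```python
import base64


def parse_text_row(line, binary_columns, delimiter=","):
    """Split one delimited text row and base64-decode the BINARY fields.

    binary_columns is the set of 0-based column indexes declared as BINARY.
    The comma delimiter is an assumption made for this sketch only.
    """
    fields = line.rstrip("\n").split(delimiter)
    return [
        # validate=True rejects characters outside the base64 alphabet
        base64.b64decode(field, validate=True) if i in binary_columns else field
        for i, field in enumerate(fields)
    ]


# Two hypothetical rows: an id column and a base64-encoded BINARY column.
row_a = parse_text_row("1,AAEC/w==", binary_columns={1})
row_b = parse_text_row("2,AAED", binary_columns={1})

# Decoded values are raw bytes and need not be valid UTF-8.
print(row_a[1])  # b'\x00\x01\x02\xff'
print(row_b[1])  # b'\x00\x01\x03'

# =, < and > compare byte by byte, as they would for STRING values.
print(row_a[1] < row_b[1])  # True
```

The standard base64 alphabet does not contain a comma, which is why the encoded field can be split on the delimiter before decoding in this sketch.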

Optional in the first step:
- write support for text file
- joins on BINARY columns
- aggregates on BINARY columns

Hive also allows BINARY columns for partitioning, but this seems buggy (HIVE-12680), so I would prefer to avoid it in Impala.

The last time a new type (DATE) was added to Impala, it was a massive change:
https://gerrit.cloudera.org/#/c/12481/

I hope that BINARY will be much simpler, as:
- It should be handled by the backend exactly the same way as STRING, so the backend work may be minimal (only the file readers/writers have to differentiate between them). This is different in Hive, where STRING is treated as UTF-8 and BINARY is not.
- The frontend should also treat it similarly to STRING, just with far fewer capabilities, e.g. no casts to types other than STRING, and it shouldn't be accepted by UDFs that expect STRING.
- As BINARY supports very few features, tests also need to cover far fewer cases.
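
The frontend restrictions described above can be sketched as a toy model in Python. This is not Impala's analyzer: the ColType enum and the functions can_cast and accepts_argument are hypothetical names, and casts between non-BINARY types are simply allowed because they are outside the scope of this issue.

```python
from enum import Enum


class ColType(Enum):
    STRING = "STRING"
    BINARY = "BINARY"
    INT = "INT"


def can_cast(src, dst):
    """Return True if an explicit CAST(src AS dst) is allowed in this toy model."""
    if src == dst:
        return True
    if ColType.BINARY in (src, dst):
        # BINARY may only be cast to or from STRING.
        return {src, dst} == {ColType.BINARY, ColType.STRING}
    # Casts that do not involve BINARY are not modelled here; allow them.
    return True


def accepts_argument(param_type, arg_type):
    """Return True if a function parameter of param_type accepts arg_type.

    No implicit conversion is modelled, so a UDF that expects STRING
    rejects a BINARY argument unless the caller casts it explicitly.
    """
    return param_type == arg_type


print(can_cast(ColType.BINARY, ColType.STRING))          # True
print(can_cast(ColType.BINARY, ColType.INT))             # False
print(accepts_argument(ColType.STRING, ColType.BINARY))  # False
```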

  was:
An initial testable implementation of BINARY would contain the following:
- DDL support for BINARY, e.g. create table
- read support from text file (stored with base64 encoding)
- basic client support (hs2, beeswax)
- cast from/to STRING
- basic operators (=,<,>), all should work the same way as for STRING

Optional in the first step:
- write support for text file
- joins on BINARY columns
- aggregates on BINARY columns

Hive also allows BINARY columns for partitioning, but this seems buggy, so I would prefer to avoid it in Impala.

The last time a new type (DATE) was added to Impala, it was a massive change:
https://gerrit.cloudera.org/#/c/12481/

I hope that BINARY will be much simpler, as:
- It should be handled by the backend exactly the same way as STRING, so the backend work may be minimal (only the file readers/writers have to differentiate between them). This is different in Hive, where STRING is treated as UTF-8 and BINARY is not.
- The frontend should also treat it similarly to STRING, just with far fewer capabilities, e.g. no casts to types other than STRING, and it shouldn't be accepted by UDFs that expect STRING.
- As BINARY supports very few features, tests also need to cover far fewer cases.


> Add basic BINARY support
> ------------------------
>
>                 Key: IMPALA-9575
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9575
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend, Frontend
>            Reporter: Csaba Ringhofer
>            Priority: Major


