You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "waynexia (via GitHub)" <gi...@apache.org> on 2023/02/03 07:22:11 UTC

[GitHub] [arrow-datafusion] waynexia opened a new issue, #5169: Substrait: Add support for unsigned primitive and literal

waynexia opened a new issue, #5169:
URL: https://github.com/apache/arrow-datafusion/issues/5169

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   Extend substrait's literal type to unsigned variants
   
   **Describe the solution you'd like**
   
   The literal types in the following two systems are not aligned:
   
    -  Substrait: https://docs.rs/substrait/latest/substrait/protobuf/expression/literal/enum.LiteralType.html
    -  Arrow: https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html
   
   One gap is unsigned primitives are not in Substrait's definition. In https://github.com/substrait-io/substrait/discussions/2#discussioncomment-1297609 they were cataloged as "third party extension defined types". But it's widely used in Arrow and DataFusion. I'd like to discuss how they would be integrated here.
   
   For types we can use type extension, as [I did](https://github.com/GreptimeTeam/greptimedb/blob/8959dbcef83507ccd76aaaffd2b44cab6426e68f/src/common/substrait/src/types.rs) in greptimedb , I occupy the "1" variations for those types (I8, I16, I32, I64) and translate them to the unsigned version. I think we can do it this way here.
   
           Kind::I8(desc) => substrait_kind!(desc, int8_datatype, uint8_datatype),
           Kind::I16(desc) => substrait_kind!(desc, int16_datatype, uint16_datatype),
           Kind::I32(desc) => substrait_kind!(desc, int32_datatype, uint32_datatype),
           Kind::I64(desc) => substrait_kind!(desc, int64_datatype, uint64_datatype),
   
   For literals we can do it similarly, but one difference is they also ship the data values. I prepare to [transmute](https://doc.rust-lang.org/std/mem/fn.transmute.html) between the signed and unsigned values. But not sure if this is the best way we can achieve it. As I investigated, duckdb also doesn't cover [this part](https://github.com/duckdblabs/substrait/blob/a08fe49de78926a3112daddcb64d45a12e43950a/substrait/from_substrait.cpp#L94-L102)
   
   ```
     case substrait::Expression_Literal::LiteralTypeCase::kI8:
       dval = Value::TINYINT(slit.i8());
       break;
     case substrait::Expression_Literal::LiteralTypeCase::kI32:
       dval = Value::INTEGER(slit.i32());
       break;
     case substrait::Expression_Literal::LiteralTypeCase::kI64:
       dval = Value::BIGINT(slit.i64());
       break;
   ```
   **Describe alternatives you've considered**
   N/A
   
   **Additional context**
   This issue is originally posted in https://github.com/datafusion-contrib/datafusion-substrait/issues/41
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb closed issue #5169: Substrait: Add support for unsigned primitive and literal

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed issue #5169: Substrait: Add support for unsigned primitive and literal
URL: https://github.com/apache/arrow-datafusion/issues/5169


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org