Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/03/08 20:08:29 UTC

[GitHub] [arrow-rs] alamb opened a new issue, #3821: Implement `FromStr` for DataType / Parse DataType description

alamb opened a new issue, #3821:
URL: https://github.com/apache/arrow-rs/issues/3821

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   As part of implementing https://github.com/apache/arrow-datafusion/issues/5016 in DataFusion I needed some way to convert from a string passed in by the user to a `DataType`. 
   
   Since we already had a function `arrow_typeof` that provides a useful human-readable type name by calling `DataType::to_string()`, I wanted the opposite: a way to take the output of `DataType::to_string()` and turn it back into a `DataType`.
   
   
   **Describe the solution you'd like**
   
   I think having a `FromStr` implementation that matches the https://docs.rs/arrow/34.0.0/arrow/datatypes/enum.DataType.html#impl-Display-for-DataType implementation for `DataType` would be very nice.
   
   Example usage:
   
   ```rust
   use arrow::datatypes::DataType;

   let data_type = DataType::Int32;
   
   // use the (existing) Display impl to get a string representation
   let data_type_string = data_type.to_string();
   assert_eq!(data_type_string, "Int32");
   
   // use the proposed FromStr impl to get the DataType back
   let parsed_datatype: DataType = data_type_string.parse()?;
   assert_eq!(parsed_datatype, DataType::Int32);
   ```
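   
   To make the round trip concrete, here is a minimal, self-contained sketch of the idea. `MiniType` is a hypothetical stand-in for `DataType` with only a few variants; it is not the arrow-rs or DataFusion implementation, and it assumes the `Display` output of a dictionary type looks like `Dictionary(Int32, Utf8)`.
   
   ```rust
   use std::fmt;
   use std::str::FromStr;
   
   /// Hypothetical stand-in for arrow's `DataType`; only a few variants are modeled.
   #[derive(Debug, Clone, PartialEq)]
   enum MiniType {
       Int32,
       Int64,
       Utf8,
       Dictionary(Box<MiniType>, Box<MiniType>),
   }
   
   impl fmt::Display for MiniType {
       fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
           match self {
               MiniType::Int32 => write!(f, "Int32"),
               MiniType::Int64 => write!(f, "Int64"),
               MiniType::Utf8 => write!(f, "Utf8"),
               MiniType::Dictionary(k, v) => write!(f, "Dictionary({k}, {v})"),
           }
       }
   }
   
   impl FromStr for MiniType {
       type Err = String;
   
       fn from_str(s: &str) -> Result<Self, Self::Err> {
           let s = s.trim();
           // Simple (non-nested) types are matched by name.
           match s {
               "Int32" => return Ok(MiniType::Int32),
               "Int64" => return Ok(MiniType::Int64),
               "Utf8" => return Ok(MiniType::Utf8),
               _ => {}
           }
           // Dictionary(<key>, <value>): strip the wrapper, then split at the
           // first comma that is not inside nested parentheses.
           if let Some(inner) = s
               .strip_prefix("Dictionary(")
               .and_then(|rest| rest.strip_suffix(')'))
           {
               let mut depth = 0usize;
               for (i, c) in inner.char_indices() {
                   match c {
                       '(' => depth += 1,
                       ')' => {
                           depth = depth
                               .checked_sub(1)
                               .ok_or_else(|| format!("unbalanced parentheses in {s:?}"))?
                       }
                       ',' if depth == 0 => {
                           let key: MiniType = inner[..i].parse()?;
                           let value: MiniType = inner[i + 1..].parse()?;
                           return Ok(MiniType::Dictionary(Box::new(key), Box::new(value)));
                       }
                       _ => {}
                   }
               }
           }
           Err(format!("unsupported data type: {s:?}"))
       }
   }
   
   /// Returns true if parsing the `Display` output yields the original value.
   fn round_trips(t: &MiniType) -> bool {
       t.to_string().parse::<MiniType>().as_ref() == Ok(t)
   }
   ```
   
   With this sketch, `"Dictionary(Int32, Utf8)".parse::<MiniType>()` yields the nested value, and unknown names produce an `Err` rather than a panic. A real implementation would need to cover every `DataType` variant and would likely return arrow's own error type.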
   
   **Describe alternatives you've considered**
   @tustvold pointed out that there is already a way to encode data types as strings, used by the C Data Interface: https://arrow.apache.org/docs/format/CDataInterface.html#data-type-description-format-strings
   
   While this format is (designed to be) easy for computers to parse, I don't think it is easy for humans to read (quick quiz: what type does `tdD` represent?)
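   
   For comparison, a few of those format strings are decoded in the sketch below. `describe_format` is a hypothetical helper written for illustration; it covers only a small subset of the format and is not part of any Arrow crate.
   
   ```rust
   /// Hypothetical helper: maps a few C Data Interface format strings
   /// to readable type names. Returns `None` for anything not listed.
   fn describe_format(format: &str) -> Option<&'static str> {
       match format {
           "b" => Some("boolean"),
           "i" => Some("int32"),
           "l" => Some("int64"),
           "f" => Some("float32"),
           "g" => Some("float64"),
           "u" => Some("utf-8 string"),
           "tdD" => Some("date32 [days]"),
           "tdm" => Some("date64 [milliseconds]"),
           _ => None,
       }
   }
   ```
   
   The strings are compact and unambiguous for a parser, but a user typing a type name by hand would have to memorize them, which is the motivation for parsing the `Display` output instead.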
   
   **Additional context**
   See https://github.com/apache/arrow-datafusion/pull/5166#discussion_r1129692579
   
   
   We can probably lift the implementation from https://github.com/apache/arrow-datafusion/pull/5166 into arrow-rs.
   
   The implementation in  https://github.com/apache/arrow-datafusion/pull/5166 currently does not cover:
   - [ ] Parsing Timezones in Timestamp types (@waitingkuo  expressed interest in this)
   - [ ] Parsing Struct, Union, Map or List types (though it does handle `Dictionary`)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on issue #3821: Implement `FromStr` for DataType / Parse DataType description

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #3821:
URL: https://github.com/apache/arrow-rs/issues/3821#issuecomment-1460801134

   I think this is a good first issue because the code and tests already exist, so this ticket would be a matter of porting the code and adjusting the interface.




[GitHub] [arrow-rs] Weijun-H commented on issue #3821: Implement `FromStr` for DataType / Parse DataType description

Posted by "Weijun-H (via GitHub)" <gi...@apache.org>.
Weijun-H commented on issue #3821:
URL: https://github.com/apache/arrow-rs/issues/3821#issuecomment-1463737924

   > > Should we also keep the part like Int32, Int64 in arrow-rs?
   > 
   > I'm not sure I follow what you mean, but it should be able to parse the output of `DataType: Display`
   
   Sorry for the ambiguity. I mean, should we migrate parts that have been completed in `arrow-datafusion` to `arrow-rs`?




[GitHub] [arrow-rs] alamb commented on issue #3821: Implement `FromStr` for DataType / Parse DataType description

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #3821:
URL: https://github.com/apache/arrow-rs/issues/3821#issuecomment-1463751617

   > Sorry for the ambiguity. I mean, should we migrate parts that have been completed in arrow-datafusion to arrow-rs?
   
   That would be my suggestion.




[GitHub] [arrow-rs] Weijun-H commented on issue #3821: Implement `FromStr` for DataType / Parse DataType description

Posted by "Weijun-H (via GitHub)" <gi...@apache.org>.
Weijun-H commented on issue #3821:
URL: https://github.com/apache/arrow-rs/issues/3821#issuecomment-1462871785

   This is a nice extension. I would like to take this ticket.




[GitHub] [arrow-rs] Weijun-H commented on issue #3821: Implement `FromStr` for DataType / Parse DataType description

Posted by "Weijun-H (via GitHub)" <gi...@apache.org>.
Weijun-H commented on issue #3821:
URL: https://github.com/apache/arrow-rs/issues/3821#issuecomment-1462877642

   Should we also keep the part like `Int32`, `Int64` in arrow-rs?




[GitHub] [arrow-rs] tustvold commented on issue #3821: Implement `FromStr` for DataType / Parse DataType description

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #3821:
URL: https://github.com/apache/arrow-rs/issues/3821#issuecomment-1463617167

   > Should we also keep the part like Int32, Int64 in arrow-rs?
   
   I'm not sure I follow what you mean, but it should be able to parse the output of `DataType: Display`




[GitHub] [arrow-rs] alamb commented on issue #3821: Implement `FromStr` for DataType / Parse DataType description

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #3821:
URL: https://github.com/apache/arrow-rs/issues/3821#issuecomment-1463752289

   Maybe we can port the implementation in arrow-datafusion first and then add support for `Timezone`s and struct types as follow-on PRs.

