You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/08 04:02:25 UTC

[GitHub] [arrow-rs] sunchao opened a new pull request #1407: Add dictionary support for C data interface

sunchao opened a new pull request #1407:
URL: https://github.com/apache/arrow-rs/pull/1407


   # Which issue does this PR close?
   
   <!---
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #1397.
   
   # Rationale for this change
    
    <!---
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.
   -->
   
   Currently the Rust implementation of [C data interface](https://arrow.apache.org/docs/format/CDataInterface.html) doesn't support dictionary type yet, which is necessary if we want to pass data between Rust and other languages such as Java/C++.
   
   # What changes are included in this PR?
   
   <!---
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   - Added dictionary support in Rust FFI.
   - Added roundtrip tests within Rust (i.e., Rust -> Rust -> Rust)
   
   I kept the `DictionaryArray` untouched so the dictionary (i.e., values) is still stored as the only child of the `ArrayData` it is associated to.
   
   # Are there any user-facing changes?
   
   Yes, now we can use C data interface with dictionary type. No API change.
   
   <!---
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!---
   If there are any breaking changes to public APIs, please add the `breaking change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] sunchao commented on pull request #1407: Add dictionary support for C data interface

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #1407:
URL: https://github.com/apache/arrow-rs/pull/1407#issuecomment-1062149939


   We should also be able to pass `is_ordered` flag via the C data interface, but it appears that property is not supported in the Rust implementation yet (perhaps it should be part of `DataType::Dictionary`).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on a change in pull request #1407: Add dictionary support for C data interface

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1407:
URL: https://github.com/apache/arrow-rs/pull/1407#discussion_r822025954



##########
File path: arrow/src/ffi.rs
##########
@@ -1075,4 +1133,33 @@ mod tests {
         // (drop/release)
         Ok(())
     }
+
+    #[test]
+    fn test_dictionary() -> Result<()> {
+        // create an array natively
+        let values = vec!["a", "aaa", "aaa"];
+        let dict_array: DictionaryArray<Int8Type> = values.into_iter().collect();
+
+        // export it
+        let array = ArrowArray::try_from(dict_array.data().clone())?;
+
+        // (simulate consumer) import it
+        let data = ArrayData::try_from(array)?;
+        let array = make_array(data);
+
+        // perform some operation
+        let array = kernels::concat::concat(&[array.as_ref(), array.as_ref()]).unwrap();
+        let actual = array
+            .as_any()
+            .downcast_ref::<DictionaryArray<Int8Type>>()
+            .unwrap();
+
+        // verify
+        let new_values = vec!["a", "aaa", "aaa", "a", "aaa", "aaa"];
+        let expected: DictionaryArray<Int8Type> = new_values.into_iter().collect();
+        assert_eq!(actual, &expected);

Review comment:
       🆗 

##########
File path: arrow-pyarrow-integration-testing/tests/test_sql.py
##########
@@ -122,14 +123,6 @@ def test_type_roundtrip_raises(pyarrow_type):
     with pytest.raises(pa.ArrowException):
         rust.round_trip_type(pyarrow_type)
 
-
-def test_dictionary_type_roundtrip():

Review comment:
       🎉 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] viirya commented on pull request #1407: Add dictionary support for C data interface

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #1407:
URL: https://github.com/apache/arrow-rs/pull/1407#issuecomment-1062223896


   Yea, I will review this PR. Thanks @alamb @sunchao 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] sunchao commented on a change in pull request #1407: Add dictionary support for C data interface

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #1407:
URL: https://github.com/apache/arrow-rs/pull/1407#discussion_r822123240



##########
File path: arrow/src/array/ffi.rs
##########
@@ -127,4 +128,14 @@ mod tests {
         let data = array.data();
         test_round_trip(data)
     }
+
+    #[test]
+    fn test_dictionary() -> Result<()> {

Review comment:
       Good idea. Will add.

##########
File path: arrow/src/ffi.rs
##########
@@ -508,14 +541,20 @@ pub trait ArrowArrayRef {
         let buffers = self.buffers()?;
         let null_bit_buffer = self.null_bit_buffer();
 
-        let child_data = (0..self.array().n_children as usize)
+        let mut child_data: Vec<ArrayData> = (0..self.array().n_children as usize)
             .map(|i| {
                 let child = self.child(i);
                 child.to_data()
             })
             .map(|d| d.unwrap())
             .collect();
 
+        if let Some(d) = self.dictionary() {
+            // For dictionary type there should only be a single child, so we don't need to worry if

Review comment:
       Sure

##########
File path: arrow/src/ffi.rs
##########
@@ -555,18 +594,22 @@ pub trait ArrowArrayRef {
     // for variable-sized buffers, such as the second buffer of a stringArray, we need
     // to fetch offset buffer's len to build the second buffer.
     fn buffer_len(&self, i: usize) -> Result<usize> {
-        // Inner type is not important for buffer length.
-        let data_type = &self.data_type()?;
+        // Special handling for dictionary type as we only care about the key type in the case.
+        let data_type = match &self.data_type()? {
+            DataType::Dictionary(key_data_type, _) => key_data_type.as_ref().clone(),

Review comment:
       Yea we can do this. I need to move `&self.data_type()?` into a separate assignment though since otherwise I'm getting "temporary value dropped while borrowed" error.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] viirya commented on a change in pull request #1407: Add dictionary support for C data interface

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #1407:
URL: https://github.com/apache/arrow-rs/pull/1407#discussion_r822095510



##########
File path: arrow/src/ffi.rs
##########
@@ -508,14 +541,20 @@ pub trait ArrowArrayRef {
         let buffers = self.buffers()?;
         let null_bit_buffer = self.null_bit_buffer();
 
-        let child_data = (0..self.array().n_children as usize)
+        let mut child_data: Vec<ArrayData> = (0..self.array().n_children as usize)
             .map(|i| {
                 let child = self.child(i);
                 child.to_data()
             })
             .map(|d| d.unwrap())
             .collect();
 
+        if let Some(d) = self.dictionary() {
+            // For dictionary type there should only be a single child, so we don't need to worry if

Review comment:
       Should we add assert for that?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #1407: Add dictionary support for C data interface

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #1407:
URL: https://github.com/apache/arrow-rs/pull/1407#issuecomment-1061410453


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1407?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#1407](https://codecov.io/gh/apache/arrow-rs/pull/1407?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (7ed4d37) into [master](https://codecov.io/gh/apache/arrow-rs/commit/1fbe618eb2240d8a8dbd0c03baa8d726d7a45472?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (1fbe618) will **increase** coverage by `0.02%`.
   > The diff coverage is `82.60%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/1407/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/1407?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #1407      +/-   ##
   ==========================================
   + Coverage   83.13%   83.16%   +0.02%     
   ==========================================
     Files         182      182              
     Lines       53321    53394      +73     
   ==========================================
   + Hits        44330    44404      +74     
   + Misses       8991     8990       -1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/1407?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/datatypes/ffi.rs](https://codecov.io/gh/apache/arrow-rs/pull/1407/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2RhdGF0eXBlcy9mZmkucnM=) | `73.65% <68.75%> (+2.80%)` | :arrow_up: |
   | [arrow/src/ffi.rs](https://codecov.io/gh/apache/arrow-rs/pull/1407/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2ZmaS5ycw==) | `85.78% <90.19%> (+1.16%)` | :arrow_up: |
   | [arrow/src/array/ffi.rs](https://codecov.io/gh/apache/arrow-rs/pull/1407/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2ZmaS5ycw==) | `100.00% <100.00%> (ø)` | |
   | [arrow/src/datatypes/datatype.rs](https://codecov.io/gh/apache/arrow-rs/pull/1407/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2RhdGF0eXBlcy9kYXRhdHlwZS5ycw==) | `66.40% <0.00%> (-0.40%)` | :arrow_down: |
   | [arrow/src/datatypes/field.rs](https://codecov.io/gh/apache/arrow-rs/pull/1407/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2RhdGF0eXBlcy9maWVsZC5ycw==) | `53.79% <0.00%> (-0.31%)` | :arrow_down: |
   | [parquet\_derive/src/parquet\_field.rs](https://codecov.io/gh/apache/arrow-rs/pull/1407/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldF9kZXJpdmUvc3JjL3BhcnF1ZXRfZmllbGQucnM=) | `65.98% <0.00%> (-0.23%)` | :arrow_down: |
   | [parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow-rs/pull/1407/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvZW5jb2RpbmdzL2VuY29kaW5nLnJz) | `93.52% <0.00%> (-0.20%)` | :arrow_down: |
   | [arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow-rs/pull/1407/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2FycmF5X2JpbmFyeS5ycw==) | `93.28% <0.00%> (+0.05%)` | :arrow_up: |
   | [arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/1407/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9tb2QucnM=) | `86.43% <0.00%> (+0.11%)` | :arrow_up: |
   | [arrow/src/array/equal/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/1407/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2VxdWFsL21vZC5ycw==) | `93.24% <0.00%> (+0.33%)` | :arrow_up: |
   | ... and [1 more](https://codecov.io/gh/apache/arrow-rs/pull/1407/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1407?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1407?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [1fbe618...7ed4d37](https://codecov.io/gh/apache/arrow-rs/pull/1407?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] viirya commented on pull request #1407: Add dictionary support for C data interface

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #1407:
URL: https://github.com/apache/arrow-rs/pull/1407#issuecomment-1062268795


   > Yes, now we can use C data interface with dictionary type. No API change.
   
   `FFI_ArrowSchema.try_new` is a public function, so I wonder this should be an API change?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter commented on pull request #1407: Add dictionary support for C data interface

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #1407:
URL: https://github.com/apache/arrow-rs/pull/1407#issuecomment-1061410453


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1407?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#1407](https://codecov.io/gh/apache/arrow-rs/pull/1407?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (b86a377) into [master](https://codecov.io/gh/apache/arrow-rs/commit/1fbe618eb2240d8a8dbd0c03baa8d726d7a45472?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (1fbe618) will **increase** coverage by `0.02%`.
   > The diff coverage is `80.95%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/1407/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/1407?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #1407      +/-   ##
   ==========================================
   + Coverage   83.13%   83.16%   +0.02%     
   ==========================================
     Files         182      182              
     Lines       53321    53382      +61     
   ==========================================
   + Hits        44330    44394      +64     
   + Misses       8991     8988       -3     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/1407?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/datatypes/ffi.rs](https://codecov.io/gh/apache/arrow-rs/pull/1407/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2RhdGF0eXBlcy9mZmkucnM=) | `73.65% <68.75%> (+2.80%)` | :arrow_up: |
   | [arrow/src/ffi.rs](https://codecov.io/gh/apache/arrow-rs/pull/1407/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2ZmaS5ycw==) | `85.71% <90.19%> (+1.09%)` | :arrow_up: |
   | [arrow/src/array/ffi.rs](https://codecov.io/gh/apache/arrow-rs/pull/1407/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2ZmaS5ycw==) | `100.00% <100.00%> (ø)` | |
   | [arrow/src/datatypes/field.rs](https://codecov.io/gh/apache/arrow-rs/pull/1407/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2RhdGF0eXBlcy9maWVsZC5ycw==) | `53.79% <0.00%> (-0.31%)` | :arrow_down: |
   | [parquet\_derive/src/parquet\_field.rs](https://codecov.io/gh/apache/arrow-rs/pull/1407/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldF9kZXJpdmUvc3JjL3BhcnF1ZXRfZmllbGQucnM=) | `65.98% <0.00%> (-0.23%)` | :arrow_down: |
   | [arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow-rs/pull/1407/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2FycmF5X2JpbmFyeS5ycw==) | `93.28% <0.00%> (+0.05%)` | :arrow_up: |
   | [arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/1407/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9tb2QucnM=) | `86.43% <0.00%> (+0.11%)` | :arrow_up: |
   | [arrow/src/array/equal/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/1407/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2VxdWFsL21vZC5ycw==) | `93.24% <0.00%> (+0.33%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1407?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1407?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [1fbe618...b86a377](https://codecov.io/gh/apache/arrow-rs/pull/1407?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] sunchao commented on pull request #1407: Add dictionary support for C data interface

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #1407:
URL: https://github.com/apache/arrow-rs/pull/1407#issuecomment-1062326817


   > `FFI_ArrowSchema.try_new` is a public function, so I wonder this should be an API change?
   
   Hmm you are right. Looks like `try_new` doesn't have to be public though. I think people should just use `ArrowArray` directly.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] jorgecarleitao commented on a change in pull request #1407: Add dictionary support for C data interface

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #1407:
URL: https://github.com/apache/arrow-rs/pull/1407#discussion_r822023775



##########
File path: arrow/src/ffi.rs
##########
@@ -555,18 +594,22 @@ pub trait ArrowArrayRef {
     // for variable-sized buffers, such as the second buffer of a stringArray, we need
     // to fetch offset buffer's len to build the second buffer.
     fn buffer_len(&self, i: usize) -> Result<usize> {
-        // Inner type is not important for buffer length.
-        let data_type = &self.data_type()?;
+        // Special handling for dictionary type as we only care about the key type in the case.
+        let data_type = match &self.data_type()? {
+            DataType::Dictionary(key_data_type, _) => key_data_type.as_ref().clone(),

Review comment:
       maybe it would be possible to not clone here by returning a reference?

##########
File path: arrow/src/array/ffi.rs
##########
@@ -127,4 +128,14 @@ mod tests {
         let data = array.data();
         test_round_trip(data)
     }
+
+    #[test]
+    fn test_dictionary() -> Result<()> {

Review comment:
       would it be worth to have an example with validity (in both the keys and values), so that we cover the most complex use-case?

##########
File path: arrow-pyarrow-integration-testing/tests/test_sql.py
##########
@@ -263,3 +256,13 @@ def test_decimal_python():
     assert a == b
     del a
     del b
+
+def test_dictionary_python():
+    """
+    Python -> Rust -> Python
+    """
+    a = pa.array(["a", "b", "a"], type=pa.dictionary(pa.int8(), pa.string()))

Review comment:
       same here - with validities?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] sunchao commented on pull request #1407: Add dictionary support for C data interface

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #1407:
URL: https://github.com/apache/arrow-rs/pull/1407#issuecomment-1062143289


   cc @alamb @jorgecarleitao @kszucs could you help to review this? thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] sunchao commented on pull request #1407: Add dictionary support for C data interface

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #1407:
URL: https://github.com/apache/arrow-rs/pull/1407#issuecomment-1062628486


   Merged, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] sunchao commented on pull request #1407: Add dictionary support for C data interface

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #1407:
URL: https://github.com/apache/arrow-rs/pull/1407#issuecomment-1062444657


   Thanks @jorgecarleitao @alamb @viirya ! I've addressed your comments. Let me know if you have more feedback. Otherwise I'll merge this tonight.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] sunchao merged pull request #1407: Add dictionary support for C data interface

Posted by GitBox <gi...@apache.org>.
sunchao merged pull request #1407:
URL: https://github.com/apache/arrow-rs/pull/1407


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org