You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/10 16:57:00 UTC

[GitHub] [arrow-rs] alamb opened a new pull request #1300: Alamb/better dictionary array creation

alamb opened a new pull request #1300:
URL: https://github.com/apache/arrow-rs/pull/1300


   Draft as it builds on https://github.com/apache/arrow-rs/pull/1263
   
   # Which issue does this PR close?
   
   Closes https://github.com/apache/arrow-rs/issues/1299
   
   # Rationale for this change
    
   Inspired by the challenge writing tests on https://github.com/apache/arrow-rs/pull/1263 from @viirya
   
   Otherwise known as "no one should have to write the following to work with `DictionaryArrays`":
   
   <details>
     <summary>Click to expand!</summary>
     
   
   ```rust
     fn get_dict_arraydata(
           keys: Buffer,
           key_type: DataType,
           value_data: ArrayData,
       ) -> ArrayData {
           let value_type = value_data.data_type().clone();
           let dict_data_type =
               DataType::Dictionary(Box::new(key_type), Box::new(value_type));
           ArrayData::builder(dict_data_type)
               .len(3)
               .add_buffer(keys)
               .add_child_data(value_data)
               .build()
               .unwrap()
       }
   
       #[test]
       fn test_eq_dyn_dictionary_i8_array() {
           let key_type = DataType::Int8;
           // Construct a value array
           let value_data = ArrayData::builder(DataType::Int8)
               .len(8)
               .add_buffer(Buffer::from(
                   &[10_i8, 11, 12, 13, 14, 15, 16, 17].to_byte_slice(),
               ))
               .build()
               .unwrap();
   
           let keys1 = Buffer::from(&[2_i8, 3, 4].to_byte_slice());
           let keys2 = Buffer::from(&[2_i8, 4, 4].to_byte_slice());
           let dict_array1: DictionaryArray<Int8Type> = Int8DictionaryArray::from(
               get_dict_arraydata(keys1, key_type.clone(), value_data.clone()),
           );
           let dict_array2: DictionaryArray<Int8Type> =
               Int8DictionaryArray::from(get_dict_arraydata(keys2, key_type, value_data));
   
           let result = eq_dyn(&dict_array1, &dict_array2);
           assert!(result.is_ok());
           assert_eq!(result.unwrap(), BooleanArray::from(vec![true, false, true]));
       }
   ```
   </details>
   
   # What changes are included in this PR?
   
   1. Add `DictionaryArray::try_new()` function
   2. Tests
   2. Doc tests
   3. Update tests from https://github.com/apache/arrow-rs/pull/1263 to use the new API
   
   
   
   <!---
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!---
   If there are any breaking changes to public APIs, please add the `breaking change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on pull request #1300: Add `DictionaryArray::try_new()` to create dictionaries from pre existing arrays

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1300:
URL: https://github.com/apache/arrow-rs/pull/1300#issuecomment-1040744719


   Filed https://github.com/apache/arrow-rs/issues/1313 to track performance improvement


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on pull request #1300: Add `DictionaryArray::try_new()` to create dictionaries from pre existing arrays

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1300:
URL: https://github.com/apache/arrow-rs/pull/1300#issuecomment-1040745303


   Thanks for the comments and review @jhorstmann  and @viirya 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] jhorstmann commented on pull request #1300: Add `DictionaryArray::try_new()` to create dictionaries from pre existing arrays

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on pull request #1300:
URL: https://github.com/apache/arrow-rs/pull/1300#issuecomment-1037978864


   Looks good. The double validation for the dictionary values is a bit unfortunate, but since this is mostly a convenience feature that should be fine.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] jhorstmann commented on pull request #1300: Add `DictionaryArray::try_new()` to create dictionaries from pre existing arrays

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on pull request #1300:
URL: https://github.com/apache/arrow-rs/pull/1300#issuecomment-1037978864


   Looks good. The double validation for the dictionary values is a bit unfortunate, but since this is mostly a convenience feature that should be fine.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on a change in pull request #1300: Add `DictionaryArray::try_new()` to create dictionaries from pre existing arrays

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1300:
URL: https://github.com/apache/arrow-rs/pull/1300#discussion_r803965759



##########
File path: arrow/src/array/array_dictionary.rs
##########
@@ -69,6 +86,27 @@ pub struct DictionaryArray<K: ArrowPrimitiveType> {
 }
 
 impl<'a, K: ArrowPrimitiveType> DictionaryArray<K> {
+    /// Attempt to create a new DictionaryArray with a specified keys
+    /// (dictionary) and values (indexes into that dictionary)

Review comment:
       🤦  yes
   
   ```suggestion
       /// (indexes into the dictionary) and values (dictionary)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on pull request #1300: Add `DictionaryArray::try_new()` to create dictionaries from pre existing arrays

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1300:
URL: https://github.com/apache/arrow-rs/pull/1300#issuecomment-1038126396


   > Looks good. The double validation for the dictionary values is a bit unfortunate, but since this is mostly a convenience feature that should be fine.
   
   I agree it is unfortunately
   
   I'll see if I can make a `new_unchecked()` and then also minimize the checking in `try_new()` to only verify that all indexes are within the range of the dictionary, which Ido think is required for this to be a `safe` API. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb merged pull request #1300: Add `DictionaryArray::try_new()` to create dictionaries from pre existing arrays

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #1300:
URL: https://github.com/apache/arrow-rs/pull/1300


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on pull request #1300: Add `DictionaryArray::try_new()` to create dictionaries from pre existing arrays

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1300:
URL: https://github.com/apache/arrow-rs/pull/1300#issuecomment-1038126396


   > Looks good. The double validation for the dictionary values is a bit unfortunate, but since this is mostly a convenience feature that should be fine.
   
   I agree it is unfortunately
   
   I'll see if I can make a `new_unchecked()` and then also minimize the checking in `try_new()` to only verify that all indexes are within the range of the dictionary, which Ido think is required for this to be a `safe` API. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] viirya commented on a change in pull request #1300: Add `DictionaryArray::try_new()` to create dictionaries from pre existing arrays

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #1300:
URL: https://github.com/apache/arrow-rs/pull/1300#discussion_r803948219



##########
File path: arrow/src/array/array_dictionary.rs
##########
@@ -69,6 +86,27 @@ pub struct DictionaryArray<K: ArrowPrimitiveType> {
 }
 
 impl<'a, K: ArrowPrimitiveType> DictionaryArray<K> {
+    /// Attempt to create a new DictionaryArray with a specified keys
+    /// (dictionary) and values (indexes into that dictionary)

Review comment:
       typo? keys (indexes into the dictionary) and values (dictionary).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org