Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/10 15:32:50 UTC

[GitHub] [arrow-rs] alamb opened a new issue #1299: Improve ergonomics to construct `DictionaryArrays` from `Key` and `Value` arrays

alamb opened a new issue #1299:
URL: https://github.com/apache/arrow-rs/issues/1299


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   I am trying to create a dictionary array from keys and values that I already have, either to ensure that the dictionaries are actually shared or to avoid rebuilding a dictionary when the data is already dictionary-encoded.
   
   Right now if you need to do this, you end up with code that is fairly messy, as you can see in https://github.com/apache/arrow-rs/pull/1263#discussion_r803783498:
   
   ```rust
       fn get_dict_arraydata(
           keys: Buffer,
           key_type: DataType,
           value_data: ArrayData,
       ) -> ArrayData {
           let value_type = value_data.data_type().clone();
           let dict_data_type =
               DataType::Dictionary(Box::new(key_type), Box::new(value_type));
           ArrayData::builder(dict_data_type)
               .len(3)
               .add_buffer(keys)
               .add_child_data(value_data)
               .build()
               .unwrap()
       }
   
       #[test]
       fn test_eq_dyn_dictionary_i8_array() {
           let key_type = DataType::Int8;
           // Construct a value array
           let value_data = ArrayData::builder(DataType::Int8)
               .len(8)
               .add_buffer(Buffer::from(
                   &[10_i8, 11, 12, 13, 14, 15, 16, 17].to_byte_slice(),
               ))
               .build()
               .unwrap();
   
           let keys1 = Buffer::from(&[2_i8, 3, 4].to_byte_slice());
           let keys2 = Buffer::from(&[2_i8, 4, 4].to_byte_slice());
           let dict_array1: DictionaryArray<Int8Type> = Int8DictionaryArray::from(
               get_dict_arraydata(keys1, key_type.clone(), value_data.clone()),
           );
           let dict_array2: DictionaryArray<Int8Type> =
               Int8DictionaryArray::from(get_dict_arraydata(keys2, key_type, value_data));
   
           let result = eq_dyn(&dict_array1, &dict_array2);
           assert!(result.is_ok());
           assert_eq!(result.unwrap(), BooleanArray::from(vec![true, false, true]));
       }
   ```
   
   **Describe the solution you'd like**
   It would be nice to have a way to create a `DictionaryArray` directly from the key and value arrays:
   
   ```rust
   let dict_array1 = DictionaryArray::<Int8Type>::try_new(keys1, values.clone()).unwrap();
   ```
   
   So the entire test would look like
   ```rust
       #[test]
       fn test_eq_dyn_dictionary_i8_array() {
           // Construct a value array
           let values = Int8Array::from_iter_values([10_i8, 11, 12, 13, 14, 15, 16, 17]);
   
           let keys1 = Int8Array::from_iter_values([2_i8, 3, 4]);
           let keys2 = Int8Array::from_iter_values([2_i8, 4, 4]);
           let dict_array1 = DictionaryArray::<Int8Type>::try_new(keys1, values.clone()).unwrap();
           let dict_array2 = DictionaryArray::<Int8Type>::try_new(keys2, values).unwrap();
   
           let result = eq_dyn(&dict_array1, &dict_array2);
           assert!(result.is_ok());
           assert_eq!(result.unwrap(), BooleanArray::from(vec![true, false, true]));
       }
   ```
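
To make the intent of the proposed constructor concrete, here is a minimal std-only sketch of a fallible "keys plus values" constructor. Everything in it (`ToyDictArray`, `toy_eq`, `shares_values_with`) is a hypothetical toy model invented for illustration; it is not the arrow-rs `DictionaryArray` API and it ignores nulls, other key types, and Arrow's buffer layout. It assumes the useful behaviour is: reject keys that do not index into the values, and let several arrays share one values allocation.

```rust
use std::rc::Rc;

/// Toy stand-in for a dictionary-encoded array.
/// Hypothetical type for illustration only; NOT arrow-rs's `DictionaryArray`.
#[derive(Debug, Clone)]
pub struct ToyDictArray {
    keys: Vec<i8>,
    values: Rc<Vec<i8>>, // shared, so two arrays can reuse one dictionary
}

impl ToyDictArray {
    /// Build an array from existing keys and values.
    /// Fails if any key is negative or not smaller than `values.len()`.
    pub fn try_new(keys: Vec<i8>, values: Rc<Vec<i8>>) -> Result<Self, String> {
        for (pos, &key) in keys.iter().enumerate() {
            let in_bounds = key >= 0 && (key as usize) < values.len();
            if !in_bounds {
                return Err(format!(
                    "key {} at position {} is out of bounds for {} values",
                    key,
                    pos,
                    values.len()
                ));
            }
        }
        Ok(Self { keys, values })
    }

    /// Number of logical elements (one per key).
    pub fn len(&self) -> usize {
        self.keys.len()
    }

    /// Decode the logical value at index `i` by looking up its key.
    pub fn value(&self, i: usize) -> i8 {
        // Safe to index: `try_new` validated every key.
        self.values[self.keys[i] as usize]
    }

    /// True if both arrays point at the same values allocation.
    pub fn shares_values_with(&self, other: &Self) -> bool {
        Rc::ptr_eq(&self.values, &other.values)
    }
}

/// Element-wise equality on decoded values (a toy analogue of a dynamic `eq` kernel).
pub fn toy_eq(left: &ToyDictArray, right: &ToyDictArray) -> Result<Vec<bool>, String> {
    if left.len() != right.len() {
        return Err("arrays must have the same length".to_string());
    }
    Ok((0..left.len())
        .map(|i| left.value(i) == right.value(i))
        .collect())
}
```

With this toy model, building two arrays from keys `[2, 3, 4]` and `[2, 4, 4]` over the same `Rc` of values `[10, 11, ..., 17]` shares a single dictionary, and `toy_eq` compares the decoded values position by position. Validating in the constructor is one possible design choice; it is the reason the sketch returns a `Result` rather than panicking.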
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb closed issue #1299: Improve ergonomics to construct `DictionaryArrays` from `Key` and `Value` arrays

Posted by GitBox <gi...@apache.org>.
alamb closed issue #1299:
URL: https://github.com/apache/arrow-rs/issues/1299


   

