You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/10 16:57:00 UTC
[GitHub] [arrow-rs] alamb opened a new pull request #1300: Alamb/better dictionary array creation
alamb opened a new pull request #1300:
URL: https://github.com/apache/arrow-rs/pull/1300
Draft as it builds on https://github.com/apache/arrow-rs/pull/1263
# Which issue does this PR close?
Closes https://github.com/apache/arrow-rs/issues/1299
# Rationale for this change
Inspired by the challenge writing tests on https://github.com/apache/arrow-rs/pull/1263 from @viirya
Otherwise known as "no one should have to write the following to work with `DictionaryArrays`":
<details>
<summary>Click to expand!</summary>
```rust
fn get_dict_arraydata(
keys: Buffer,
key_type: DataType,
value_data: ArrayData,
) -> ArrayData {
let value_type = value_data.data_type().clone();
let dict_data_type =
DataType::Dictionary(Box::new(key_type), Box::new(value_type));
ArrayData::builder(dict_data_type)
.len(3)
.add_buffer(keys)
.add_child_data(value_data)
.build()
.unwrap()
}
#[test]
fn test_eq_dyn_dictionary_i8_array() {
let key_type = DataType::Int8;
// Construct a value array
let value_data = ArrayData::builder(DataType::Int8)
.len(8)
.add_buffer(Buffer::from(
&[10_i8, 11, 12, 13, 14, 15, 16, 17].to_byte_slice(),
))
.build()
.unwrap();
let keys1 = Buffer::from(&[2_i8, 3, 4].to_byte_slice());
let keys2 = Buffer::from(&[2_i8, 4, 4].to_byte_slice());
let dict_array1: DictionaryArray<Int8Type> = Int8DictionaryArray::from(
get_dict_arraydata(keys1, key_type.clone(), value_data.clone()),
);
let dict_array2: DictionaryArray<Int8Type> =
Int8DictionaryArray::from(get_dict_arraydata(keys2, key_type, value_data));
let result = eq_dyn(&dict_array1, &dict_array2);
assert!(result.is_ok());
assert_eq!(result.unwrap(), BooleanArray::from(vec![true, false, true]));
}
```
</details>
# What changes are included in this PR?
1. Add `DictionaryArray::try_new()` function
2. Tests
2. Doc tests
3. Update tests from https://github.com/apache/arrow-rs/pull/1263 to use the new API
<!---
If there are user-facing changes then we may require documentation to be updated before approving the PR.
-->
<!---
If there are any breaking changes to public APIs, please add the `breaking change` label.
-->
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] alamb commented on pull request #1300: Add `DictionaryArray::try_new()` to create dictionaries from pre existing arrays
Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1300:
URL: https://github.com/apache/arrow-rs/pull/1300#issuecomment-1040744719
Filed https://github.com/apache/arrow-rs/issues/1313 to track performance improvement
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] alamb commented on pull request #1300: Add `DictionaryArray::try_new()` to create dictionaries from pre existing arrays
Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1300:
URL: https://github.com/apache/arrow-rs/pull/1300#issuecomment-1040745303
Thanks for the comments and review @jhorstmann and @viirya
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] jhorstmann commented on pull request #1300: Add `DictionaryArray::try_new()` to create dictionaries from pre existing arrays
Posted by GitBox <gi...@apache.org>.
jhorstmann commented on pull request #1300:
URL: https://github.com/apache/arrow-rs/pull/1300#issuecomment-1037978864
Looks good. The double validation for the dictionary values is a bit unfortunate, but since this is mostly a convenience feature that should be fine.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] jhorstmann commented on pull request #1300: Add `DictionaryArray::try_new()` to create dictionaries from pre existing arrays
Posted by GitBox <gi...@apache.org>.
jhorstmann commented on pull request #1300:
URL: https://github.com/apache/arrow-rs/pull/1300#issuecomment-1037978864
Looks good. The double validation for the dictionary values is a bit unfortunate, but since this is mostly a convenience feature that should be fine.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] alamb commented on a change in pull request #1300: Add `DictionaryArray::try_new()` to create dictionaries from pre existing arrays
Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1300:
URL: https://github.com/apache/arrow-rs/pull/1300#discussion_r803965759
##########
File path: arrow/src/array/array_dictionary.rs
##########
@@ -69,6 +86,27 @@ pub struct DictionaryArray<K: ArrowPrimitiveType> {
}
impl<'a, K: ArrowPrimitiveType> DictionaryArray<K> {
+ /// Attempt to create a new DictionaryArray with a specified keys
+ /// (dictionary) and values (indexes into that dictionary)
Review comment:
🤦 yes
```suggestion
/// (indexes into the dictionary) and values (dictionary)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] alamb commented on pull request #1300: Add `DictionaryArray::try_new()` to create dictionaries from pre existing arrays
Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1300:
URL: https://github.com/apache/arrow-rs/pull/1300#issuecomment-1038126396
> Looks good. The double validation for the dictionary values is a bit unfortunate, but since this is mostly a convenience feature that should be fine.
I agree it is unfortunately
I'll see if I can make a `new_unchecked()` and then also minimize the checking in `try_new()` to only verify that all indexes are within the range of the dictionary, which Ido think is required for this to be a `safe` API.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] alamb merged pull request #1300: Add `DictionaryArray::try_new()` to create dictionaries from pre existing arrays
Posted by GitBox <gi...@apache.org>.
alamb merged pull request #1300:
URL: https://github.com/apache/arrow-rs/pull/1300
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] alamb commented on pull request #1300: Add `DictionaryArray::try_new()` to create dictionaries from pre existing arrays
Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1300:
URL: https://github.com/apache/arrow-rs/pull/1300#issuecomment-1038126396
> Looks good. The double validation for the dictionary values is a bit unfortunate, but since this is mostly a convenience feature that should be fine.
I agree it is unfortunately
I'll see if I can make a `new_unchecked()` and then also minimize the checking in `try_new()` to only verify that all indexes are within the range of the dictionary, which Ido think is required for this to be a `safe` API.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] viirya commented on a change in pull request #1300: Add `DictionaryArray::try_new()` to create dictionaries from pre existing arrays
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #1300:
URL: https://github.com/apache/arrow-rs/pull/1300#discussion_r803948219
##########
File path: arrow/src/array/array_dictionary.rs
##########
@@ -69,6 +86,27 @@ pub struct DictionaryArray<K: ArrowPrimitiveType> {
}
impl<'a, K: ArrowPrimitiveType> DictionaryArray<K> {
+ /// Attempt to create a new DictionaryArray with a specified keys
+ /// (dictionary) and values (indexes into that dictionary)
Review comment:
typo? keys (indexes into the dictionary) and values (dictionary).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org