Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/18 17:40:22 UTC

[GitHub] [arrow] kszucs opened a new pull request #10093: ARROW-12420: [C++/Dataset] Reading null columns as dictionary not longer possible

kszucs opened a new pull request #10093:
URL: https://github.com/apache/arrow/pull/10093


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on a change in pull request #10093: ARROW-12420: [C++/Dataset] Reading null columns as dictionary not longer possible

Posted by GitBox <gi...@apache.org>.
kszucs commented on a change in pull request #10093:
URL: https://github.com/apache/arrow/pull/10093#discussion_r615430752



##########
File path: cpp/src/arrow/array/array_dict.cc
##########
@@ -80,7 +80,6 @@ int64_t DictionaryArray::GetValueIndex(int64_t i) const {
 DictionaryArray::DictionaryArray(const std::shared_ptr<ArrayData>& data)
     : dict_type_(checked_cast<const DictionaryType*>(data->type.get())) {
   ARROW_CHECK_EQ(data->type->id(), Type::DICTIONARY);
-  ARROW_CHECK_NE(data->dictionary, nullptr);

Review comment:
       `array->dictionary()` lazily initializes the dictionary if it is not set.
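The two approaches weighed in this review thread can be contrasted with a simplified, self-contained model: an eager check in the constructor, or reliance on a lazily initializing `dictionary()` accessor. `MiniArrayData` and `MiniDictionaryArray` are hypothetical stand-ins, not Arrow's classes. The sketch throws where `ARROW_CHECK_*` would abort, and it assumes the lazy accessor only caches a dictionary that is already present in the data, so it cannot supply a missing one.

```cpp
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical, heavily simplified stand-in for arrow::ArrayData.
struct MiniArrayData {
  bool is_dictionary = false;
  std::int64_t length = 0;
  std::shared_ptr<MiniArrayData> dictionary;  // expected non-null for dictionary data
};

// Hypothetical stand-in for a boxed dictionary array.
class MiniDictionaryArray {
 public:
  explicit MiniDictionaryArray(std::shared_ptr<MiniArrayData> data)
      : data_(std::move(data)) {
    // Counterpart of ARROW_CHECK_EQ(data->type->id(), Type::DICTIONARY).
    if (!data_ || !data_->is_dictionary) {
      throw std::invalid_argument("expected dictionary-typed data");
    }
    // Eager check in the spirit of ARROW_CHECK_NE(data->dictionary, nullptr):
    // a missing dictionary is reported at construction time.
    if (data_->dictionary == nullptr) {
      throw std::invalid_argument("dictionary-typed data without a dictionary");
    }
  }

  // Lazy accessor: caches data_->dictionary on first call. It only reuses
  // what the data already holds; it never creates a missing dictionary.
  const std::shared_ptr<MiniArrayData>& dictionary() const {
    if (!cached_dictionary_) {
      cached_dictionary_ = data_->dictionary;
    }
    return cached_dictionary_;
  }

 private:
  std::shared_ptr<MiniArrayData> data_;
  mutable std::shared_ptr<MiniArrayData> cached_dictionary_;
};
```

Under these assumptions, the eager check makes a bug that leaves the dictionary unset surface where the array is built, rather than at the first dictionary access.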







[GitHub] [arrow] kszucs commented on pull request #10093: ARROW-12420: [C++/Dataset] Reading null columns as dictionary not longer possible

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #10093:
URL: https://github.com/apache/arrow/pull/10093#issuecomment-822068562


   I reproduced the problem using a simple cast test from null array to a dictionary array:
   
   ```cpp
   TEST(Cast, FromNullToDictionary) {
     auto from = std::make_shared<NullArray>(10);
     auto to_type = dictionary(int8(), boolean());
   
     ASSERT_OK_AND_ASSIGN(auto expected, MakeArrayOfNull(to_type, 10));
     CheckCast(from, expected);
   }
   ```
   
   Somewhere `ArrayData::dictionary` gets set to nullptr or is not set at all. I don't have a solution yet; it's a bit hard to debug due to the Datum variant.
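The result this repro expects can be sketched with a simplified, self-contained model that mirrors `dictionary(int8(), boolean())`: every int8 index slot is null, and a dictionary of boolean values is still attached. `MiniDictColumn` and `CastNullToDictionary` are hypothetical names, not Arrow's API. Attaching an empty dictionary, rather than leaving it unset, is an assumption about what a correct null-to-dictionary cast should produce; this thread does not confirm what `MakeArrayOfNull` attaches.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <vector>

// Hypothetical stand-in for a dictionary<int8, boolean> column:
// int8 indices (std::nullopt marks a null slot) plus the dictionary values.
struct MiniDictColumn {
  std::vector<std::optional<std::int8_t>> indices;
  std::shared_ptr<std::vector<bool>> dictionary;
};

// Hypothetical null -> dictionary cast: every slot stays null, and an
// empty (but non-null) dictionary is attached instead of leaving it unset.
MiniDictColumn CastNullToDictionary(std::int64_t length) {
  MiniDictColumn out;
  out.indices.assign(static_cast<std::size_t>(length), std::nullopt);
  out.dictionary = std::make_shared<std::vector<bool>>();
  return out;
}
```

Since every index is null, no dictionary value is ever referenced, so an empty dictionary is enough to keep a non-null-dictionary invariant intact in this model.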





[GitHub] [arrow] kszucs closed pull request #10093: ARROW-12420: [C++/Dataset] Reading null columns as dictionary not longer possible

Posted by GitBox <gi...@apache.org>.
kszucs closed pull request #10093:
URL: https://github.com/apache/arrow/pull/10093


   





[GitHub] [arrow] kszucs edited a comment on pull request #10093: ARROW-12420: [C++/Dataset] Reading null columns as dictionary not longer possible

Posted by GitBox <gi...@apache.org>.
kszucs edited a comment on pull request #10093:
URL: https://github.com/apache/arrow/pull/10093#issuecomment-822068562


   I reproduced the problem using a simple cast test from null array to a dictionary array:
   
   ```cpp
   TEST(Cast, FromNullToDictionary) {
     auto from = std::make_shared<NullArray>(10);
     auto to_type = dictionary(int8(), boolean());
   
     ASSERT_OK_AND_ASSIGN(auto expected, MakeArrayOfNull(to_type, 10));
     CheckCast(from, expected);
   }
   ```
   
   Somewhere `ArrayData::dictionary` gets set to nullptr or is not set at all. I don't have a solution yet; it's a bit hard to debug due to the Datum variant.
   
   cc @bkietz 






[GitHub] [arrow] kszucs commented on a change in pull request #10093: ARROW-12420: [C++/Dataset] Reading null columns as dictionary not longer possible

Posted by GitBox <gi...@apache.org>.
kszucs commented on a change in pull request #10093:
URL: https://github.com/apache/arrow/pull/10093#discussion_r615455214



##########
File path: cpp/src/arrow/array/array_dict.cc
##########
@@ -80,7 +80,6 @@ int64_t DictionaryArray::GetValueIndex(int64_t i) const {
 DictionaryArray::DictionaryArray(const std::shared_ptr<ArrayData>& data)
     : dict_type_(checked_cast<const DictionaryType*>(data->type.get())) {
   ARROW_CHECK_EQ(data->type->id(), Type::DICTIONARY);
-  ARROW_CHECK_NE(data->dictionary, nullptr);

Review comment:
       This must be restored.







[GitHub] [arrow] github-actions[bot] commented on pull request #10093: ARROW-12420: [C++/Dataset] Reading null columns as dictionary not longer possible

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10093:
URL: https://github.com/apache/arrow/pull/10093#issuecomment-822029949


   https://issues.apache.org/jira/browse/ARROW-12420

