Posted to issues@arrow.apache.org by "mapleFU (via GitHub)" <gi...@apache.org> on 2023/05/20 05:40:02 UTC

[GitHub] [arrow] mapleFU opened a new issue, #35697: [C++][Parquet] NotImplemented: Lists with non-zero length null components are not supported

mapleFU opened a new issue, #35697:
URL: https://github.com/apache/arrow/issues/35697

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   When I run the test below:
   
   ```c++
   TEST(ArrowReadWrite, ListOfStructOfList2) {
     using ::arrow::field;
     using ::arrow::list;
     using ::arrow::struct_;
   
     auto type =
         list(field("item",
                    struct_({field("a", ::arrow::int16(), /*nullable=*/false),
                             field("b", list(::arrow::int64()), /*nullable=*/false)}),
                    /*nullable=*/false));
   
     const char* json = R"([
         [{"a": 123, "b": [1, 2, 3]}],
         null,
         [],
         [{"a": 456, "b": []}, {"a": 789, "b": [null]}, {"a": 876, "b": [4, 5, 6]}]])";
     auto array = ::arrow::ArrayFromJSON(type, json);
     auto table = ::arrow::Table::Make(::arrow::schema({field("root", type)}), {array});
     CheckSimpleRoundtrip(table, 2);
   }
   ```
   
   It fails with `NotImplemented: Lists with non-zero length null components are not supported`.
   
   ### Component(s)
   
   C++, Parquet


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on issue #35697: [C++][Parquet] NotImplemented: Lists with non-zero length null components are not supported

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on issue #35697:
URL: https://github.com/apache/arrow/issues/35697#issuecomment-1557059093

   @emkornfield @pitrou Would you mind taking a look at https://github.com/apache/arrow/issues/35697#issuecomment-1557051817 ?




[GitHub] [arrow] fennerm commented on issue #35697: [C++][Parquet] NotImplemented: Lists with non-zero length null components are not supported

Posted by "fennerm (via GitHub)" <gi...@apache.org>.
fennerm commented on issue #35697:
URL: https://github.com/apache/arrow/issues/35697#issuecomment-1714190291

   I think I'm seeing the same issue in Rust: https://github.com/pola-rs/polars/issues/10983. Let me know if a separate ticket would be helpful.




[GitHub] [arrow] mapleFU commented on issue #35697: [C++][Parquet] NotImplemented: Lists with non-zero length null components are not supported

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on issue #35697:
URL: https://github.com/apache/arrow/issues/35697#issuecomment-1714287376

   @fennerm You can try using `List` as a workaround for now. I don't have enough bandwidth to work on this during this month.




[GitHub] [arrow] fennerm commented on issue #35697: [C++][Parquet] NotImplemented: Lists with non-zero length null components are not supported

Posted by "fennerm (via GitHub)" <gi...@apache.org>.
fennerm commented on issue #35697:
URL: https://github.com/apache/arrow/issues/35697#issuecomment-1714289476

   Thanks @mapleFU no worries : )




[GitHub] [arrow] mapleFU commented on issue #35697: [C++][Parquet] NotImplemented: Lists with non-zero length null components are not supported

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on issue #35697:
URL: https://github.com/apache/arrow/issues/35697#issuecomment-1557051817

   ```c++
   TEST(ArrowReadWrite, NestedNonFixedSizeList3) {
     using ::arrow::field;
     using ::arrow::list;
     using ::arrow::struct_;
   
     auto type = list(list(::arrow::int16()));
   
     const char* json = R"([
         [[1, 2], [3, 4]],
         null,
         [[5, 6], null],
         [null, [7, 8]]])";
     auto array = ::arrow::ArrayFromJSON(type, json);
     auto table = ::arrow::Table::Make(::arrow::schema({field("root", type)}), {array});
     auto props_store_schema = ArrowWriterProperties::Builder().store_schema()->build();
     CheckSimpleRoundtrip(table, 2, props_store_schema);
   }
   ```
   
   By the way, this case passes the test. I went through the code, and I think I've found the reason. The test writes the data in batches of size 2.
   
   The batch1:
   
   ```
         [[1, 2], [3, 4]],
         null
   ```
   
   The batch2:
   
   ```
         [[5, 6], null],
         [null, [7, 8]]
   ```
   
   Now, for `List` (not fixed-size list), the underlying values in the child array are:
   
   ```
   1 2 3 4 5 6 7 8
   ```
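
   To make the layout concrete, here is a minimal sketch of the inner `list<int16>` array from the test above. It is a toy model, not the Arrow API: `ToyList`, `NonNullChildRanges`, `IsContiguous` and `InnerListFromTest` are hypothetical names, and the offsets assume that each null entry has zero length (equal start and end offsets), which is what I would expect from an array built with `ArrayFromJSON`.

   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <utility>
   #include <vector>

   // Toy model of a variable-size list array (hypothetical, not the Arrow API).
   struct ToyList {
     std::vector<int32_t> offsets;  // entry i owns child slots [offsets[i], offsets[i+1])
     std::vector<bool> valid;       // one validity flag per entry
   };

   // Collects the [start, end) child ranges of all non-null entries.
   std::vector<std::pair<int32_t, int32_t>> NonNullChildRanges(const ToyList& list) {
     assert(list.offsets.size() == list.valid.size() + 1);
     std::vector<std::pair<int32_t, int32_t>> ranges;
     for (std::size_t i = 0; i < list.valid.size(); ++i) {
       if (list.valid[i]) ranges.emplace_back(list.offsets[i], list.offsets[i + 1]);
     }
     return ranges;
   }

   // True if every range starts exactly where the previous one ended.
   bool IsContiguous(const std::vector<std::pair<int32_t, int32_t>>& ranges) {
     for (std::size_t i = 1; i < ranges.size(); ++i) {
       if (ranges[i].first != ranges[i - 1].second) return false;
     }
     return true;
   }

   // Inner entries of the test data: [1,2] [3,4] [5,6] null null [7,8].
   // Assumption: the two null entries are zero-length, so they own no child slots.
   ToyList InnerListFromTest() {
     return ToyList{{0, 2, 4, 6, 6, 6, 8}, {true, true, true, false, false, true}};
   }
   ```

   Under that assumption the non-null entries cover `[0,2) [2,4) [4,6) [6,8)`, so there is no gap between them in the values array.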
   
   So, when `WritePath` in `src/parquet/arrow/path_internal.cc` is called, the underlying data is contiguous, and `RecordPostListVisit` merges the ranges into one:
   
   ```
     // Incorporates |range| into visited elements. If the |range| is contiguous
     // with the last range, extend the last range, otherwise add |range| separately
     // to the list.
     void RecordPostListVisit(const ElementRange& range) {
       if (!visited_elements.empty() && range.start == visited_elements.back().end) {
         visited_elements.back().end = range.end;
         return;
       }
       visited_elements.push_back(range);
     }
   ```
   
   However, for `FixedSizeList`, the underlying data is:
   
   ```
   1 2 ? ? ? ? 3 4 5 6 ? ? ? ? 7 8
   ```
   
   So the underlying data is **not** contiguous, and `WritePath` hits this check:
   
   ```
                 size_t visited_component_size = result.post_list_visited_elements.size();
                 DCHECK_GT(visited_component_size, 0);
                 if (visited_component_size != 1) {
                   return Status::NotImplemented(
                       "Lists with non-zero length null components are not supported");
                 }
   ```
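
   The difference between the two cases can be reproduced with a small standalone sketch. This is only a toy model of the idea, not the real writer: `ToyRange`, `RecordRange` and `VisitFixedSize` are hypothetical names, `RecordRange` copies the merging rule of `RecordPostListVisit` quoted above, and the model assumes that every fixed-size list entry owns `list_size` child slots whether or not it is null.

   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <vector>

   // Half-open range [start, end) of child slots (toy stand-in for ElementRange).
   struct ToyRange {
     int64_t start;
     int64_t end;
   };

   // Same rule as RecordPostListVisit: extend the last range if the new one is
   // contiguous with it, otherwise record the new range separately.
   void RecordRange(std::vector<ToyRange>& visited, ToyRange range) {
     assert(range.start <= range.end);
     if (!visited.empty() && range.start == visited.back().end) {
       visited.back().end = range.end;
       return;
     }
     visited.push_back(range);
   }

   // Visits only the non-null entries of a fixed-size list. Assumption of this
   // model: entry i always owns slots [i * list_size, (i + 1) * list_size),
   // so skipping a null entry leaves a gap in the visited slots.
   std::vector<ToyRange> VisitFixedSize(const std::vector<bool>& valid,
                                        int64_t list_size) {
     std::vector<ToyRange> visited;
     for (std::size_t i = 0; i < valid.size(); ++i) {
       if (!valid[i]) continue;
       const int64_t start = static_cast<int64_t>(i) * list_size;
       RecordRange(visited, ToyRange{start, start + list_size});
     }
     return visited;
   }
   ```

   With no null entries the visited slots collapse into a single range. As soon as a null entry sits between two valid ones, the result has more than one range, which is the `visited_component_size != 1` condition that returns `NotImplemented`.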




[GitHub] [arrow] emkornfield commented on issue #35697: [C++][Parquet] NotImplemented: Lists with non-zero length null components are not supported

Posted by "emkornfield (via GitHub)" <gi...@apache.org>.
emkornfield commented on issue #35697:
URL: https://github.com/apache/arrow/issues/35697#issuecomment-1565743136

   Yes, this is a current shortcoming, and you've summarized the issue nicely.

