Posted to github@arrow.apache.org by "yah01 (via GitHub)" <gi...@apache.org> on 2023/05/16 11:29:45 UTC

[GitHub] [arrow-rs] yah01 opened a new pull request, #4226: Support to read/write parquet for FixedSizeList type

yah01 opened a new pull request, #4226:
URL: https://github.com/apache/arrow-rs/pull/4226

   # Which issue does this PR close?
   
   Closes #4214 
   
   # Rationale for this change
   
   As mentioned in the issue
   
   # What changes are included in this PR?
   
   - Build levels for `FixedSizeList`
   - Implement writing `FixedSizeList` data in `ArrowWriter`
   - Implement reading a `ListArray` as `FixedSizeList` in `ArrowReader`
   
   # Are there any user-facing changes?
   
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4226: Support to read/write parquet for FixedSizeList type

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4226:
URL: https://github.com/apache/arrow-rs/pull/4226#discussion_r1200307735


##########
parquet/src/arrow/array_reader/list_array.rs:
##########
@@ -227,8 +228,13 @@ impl<OffsetSize: OffsetSizeTrait> ArrayReader for ListArrayReader<OffsetSize> {
 
         let list_data = unsafe { data_builder.build_unchecked() };
 
-        let result_array = GenericListArray::<OffsetSize>::from(list_data);
-        Ok(Arc::new(result_array))
+        let result_array: ArrayRef = match *list_data.data_type() {
+            ArrowType::FixedSizeList(_, _) => {
+                Arc::new(FixedSizeListArray::from(list_data))

Review Comment:
   I can add it to my list; it's not as trivial as I had first thought 😅





[GitHub] [arrow-rs] yah01 commented on a diff in pull request #4226: Support to read/write parquet for FixedSizeList type

Posted by "yah01 (via GitHub)" <gi...@apache.org>.
yah01 commented on code in PR #4226:
URL: https://github.com/apache/arrow-rs/pull/4226#discussion_r1200268851


##########
parquet/src/arrow/array_reader/list_array.rs:
##########
@@ -227,8 +228,13 @@ impl<OffsetSize: OffsetSizeTrait> ArrayReader for ListArrayReader<OffsetSize> {
 
         let list_data = unsafe { data_builder.build_unchecked() };
 
-        let result_array = GenericListArray::<OffsetSize>::from(list_data);
-        Ok(Arc::new(result_array))
+        let result_array: ArrayRef = match *list_data.data_type() {
+            ArrowType::FixedSizeList(_, _) => {
+                Arc::new(FixedSizeListArray::from(list_data))

Review Comment:
   > This is actually significantly more complicated, as the null padding logic is different for fixed size types. I'll have a think about how this could be supported
   
   Thank you @tustvold. I don't know much about the Arrow implementation, so please let me know if this is too complex for me to handle.





[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4226: Support to read/write parquet for FixedSizeList type

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4226:
URL: https://github.com/apache/arrow-rs/pull/4226#discussion_r1195040927


##########
parquet/src/arrow/array_reader/list_array.rs:
##########
@@ -227,8 +228,13 @@ impl<OffsetSize: OffsetSizeTrait> ArrayReader for ListArrayReader<OffsetSize> {
 
         let list_data = unsafe { data_builder.build_unchecked() };
 
-        let result_array = GenericListArray::<OffsetSize>::from(list_data);
-        Ok(Arc::new(result_array))
+        let result_array: ArrayRef = match *list_data.data_type() {
+            ArrowType::FixedSizeList(_, _) => {
+                Arc::new(FixedSizeListArray::from(list_data))

Review Comment:
   This is not correct: you are constructing a `FixedSizeListArray` from an `ArrayData` that contains an offsets buffer. In particular, this code needs to verify that the values array is the right length. I'll see if I can't finish up #3879 so that you can make use of that.
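
To make the length concern concrete, here is a minimal standalone sketch of the kind of check the comment asks for. It uses only the Rust standard library; `check_fixed_size_layout` and its parameters are hypothetical names for illustration and are not part of arrow-rs or of this PR. It assumes the list offsets are available as a plain `i32` slice and that the child values length is known.

```rust
/// Hypothetical helper (not an arrow-rs API): checks whether a variable-size
/// list layout (offsets + child values) could be reinterpreted as a
/// fixed-size list in which every entry holds exactly `size` values.
fn check_fixed_size_layout(
    offsets: &[i32],
    values_len: usize,
    size: usize,
) -> Result<(), String> {
    // A list array with n entries carries n + 1 offsets.
    let len = offsets
        .len()
        .checked_sub(1)
        .ok_or_else(|| "offsets buffer is empty".to_string())?;

    // Every entry must start at i * size and span exactly `size` values.
    for (i, w) in offsets.windows(2).enumerate() {
        let start = i64::from(w[0]);
        let span = i64::from(w[1]) - i64::from(w[0]);
        if start != (i * size) as i64 || span != size as i64 {
            return Err(format!(
                "entry {i} starts at {start} and spans {span} values, expected {size}"
            ));
        }
    }

    // The child values must cover exactly len * size slots.
    if values_len != len * size {
        return Err(format!(
            "values array has length {values_len}, expected {}",
            len * size
        ));
    }
    Ok(())
}
```

For example, `check_fixed_size_layout(&[0, 2, 4, 6], 6, 2)` passes, while offsets such as `[0, 2, 3, 5]` are rejected because the second entry spans only one value.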





[GitHub] [arrow-rs] yah01 closed pull request #4226: Support to read/write parquet for FixedSizeList type

Posted by "yah01 (via GitHub)" <gi...@apache.org>.
yah01 closed pull request #4226: Support to read/write parquet for FixedSizeList type
URL: https://github.com/apache/arrow-rs/pull/4226




[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4226: Support to read/write parquet for FixedSizeList type

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4226:
URL: https://github.com/apache/arrow-rs/pull/4226#discussion_r1200043530


##########
parquet/src/arrow/array_reader/list_array.rs:
##########
@@ -227,8 +228,13 @@ impl<OffsetSize: OffsetSizeTrait> ArrayReader for ListArrayReader<OffsetSize> {
 
         let list_data = unsafe { data_builder.build_unchecked() };
 
-        let result_array = GenericListArray::<OffsetSize>::from(list_data);
-        Ok(Arc::new(result_array))
+        let result_array: ArrayRef = match *list_data.data_type() {
+            ArrowType::FixedSizeList(_, _) => {
+                Arc::new(FixedSizeListArray::from(list_data))

Review Comment:
   This is actually significantly more complicated, as the null padding logic is different for fixed size types. I'll have a think about how this could be supported
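
As a rough illustration of the padding difference, the sketch below flattens rows into a fixed-size-list style child buffer. It is plain standard-library Rust, not the arrow-rs reader; `to_fixed_size_layout` is a hypothetical name, and rows are modelled as `Option<Vec<i32>>` purely for the example. The assumption it encodes, taken from the Arrow columnar format, is that a null fixed-size list entry still occupies `size` child slots, whereas a null entry in a variable-size list may occupy none.

```rust
/// Hypothetical sketch (not the arrow-rs implementation): builds the child
/// values and list validity for a fixed-size list of `size` elements.
/// Returns (child values, per-list validity flags).
fn to_fixed_size_layout(
    rows: &[Option<Vec<i32>>],
    size: usize,
) -> Result<(Vec<Option<i32>>, Vec<bool>), String> {
    let mut values = Vec::with_capacity(rows.len() * size);
    let mut validity = Vec::with_capacity(rows.len());

    for (i, row) in rows.iter().enumerate() {
        match row {
            Some(items) => {
                // A valid entry must hold exactly `size` values.
                if items.len() != size {
                    return Err(format!(
                        "row {i} has {} values, expected {size}",
                        items.len()
                    ));
                }
                values.extend(items.iter().copied().map(Some));
                validity.push(true);
            }
            None => {
                // A null entry is padded with `size` placeholder child slots
                // so that later entries stay aligned at i * size.
                values.extend(std::iter::repeat(None).take(size));
                validity.push(false);
            }
        }
    }
    Ok((values, validity))
}
```

With `size = 2`, the rows `[Some([1, 2]), None, Some([3, 4])]` produce six child slots, two of which are padding for the null entry, so the third row still starts at child index 4.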





[GitHub] [arrow-rs] mapleFU commented on a diff in pull request #4226: Support to read/write parquet for FixedSizeList type

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on code in PR #4226:
URL: https://github.com/apache/arrow-rs/pull/4226#discussion_r1200272593


##########
parquet/src/arrow/array_reader/list_array.rs:
##########
@@ -227,8 +228,13 @@ impl<OffsetSize: OffsetSizeTrait> ArrayReader for ListArrayReader<OffsetSize> {
 
         let list_data = unsafe { data_builder.build_unchecked() };
 
-        let result_array = GenericListArray::<OffsetSize>::from(list_data);
-        Ok(Arc::new(result_array))
+        let result_array: ArrayRef = match *list_data.data_type() {
+            ArrowType::FixedSizeList(_, _) => {
+                Arc::new(FixedSizeListArray::from(list_data))

Review Comment:
   Maybe you should pad with some null data, as described in https://arrow.apache.org/docs/format/Columnar.html#fixed-size-list-layout?


