You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/11/07 10:49:04 UTC

[GitHub] [arrow] nevi-me opened a new pull request #8608: ARROW-10261: [Rust] [Breaking] Change List datatype to Box

nevi-me opened a new pull request #8608:
URL: https://github.com/apache/arrow/pull/8608


   This changes a list datatype to use `Box<Field>` instead of `Box<DataType>`.
   
   This change is needed in order to make both Parquet and IPC roundtrips pass.
   The C++ implementation uses `Field`, as it allows for preservice list field names and nullability.
   
   There are some implementation details which we did not cover in this PR, and will work on as a follow-up, and these are:
   * Documenting the naming conventions of Arrow and Parquet on list names (Arrow C++ uses "item", Parquet has different compatibility options)
   * Fixing Parquet failing list tests that have been ignored (I'd like to do this separately, as this isn't a full fix)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on a change in pull request #8608: ARROW-10261: [Rust] [Breaking] Change List datatype to Box

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #8608:
URL: https://github.com/apache/arrow/pull/8608#discussion_r519169297



##########
File path: rust/arrow/src/ipc/convert.rs
##########
@@ -537,59 +529,62 @@ pub(crate) fn get_fb_field_type<'a: 'b, 'b>(
             }
         }
         List(ref list_type) => {
-            let inner_types = get_fb_field_type(list_type, fbb);
-            let child = ipc::Field::create(
-                fbb,
-                &ipc::FieldArgs {
-                    name: None,
-                    nullable: false,
-                    type_type: inner_types.type_type,
-                    type_: Some(inner_types.type_),
-                    children: inner_types.children,
-                    dictionary: None,
-                    custom_metadata: None,
-                },
-            );
+            let child = build_field(fbb, list_type);
+            // let inner_types = get_fb_field_type(list_type, fbb);

Review comment:
       Forgot about it(them), I was struggling with compiling due to lifetime issues, so I commented out those lines in case I needed to revert them. I've now removed them




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8608: ARROW-10261: [Rust] [Breaking] Change List datatype to Box

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8608:
URL: https://github.com/apache/arrow/pull/8608#discussion_r519166944



##########
File path: rust/arrow/src/datatypes.rs
##########
@@ -1182,6 +1183,18 @@ impl Field {
         self.nullable
     }
 
+    /// IReturns the dictionary ID

Review comment:
       ```suggestion
       /// Returns the dictionary ID
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on a change in pull request #8608: ARROW-10261: [Rust] [Breaking] Change List datatype to Box

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #8608:
URL: https://github.com/apache/arrow/pull/8608#discussion_r519169243



##########
File path: rust/arrow/src/datatypes.rs
##########
@@ -1182,6 +1183,18 @@ impl Field {
         self.nullable
     }
 
+    /// IReturns the dictionary ID

Review comment:
       Thanks




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #8608: ARROW-10261: [Rust] [Breaking] Change List datatype to Box

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8608:
URL: https://github.com/apache/arrow/pull/8608#issuecomment-723437224


   > I have a question related to why do we use a `Box<>` instead of just putting it into the stack, but other parts of the code have that, and thus makes sense to keep it as is.
   
   We need some indirection to break the potential infinite recursion
   
   ```rust
   error[E0072]: recursive type `datatypes::Field` has infinite size
      --> arrow\src\datatypes.rs:188:1
       |
   188 | pub struct Field {
       | ^^^^^^^^^^^^^^^^ recursive type has infinite size
   189 |     name: String,
   190 |     data_type: DataType,
       |     ------------------- recursive without indirection
       |
       = help: insert indirection (e.g., a `Box`, `Rc`, or `&`) at some point to make `datatypes::Field` representable
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #8608: ARROW-10261: [Rust] [Breaking] Change List datatype to Box

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #8608:
URL: https://github.com/apache/arrow/pull/8608#issuecomment-723436980


   Ready to merge if the tests pass. I went through all the changes and they seem reasonable. Most issues I had were the same that @alamb suggested, and the rest seems straightforward given the changes on the DataType.
   
   I have a question related to why do we use a `Box<>` instead of just putting it into the stack, but other parts of the code have that, and thus makes sense to keep it as is.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #8608: ARROW-10261: [Rust] [Breaking] Change List datatype to Box

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8608:
URL: https://github.com/apache/arrow/pull/8608#issuecomment-723436040


   > We can always add such a convenience function later, however. It is definitely not required in this PR
   
   Yes @alamb, we can address it as a follow-up. There's more work that I also need to do on top of this, I've hardcoded "item" in a bunch of places, and on the Parquet side, there's some compatibility tests that I need to write.
   
   @jorgecarleitao happy that we merge this if/when tests pass, or would you like another reviewer to look at this?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8608: ARROW-10261: [Rust] [Breaking] Change List datatype to Box

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8608:
URL: https://github.com/apache/arrow/pull/8608#issuecomment-723431612


   https://issues.apache.org/jira/browse/ARROW-10261


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #8608: ARROW-10261: [Rust] [Breaking] Change List datatype to Box

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #8608:
URL: https://github.com/apache/arrow/pull/8608#discussion_r519166183



##########
File path: rust/arrow/src/ipc/convert.rs
##########
@@ -537,59 +529,62 @@ pub(crate) fn get_fb_field_type<'a: 'b, 'b>(
             }
         }
         List(ref list_type) => {
-            let inner_types = get_fb_field_type(list_type, fbb);
-            let child = ipc::Field::create(
-                fbb,
-                &ipc::FieldArgs {
-                    name: None,
-                    nullable: false,
-                    type_type: inner_types.type_type,
-                    type_: Some(inner_types.type_),
-                    children: inner_types.children,
-                    dictionary: None,
-                    custom_metadata: None,
-                },
-            );
+            let child = build_field(fbb, list_type);
+            // let inner_types = get_fb_field_type(list_type, fbb);

Review comment:
       I wonder if there is any reason to leave this commented out? As in why not remove the old version?
   
   I have the same question for the other instances in this file




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me closed pull request #8608: ARROW-10261: [Rust] [Breaking] Change List datatype to Box

Posted by GitBox <gi...@apache.org>.
nevi-me closed pull request #8608:
URL: https://github.com/apache/arrow/pull/8608


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org