You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/09/16 02:04:43 UTC

[GitHub] [arrow] nevi-me opened a new pull request #8200: ARROW-8883: [Rust] [Integration] Enable more tests

nevi-me opened a new pull request #8200:
URL: https://github.com/apache/arrow/pull/8200


   This enables more integration tests, and makes the necessary changes to make them pass. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me closed pull request #8200: ARROW-8883: [Rust] [Integration] Enable more tests

Posted by GitBox <gi...@apache.org>.
nevi-me closed pull request #8200:
URL: https://github.com/apache/arrow/pull/8200


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8200: ARROW-8883: [Rust] [Integration] Enable more tests

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8200:
URL: https://github.com/apache/arrow/pull/8200#issuecomment-693125479


   https://issues.apache.org/jira/browse/ARROW-8883


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #8200: ARROW-8883: [Rust] [Integration] Enable more tests

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #8200:
URL: https://github.com/apache/arrow/pull/8200#issuecomment-706552752


   Thanks a lot for driving this and CCing. This is definitely important.
   
   I have myself hit that `capacity` problem multiple times! One was when I was trying to simplify the `equal.rs`, for what you wrote on which buffers are different when the capacity is different, the second one was on #8401 , since when we have a buffer that receives data via the c data interface, we do not even know (or care) about its capacity.
   
   AFAI can tell, we have two use-cases of `capacity` atm:
   
   * deallocating the region when it is no longer used
   * computing the total size in bytes of arrays
   
   Because arrays share buffers, the total size of an array is currently misleading. For example, when the array is computed from `is_not_null` of another array, both the null buffer (buffer 0) and the `value` (buffer 1) share the same memory region, and thus IMO the total size computation based on `capacity` is incorrect. This is also true for complex structs on which buffers are shared within the same array.
   
   Thus, IMO `capacity` main use-case is for bookkeeping, on how to de-allocate the region. #8401 systematizes that idea, on which `BufferData` (renamed `Bytes` there) no longer has `capacity`, but an `enum` about how to de-allocate itself.
   
   Regardless, I would say that the vast majority of the use-cases on which we want to compare buffers is when we want to compare its contents, irrespectively of how they should be deallocated / capacity. Therefore, I would be happy to have buffer comparison be based on their actual content (in bytes). We just need to be careful about bitmaps, on which the comparison should be made in bits.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #8200: ARROW-8883: [Rust] [Integration] Enable more tests

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8200:
URL: https://github.com/apache/arrow/pull/8200#issuecomment-699117248


   Remember to update the [implementation status](https://github.com/apache/arrow/blob/master/docs/source/status.rst) when done ;-)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8200: ARROW-8883: [Rust] [Integration] Enable more tests

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8200:
URL: https://github.com/apache/arrow/pull/8200#discussion_r512922566



##########
File path: rust/arrow/src/ipc/convert.rs
##########
@@ -609,10 +628,45 @@ pub(crate) fn get_fb_field_type<'a: 'b, 'b>(
                 children: Some(fbb.create_vector(&children[..])),
             }
         }
+        Dictionary(_, value_type) => {
+            // In this library, the dictionary "type" is a logical construct. Here we
+            // pass through to the value type, as we've already captured the index
+            // type in the DictionaryEncoding metadata in the parent field

Review comment:
       I also struggled with this a bit in the c-data interface.

##########
File path: rust/arrow/src/util/integration_util.rs
##########
@@ -97,12 +144,43 @@ impl ArrowJsonSchema {
         for i in 0..field_len {
             let json_field = &self.fields[i];
             let field = schema.field(i);
-            assert_eq!(json_field, &field.to_json());
+            if !json_field.equals_field(field) {
+                return false;
+            }
         }
         true
     }
 }
 
+impl ArrowJsonField {
+    /// Compare the Arrow JSON field with the Arrow `Field`
+    fn equals_field(&self, field: &Field) -> bool {
+        dbg!((&self, &field));

Review comment:
       dbg

##########
File path: rust/arrow/src/array/data.rs
##########
@@ -209,6 +209,61 @@ impl ArrayData {
     }
 }
 
+impl PartialEq for ArrayData {
+    fn eq(&self, other: &Self) -> bool {
+        assert_eq!(

Review comment:
       does't `assert_eq!` panic?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #8200: ARROW-8883: [Rust] [Integration] Enable more tests

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #8200:
URL: https://github.com/apache/arrow/pull/8200#issuecomment-724490574


   From what I have read in this PR, it looks good so far. Is there anything blocking it?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #8200: ARROW-8883: [Rust] [Integration] Enable more tests

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8200:
URL: https://github.com/apache/arrow/pull/8200#issuecomment-724504342


   @jorgecarleitao we're down to 3 failures.
   
   ```rust
   thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 0', arrow/src/array/equal/variable_size.rs:31:21
   ```
   
   Looks like we're not comparing zero length arrays correctly


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #8200: ARROW-8883: [Rust] [Integration] Enable more tests

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8200:
URL: https://github.com/apache/arrow/pull/8200#issuecomment-724906422


   @jorgecarleitao finally passed! There's still some tests that don't pass, but I've updated the status documenation with them (cc @andygrove as you had opened a JIRA for this)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #8200: ARROW-8883: [Rust] [Integration] Enable more tests

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8200:
URL: https://github.com/apache/arrow/pull/8200#issuecomment-706553767


   > Thus, IMO `capacity` main use-case is for bookkeeping, on how to de-allocate the region. #8401 systematizes that idea, on which `BufferData` (renamed `Bytes` there) no longer has `capacity`, but an `enum` about how to de-allocate itself.
   
   I agree with your thinking here, especially on equality and arrays with offsets. One of the things we don't test (extensively or at all) is whether we're able to write arrays with unaligned offsets to the IPC format. I'll put in time to work on this whenever I get to v5 of the Arrow format.
   
   > Regardless, I would say that the vast majority of the use-cases on which we want to compare buffers is when we want to compare its contents, irrespectively of how they should be deallocated / capacity. Therefore, I would be happy to have buffer comparison be based on their actual content (in bytes). We just need to be careful about bitmaps, on which the comparison should be made in bits.
   
   I'd also prefer not to compare using buffer capacity, as I'm mainly interested in the buffer contents, especially when dealing with a frozen buffer where we won't be modifying contents.
   
   On bitmaps, I'll have a look, but I think that the approach of using the bit iterators should cover this.
   
   Apologies again for not having gotten to #8401 yet, I hope you're not in a rush to get it implemented (esp as we're now looking at 3.0 for all new work).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #8200: ARROW-8883: [Rust] [Integration] Enable more tests

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8200:
URL: https://github.com/apache/arrow/pull/8200#issuecomment-724493404


   > From what I have read in this PR, it looks good so far. Is there anything blocking it?
   
   @jorgecarleitao the actual integration tests not passing, are the blocker 😅
   
   There were 10 failures in the last run, logical equality related. It's actually odd, because on my local testing I get 120+ failures between Rust and the other langs (except C++, that passes).
   
   I'll monitor https://github.com/apache/arrow/pull/8200/checks?check_run_id=1378263093 in the background, if it passes; we can merge this (if you don't have any outstanding queries/concerns).
   
   I used your suggested implementation of logical equality (https://github.com/apache/arrow/pull/8200/commits/dde96c6d2628c7c0eedc4f7c15061e746b86fb36)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #8200: ARROW-8883: [Rust] [Integration] Enable more tests

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8200:
URL: https://github.com/apache/arrow/pull/8200#issuecomment-724489474


   @jorgecarleitao @carols10cents I'm contemplating closing this, and breaking it up into smaller PRs (if this round of changes don't pass integration). I'll work on it this week in my evenings (GMT+2) so I can get it out of the way, and go back to working on Parquet.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org