Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/11/27 16:02:17 UTC

[GitHub] [arrow] sweb opened a new pull request #8784: WIP: ARROW-10674: [Rust] Add IPC Reader/Writer for Decimal type to allow integration tests

sweb opened a new pull request #8784:
URL: https://github.com/apache/arrow/pull/8784


   This is a follow-up to #8640.
   
   Currently, there is a first working IPC reader test using data from `testing/arrow-ipc-stream/integration/0.14.1/generated_decimal.arrow_file`
   
   However, this led me to discover that my first decimal type implementation is wrong: it uses big endian, which is Parquet-specific and therefore should not be used in arrow/array and related code. I will try to address this in this PR as well.
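
A minimal sketch of the byte-order difference described above, assuming a decimal's unscaled value is held as a 128-bit two's-complement integer. The function names are hypothetical and are not taken from the PR; only the standard library's `i128::to_le_bytes` and `i128::to_be_bytes` are used.

```rust
/// Encode the unscaled value of a 128-bit decimal as 16 little-endian bytes
/// (the layout this PR moves the Arrow-side decimal to).
/// Hypothetical helper, for illustration only.
fn decimal_to_le_bytes(unscaled: i128) -> [u8; 16] {
    unscaled.to_le_bytes()
}

/// Encode the same value as 16 big-endian bytes
/// (the layout the description calls Parquet-specific).
/// Hypothetical helper, for illustration only.
fn decimal_to_be_bytes(unscaled: i128) -> [u8; 16] {
    unscaled.to_be_bytes()
}
```

The two encodings hold the same bytes in reverse order, so a reader that assumes the wrong one decodes a different number rather than failing outright.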


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8784: WIP: ARROW-10674: [Rust] Add IPC Reader/Writer for Decimal type to allow integration tests

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8784:
URL: https://github.com/apache/arrow/pull/8784#issuecomment-734900515


   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
       ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
     * [Other pull requests](https://github.com/apache/arrow/pulls/)
     * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   





[GitHub] [arrow] sweb commented on a change in pull request #8784: WIP: ARROW-10674: [Rust] Fix BigDecimal to be little endian; Add IPC Reader/Writer for Decimal type to allow integration tests

Posted by GitBox <gi...@apache.org>.
sweb commented on a change in pull request #8784:
URL: https://github.com/apache/arrow/pull/8784#discussion_r534818902



##########
File path: rust/arrow/src/ipc/convert.rs
##########
@@ -97,6 +97,12 @@ pub fn fb_to_schema(fb: ipc::Schema) -> Schema {
     let len = c_fields.len();
     for i in 0..len {
         let c_field: ipc::Field = c_fields.get(i);
+        match c_field.type_type() {

Review comment:
       @alamb could you take another look at my attempt to add the unimplemented path for big endian?
   
   I am not happy with placing the check in `fb_to_schema` and would have preferred to put it in `get_data_type` but I found no way to pass on the endianness from the schema.







[GitHub] [arrow] github-actions[bot] commented on pull request #8784: ARROW-10674: [Rust] Fix BigDecimal to be little endian; Add IPC Reader/Writer for Decimal type to allow integration tests

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8784:
URL: https://github.com/apache/arrow/pull/8784#issuecomment-737728209


   https://issues.apache.org/jira/browse/ARROW-10674





[GitHub] [arrow] alamb commented on pull request #8784: ARROW-10674: [Rust] Fix BigDecimal to be little endian; Add IPC Reader/Writer for Decimal type to allow integration tests

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #8784:
URL: https://github.com/apache/arrow/pull/8784#issuecomment-738732868


   I just double checked though and the CI integration test is still failing here:
   
   https://github.com/apache/arrow/pull/8784/checks?check_run_id=1497340378
   ```
   go: github.com/prometheus/client_model@v0.0.0-20190812154241-14fe0d1b01d4: git fetch -f origin refs/heads/*:refs/heads/* refs/tags/*:refs/tags/* in /go/pkg/mod/cache/vcs/2a98e665081184f4ca01f0af8738c882495d1fb131b7ed20ad844d3ba1bb6393: exit status 128:
   	error: RPC failed; curl 18 transfer closed with outstanding read data remaining
   	fatal: error reading section header 'shallow-info'
   Fetching https://golang.org/x/exp?go-get=1
   ```
   
   This seems unrelated (an error fetching a Go dependency?).
   
   I restarted that test to see if the problem was an intermittent infrastructure error.
   





[GitHub] [arrow] sweb commented on a change in pull request #8784: ARROW-10674: [Rust] Fix BigDecimal to be little endian; Add IPC Reader/Writer for Decimal type to allow integration tests

Posted by GitBox <gi...@apache.org>.
sweb commented on a change in pull request #8784:
URL: https://github.com/apache/arrow/pull/8784#discussion_r535429404



##########
File path: rust/arrow/src/ipc/convert.rs
##########
@@ -97,6 +97,12 @@ pub fn fb_to_schema(fb: ipc::Schema) -> Schema {
     let len = c_fields.len();
     for i in 0..len {
         let c_field: ipc::Field = c_fields.get(i);
+        match c_field.type_type() {

Review comment:
       @alamb thank you for being so nice about it - I was just too lazy to add a test and should receive full scrutiny ;)
   
   This is partly because I am not very familiar with flatbuffers and still do not fully understand how to create the appropriate flatbuffer to test this. As a temporary solution, I have added two tests to `ipc::reader` that use the big-endian files in `arrow-ipc-stream/integration/1.0.0-bigendian`. The one for decimal fails, the others work. I hope this is okay for now, until I am able to construct the correct schema message to test this directly in `ipc::convert`.
   
   While adding the big endian test for the other types I noticed that the contents are not equal to the json content. That is why the test does not contain an equality check. Thus, there may be problems with Big Endian for other types as well.







[GitHub] [arrow] alamb commented on pull request #8784: ARROW-10674: [Rust] Fix BigDecimal to be little endian; Add IPC Reader/Writer for Decimal type to allow integration tests

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #8784:
URL: https://github.com/apache/arrow/pull/8784#issuecomment-738772091


   CI passed!





[GitHub] [arrow] nevi-me commented on pull request #8784: ARROW-10674: [Rust] Fix BigDecimal to be little endian; Add IPC Reader/Writer for Decimal type to allow integration tests

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #8784:
URL: https://github.com/apache/arrow/pull/8784#issuecomment-739071404


   I'm on the road over the weekend, but I'll try to look at this, maybe on Sunday evening.





[GitHub] [arrow] alamb commented on pull request #8784: WIP: ARROW-10674: [Rust] Fix BigDecimal to be little endian; Add IPC Reader/Writer for Decimal type to allow integration tests

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #8784:
URL: https://github.com/apache/arrow/pull/8784#issuecomment-737356705


   @sweb -- it seems from the Arrow definition that the endianness *may* be big or little endian:
   https://github.com/apache/arrow/blob/master/format/Schema.fbs#L175-L178
   
   ```
   /// Exact decimal value represented as an integer value in two's
   /// complement. Currently only 128-bit (16-byte) and 256-bit (32-byte) integers
   /// are used. The representation uses the endianness indicated
   /// in the Schema.
   ```
   
   
   So I suggest at least validating the schema in the IPC message and returning an "unimplemented" error if the schema is big endian.
   
   The endianness can be checked:
   https://docs.rs/arrow/2.0.0/arrow/ipc/gen/Schema/struct.Schema.html#method.endianness
   
   Note that I think this implementation (which now passes a new test) is still OK to merge as is (it is *more* correct for one case), but gracefully handling an unimplemented endianness would be better.
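
A minimal sketch of the suggested validation, under stated assumptions: `Endianness`, `FieldType`, `SchemaMessage`, and `validate_schema` are hypothetical stand-ins defined here so the example is self-contained, not the generated flatbuffer types from `arrow::ipc`. The error text is the one used by the `should_panic` test in this PR; the sketch returns a `Result`, whereas the PR's test expects a panic.

```rust
/// Hypothetical stand-in for the schema-level endianness flag.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Endianness {
    Little,
    Big,
}

/// Hypothetical, heavily reduced set of field types.
#[derive(Debug, Clone, PartialEq)]
enum FieldType {
    Int32,
    Decimal { precision: u32, scale: u32 },
}

/// Hypothetical stand-in for a decoded IPC schema message.
/// Endianness lives on the schema, not on the individual fields.
struct SchemaMessage {
    endianness: Endianness,
    fields: Vec<FieldType>,
}

/// Reject big-endian schemas that contain a decimal field,
/// since only the little-endian decimal path is implemented.
fn validate_schema(schema: &SchemaMessage) -> Result<(), String> {
    let has_decimal = schema
        .fields
        .iter()
        .any(|f| matches!(f, FieldType::Decimal { .. }));
    if schema.endianness == Endianness::Big && has_decimal {
        return Err("Big Endian is not supported for Decimal!".to_string());
    }
    Ok(())
}
```

Because the check needs both the schema-level flag and the field types, it has to run where the whole schema is visible, which is why it cannot be pushed down into a per-field conversion without passing the endianness along.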





[GitHub] [arrow] jorgecarleitao closed pull request #8784: ARROW-10674: [Rust] Fix BigDecimal to be little endian; Add IPC Reader/Writer for Decimal type to allow integration tests

Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed pull request #8784:
URL: https://github.com/apache/arrow/pull/8784


   





[GitHub] [arrow] alamb commented on a change in pull request #8784: ARROW-10674: [Rust] Fix BigDecimal to be little endian; Add IPC Reader/Writer for Decimal type to allow integration tests

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #8784:
URL: https://github.com/apache/arrow/pull/8784#discussion_r535138494



##########
File path: rust/arrow/src/ipc/convert.rs
##########
@@ -97,6 +97,12 @@ pub fn fb_to_schema(fb: ipc::Schema) -> Schema {
     let len = c_fields.len();
     for i in 0..len {
         let c_field: ipc::Field = c_fields.get(i);
+        match c_field.type_type() {

Review comment:
       I see the problem -- yes, since the `endianness` is on the schema object, not the field, and the field is all that is passed around, there is no way to know the details of the schema.
   
   I personally think this code is fine, if a bit un-ideal, and it could be cleaned up in the future. My only worry is that the check would get lost or broken during such a cleanup.
   
   What would you think about adding a test that triggers the error? Then we could be sure that any future cleanups will not break the check?

   Thanks again @sweb 







[GitHub] [arrow] github-actions[bot] removed a comment on pull request #8784: ARROW-10674: [Rust] Fix BigDecimal to be little endian; Add IPC Reader/Writer for Decimal type to allow integration tests

Posted by GitBox <gi...@apache.org>.
github-actions[bot] removed a comment on pull request #8784:
URL: https://github.com/apache/arrow/pull/8784#issuecomment-734900515





[GitHub] [arrow] alamb commented on a change in pull request #8784: ARROW-10674: [Rust] Fix BigDecimal to be little endian; Add IPC Reader/Writer for Decimal type to allow integration tests

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #8784:
URL: https://github.com/apache/arrow/pull/8784#discussion_r536028724



##########
File path: rust/arrow/src/ipc/reader.rs
##########
@@ -1008,6 +1008,48 @@ mod tests {
         });
     }
 
+    #[test]
+    #[should_panic(expected = "Big Endian is not supported for Decimal!")]
+    fn read_decimal_file_be() {

Review comment:
       👍 







[GitHub] [arrow] sweb commented on pull request #8784: WIP: ARROW-10674: [Rust] Fix BigDecimal to be little endian; Add IPC Reader/Writer for Decimal type to allow integration tests

Posted by GitBox <gi...@apache.org>.
sweb commented on pull request #8784:
URL: https://github.com/apache/arrow/pull/8784#issuecomment-737448455


   Hey @alamb thank you for the review!
   
   I will add an unimplemented path to indicate a potential misuse - thank you for your hint on how to check endianness - I was not aware that this was available.
   
   I am currently trying to add the required conversions from parquet (big endian, fixed size) to arrow (little endian, 128-bit), but maybe this is something I will add in a separate PR.
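
A minimal sketch of the kind of conversion mentioned above, assuming the parquet side supplies a big-endian two's-complement byte slice of 1 to 16 bytes and the arrow side wants a 128-bit value. `be_bytes_to_i128` is a hypothetical name, not a function from either crate.

```rust
/// Sign-extend a big-endian two's-complement byte slice (1 to 16 bytes)
/// into an i128. Returns None for empty or over-long input.
/// Hypothetical helper, for illustration only.
fn be_bytes_to_i128(bytes: &[u8]) -> Option<i128> {
    if bytes.is_empty() || bytes.len() > 16 {
        return None;
    }
    // Pad with 0xFF when the sign bit is set so negative values stay negative.
    let fill: u8 = if bytes[0] & 0x80 != 0 { 0xFF } else { 0x00 };
    let mut buf = [fill; 16];
    // Big endian: the least significant bytes sit at the end of the buffer.
    buf[16 - bytes.len()..].copy_from_slice(bytes);
    Some(i128::from_be_bytes(buf))
}
```

Calling `to_le_bytes()` on the resulting `i128` then yields the 16-byte little-endian layout. The sign extension is the step that is easy to miss: zero-padding a short negative value would turn it into a large positive one.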

