Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/03 21:44:10 UTC

[GitHub] [arrow] carols10cents opened a new pull request #8826: ARROW-10801: [Rust] [Flight] Support sending FlightData for Dictionaries with that of a RecordBatch

carols10cents opened a new pull request #8826:
URL: https://github.com/apache/arrow/pull/8826


   This PR contains a series of refactorings that extract the code that generates `EncodedData` instances from the IPC `FileWriter` and `StreamWriter`. I highly recommend reviewing commit-by-commit; I tried to make the changes fairly mechanical and easy to understand at each step.
   
   With those refactorings, it becomes easier to reuse the code with Flight too, so that when we're generating `FlightData` for a `RecordBatch`, we get the `FlightData` for the batch's dictionaries too.
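
As a rough illustration of the shape this takes (a sketch with stand-in types, not the code in this PR): `Batch`, `Column`, `FlightData`, and `flight_data_for_batch` below are hypothetical, simplified placeholders, not the arrow or arrow-flight types. The only point is that one batch yields several messages, with each dictionary's message emitted before the batch message that refers to it.

```rust
/// Hypothetical stand-in for an encoded Flight message.
#[derive(Debug, Clone, PartialEq)]
enum FlightData {
    /// Carries the values of one dictionary, identified by `id`.
    Dictionary { id: i64, values: Vec<String> },
    /// Carries the batch itself (only the row count is modelled here).
    Batch { num_rows: usize },
}

/// Hypothetical stand-in for a column: plain values or dictionary-encoded.
enum Column {
    Plain(Vec<i64>),
    Dictionary { id: i64, values: Vec<String>, keys: Vec<usize> },
}

/// Hypothetical stand-in for a record batch.
struct Batch {
    columns: Vec<Column>,
}

impl Batch {
    /// Row count, taken from the first column (0 for an empty batch).
    fn num_rows(&self) -> usize {
        match self.columns.first() {
            Some(Column::Plain(values)) => values.len(),
            Some(Column::Dictionary { keys, .. }) => keys.len(),
            None => 0,
        }
    }
}

/// Emit one message per dictionary column first, then one for the batch.
fn flight_data_for_batch(batch: &Batch) -> Vec<FlightData> {
    let mut out = Vec::new();
    for column in &batch.columns {
        if let Column::Dictionary { id, values, .. } = column {
            out.push(FlightData::Dictionary { id: *id, values: values.clone() });
        }
    }
    out.push(FlightData::Batch { num_rows: batch.num_rows() });
    out
}
```

Returning a `Vec` rather than a single value is what lets the caller send the dictionary messages ahead of the batch that depends on them.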
   
   This will help in getting the Flight integration tests that include dictionaries to pass. I'm also having some unrelated protocol problems over in [this PR I'm working on](https://github.com/apache/arrow/pull/8641), but this work will help there too.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me closed pull request #8826: ARROW-10801: [Rust] [Flight] Support sending FlightData for Dictionaries with that of a RecordBatch

Posted by GitBox <gi...@apache.org>.
nevi-me closed pull request #8826:
URL: https://github.com/apache/arrow/pull/8826


   





[GitHub] [arrow] carols10cents commented on a change in pull request #8826: ARROW-10801: [Rust] [Flight] Support sending FlightData for Dictionaries with that of a RecordBatch

Posted by GitBox <gi...@apache.org>.
carols10cents commented on a change in pull request #8826:
URL: https://github.com/apache/arrow/pull/8826#discussion_r537577588



##########
File path: rust/arrow-flight/src/utils.rs
##########
@@ -26,30 +26,40 @@ use arrow::error::{ArrowError, Result};
 use arrow::ipc::{convert, reader, writer, writer::IpcWriteOptions};
 use arrow::record_batch::RecordBatch;
 
-/// Convert a `RecordBatch` to `FlightData` by converting the header and body to bytes
+/// Convert a `RecordBatch` to a vector of `FlightData` representing the bytes of the dictionaries
+/// and values. This can't be a `From` implementation because neither `RecordBatch` nor `Vec` are
+/// implemented in this crate.
 ///
 /// Note: This implicitly uses the default `IpcWriteOptions`. To configure options,
 /// use `flight_data_from_arrow_batch()`
-impl From<&RecordBatch> for FlightData {
-    fn from(batch: &RecordBatch) -> Self {
-        let options = IpcWriteOptions::default();
-        flight_data_from_arrow_batch(batch, &options)
-    }
+pub fn convert_to_flight_data(batch: &RecordBatch) -> Vec<FlightData> {
+    let options = IpcWriteOptions::default();

Review comment:
       Done! I think this is ready for re-review :)







[GitHub] [arrow] nevi-me commented on a change in pull request #8826: ARROW-10801: [Rust] [Flight] Support sending FlightData for Dictionaries with that of a RecordBatch

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #8826:
URL: https://github.com/apache/arrow/pull/8826#discussion_r541786808



##########
File path: rust/arrow/src/ipc/writer.rs
##########
@@ -98,6 +98,239 @@ impl Default for IpcWriteOptions {
     }
 }
 
+#[derive(Debug, Default)]
+pub struct IpcDataGenerator {}
+
+impl IpcDataGenerator {
+    pub fn schema_to_bytes(
+        &self,
+        schema: &Schema,
+        write_options: &IpcWriteOptions,
+    ) -> EncodedData {
+        let mut fbb = FlatBufferBuilder::new();
+        let schema = {
+            let fb = ipc::convert::schema_to_fb_offset(&mut fbb, schema);
+            fb.as_union_value()
+        };
+
+        let mut message = ipc::MessageBuilder::new(&mut fbb);
+        message.add_version(write_options.metadata_version);
+        message.add_header_type(ipc::MessageHeader::Schema);
+        message.add_bodyLength(0);
+        message.add_header(schema);
+        // TODO: custom metadata

Review comment:
       We should attach JIRA issues to the TODOs so we don't lose track of them.







[GitHub] [arrow] nevi-me commented on a change in pull request #8826: ARROW-10801: [Rust] [Flight] Support sending FlightData for Dictionaries with that of a RecordBatch

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #8826:
URL: https://github.com/apache/arrow/pull/8826#discussion_r536396692



##########
File path: rust/arrow-flight/src/utils.rs
##########
@@ -26,30 +26,40 @@ use arrow::error::{ArrowError, Result};
 use arrow::ipc::{convert, reader, writer, writer::IpcWriteOptions};
 use arrow::record_batch::RecordBatch;
 
-/// Convert a `RecordBatch` to `FlightData` by converting the header and body to bytes
+/// Convert a `RecordBatch` to a vector of `FlightData` representing the bytes of the dictionaries
+/// and values. This can't be a `From` implementation because neither `RecordBatch` nor `Vec` are

Review comment:
       Happy with this; the `From` impl was hacky and doesn't work once we need IPC options.

##########
File path: rust/arrow-flight/src/utils.rs
##########
@@ -26,30 +26,40 @@ use arrow::error::{ArrowError, Result};
 use arrow::ipc::{convert, reader, writer, writer::IpcWriteOptions};
 use arrow::record_batch::RecordBatch;
 
-/// Convert a `RecordBatch` to `FlightData` by converting the header and body to bytes
+/// Convert a `RecordBatch` to a vector of `FlightData` representing the bytes of the dictionaries
+/// and values. This can't be a `From` implementation because neither `RecordBatch` nor `Vec` are
+/// implemented in this crate.
 ///
 /// Note: This implicitly uses the default `IpcWriteOptions`. To configure options,
 /// use `flight_data_from_arrow_batch()`
-impl From<&RecordBatch> for FlightData {
-    fn from(batch: &RecordBatch) -> Self {
-        let options = IpcWriteOptions::default();
-        flight_data_from_arrow_batch(batch, &options)
-    }
+pub fn convert_to_flight_data(batch: &RecordBatch) -> Vec<FlightData> {
+    let options = IpcWriteOptions::default();

Review comment:
       It's safer to take this as a function argument, because one might want to support either V4 or V5 of the IPC format.
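
A minimal sketch of that suggestion, using hypothetical stand-in types rather than the real arrow ones: `MetadataVersion`, `WriteOptions`, `Batch`, and `FlightData` below are simplified placeholders, and `flight_data_with_options` and `flight_data_with_defaults` are illustrative names. The caller passes the options in, and a thin wrapper can still supply the defaults.

```rust
/// Hypothetical stand-in for the IPC metadata version.
#[derive(Debug, Clone, Copy, PartialEq)]
enum MetadataVersion {
    V4,
    V5,
}

/// Hypothetical stand-in for the IPC write options.
#[derive(Debug, Clone)]
struct WriteOptions {
    metadata_version: MetadataVersion,
}

impl Default for WriteOptions {
    fn default() -> Self {
        // Assumption made for this sketch only: default to the newer version.
        WriteOptions { metadata_version: MetadataVersion::V5 }
    }
}

/// Hypothetical stand-in for a record batch.
struct Batch {
    num_rows: usize,
}

/// Hypothetical stand-in for an encoded Flight message.
#[derive(Debug, PartialEq)]
struct FlightData {
    metadata_version: MetadataVersion,
    num_rows: usize,
}

/// The options are an explicit argument, so the caller picks V4 or V5.
fn flight_data_with_options(batch: &Batch, options: &WriteOptions) -> Vec<FlightData> {
    vec![FlightData {
        metadata_version: options.metadata_version,
        num_rows: batch.num_rows,
    }]
}

/// Convenience wrapper for callers who are happy with the defaults.
fn flight_data_with_defaults(batch: &Batch) -> Vec<FlightData> {
    flight_data_with_options(batch, &WriteOptions::default())
}
```

With this shape, a server talking to an older client can pass `WriteOptions { metadata_version: MetadataVersion::V4 }` without the conversion function hard-coding either version.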







[GitHub] [arrow] github-actions[bot] commented on pull request #8826: ARROW-10801: [Rust] [Flight] Support sending FlightData for Dictionaries with that of a RecordBatch

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8826:
URL: https://github.com/apache/arrow/pull/8826#issuecomment-738339929


   https://issues.apache.org/jira/browse/ARROW-10801

