You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/05/27 12:03:47 UTC

[GitHub] [arrow-rs] alamb opened a new pull request, #4294: Add Builder style APIs and docs for `FlightData`,` FlightInfo`, `FlightEndpoint`, `Locaation` and `Ticket`

alamb opened a new pull request, #4294:
URL: https://github.com/apache/arrow-rs/pull/4294

   
   
   # Which issue does this PR close?
   Closes  https://github.com/apache/arrow-rs/issues/4281
   
   # Rationale for this change
    
   It is a PITA to create `FlightInfo` as described on https://github.com/apache/arrow-rs/issues/4281
   
   As I was working on the code and examples, I realized it was a pain to create all the other types as well
   
   
   
   # What changes are included in this PR?
   
   Add Builder style APIs and docs for `FlightData`,` FlightInfo`, `FlightEndpoint`, `Locaation` and `Ticket`
   
   Since these are all `prost` generated stucts anyways all the fields are `pub` so I figured simply making it easier to modify them would be good
   
   I changed the signatures of `FlightInfo::new()` and `FlightData::new()` as the existing functions are not very useful, which I will explain inline


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #4294: Add Builder style APIs and docs for `FlightData`,` FlightInfo`, `FlightEndpoint`, `Locaation` and `Ticket`

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #4294:
URL: https://github.com/apache/arrow-rs/pull/4294#discussion_r1207920019


##########
arrow-flight/src/lib.rs:
##########
@@ -433,24 +464,45 @@ impl FlightDescriptor {
 }
 
 impl FlightInfo {
-    /// Create a new [`FlightInfo`] that describes the access
-    /// coordinates for retrieval of a dataset.
-    pub fn new(
-        message: IpcMessage,
-        flight_descriptor: Option<FlightDescriptor>,
-        endpoint: Vec<FlightEndpoint>,
-        total_records: i64,
-        total_bytes: i64,
-        ordered: bool,
-    ) -> Self {
-        let IpcMessage(vals) = message;
+    /// Create a new, empty `FlightInfo`, describing where to fetch flight data

Review Comment:
   Rather than making a new `FlightInfoBuilder` I eventually decided to just add the builder style methods on `FlightInfo` itself:
   1. It was simpler code
   2. I wasn't entirely sure what was valid / invalid semantically, so extra error checking seemed unecessary



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #4294: Add Builder style APIs and docs for `FlightData`,` FlightInfo`, `FlightEndpoint`, `Locaation` and `Ticket`

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #4294:
URL: https://github.com/apache/arrow-rs/pull/4294#discussion_r1210200732


##########
arrow-flight/src/lib.rs:
##########
@@ -459,6 +511,51 @@ impl FlightInfo {
         let msg = IpcMessage(self.schema);
         msg.try_into()
     }
+
+    /// Specify the schema for the response.
+    ///
+    /// Note this takes the arrow [`Schema`] (not the IPC schema) and
+    /// encodes it using the default IPC options.
+    ///
+    /// Returns an error if `schema` can not be encoded into IPC form.
+    pub fn with_schema(mut self, schema: &Schema) -> ArrowResult<Self> {

Review Comment:
   in a6c973f13



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #4294: Add Builder style APIs and docs for `FlightData`,` FlightInfo`, `FlightEndpoint`, `Locaation` and `Ticket`

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #4294:
URL: https://github.com/apache/arrow-rs/pull/4294#discussion_r1207916592


##########
arrow-flight/examples/flight_sql_server.rs:
##########
@@ -222,26 +221,15 @@ impl FlightSqlService for FlightSqlServiceImpl {
             ticket: Some(ticket),
             location: vec![loc],
         };
-        let endpoints = vec![endpoint];
+        let info = FlightInfo::new()

Review Comment:
   This is a pretty good example of the "before" and "after" that this PR provides.
   
   It doesn't do anything different, it is just easier to work with the structures



##########
arrow-flight/examples/flight_sql_server.rs:
##########
@@ -292,32 +280,14 @@ impl FlightSqlService for FlightSqlServiceImpl {
         request: Request<FlightDescriptor>,
     ) -> Result<Response<FlightInfo>, Status> {
         let flight_descriptor = request.into_inner();
-        let ticket = Ticket {
-            ticket: query.encode_to_vec().into(),
-        };
-
-        let options = IpcWriteOptions::default();
-
-        // encode the schema into the correct form
-        let IpcMessage(schema) = SchemaAsIpc::new(SqlInfoList::schema(), &options)
-            .try_into()
-            .expect("valid sql_info schema");
-
-        let endpoint = vec![FlightEndpoint {
-            ticket: Some(ticket),
-            // we assume users wnating to use this helper would reasonably
-            // never need to be distributed across multile endpoints?
-            location: vec![],
-        }];
-
-        let flight_info = FlightInfo {
-            schema,
-            flight_descriptor: Some(flight_descriptor),
-            endpoint,
-            total_records: -1,
-            total_bytes: -1,
-            ordered: false,
-        };
+        let ticket = Ticket::new(query.encode_to_vec());
+        let endpoint = FlightEndpoint::new().with_ticket(ticket);
+
+        let flight_info = FlightInfo::new()

Review Comment:
   Same thing here -- I claim this is a lot easier to read



##########
arrow-flight/tests/flight_sql_client_cli.rs:
##########
@@ -167,42 +166,31 @@ impl FlightSqlService for FlightSqlServiceImpl {
 
         let batch = Self::fake_result().unwrap();
 
-        let IpcMessage(schema_bytes) =
-            SchemaAsIpc::new(batch.schema().as_ref(), &IpcWriteOptions::default())
-                .try_into()
-                .unwrap();
-
-        let info = FlightInfo {
-            schema: schema_bytes,
-            flight_descriptor: None,
-            endpoint: vec![
-                FlightEndpoint {
-                    ticket: Some(Ticket {
-                        ticket: FetchResults {
-                            handle: String::from("part_1"),
-                        }
-                        .as_any()
-                        .encode_to_vec()
-                        .into(),
-                    }),
-                    location: vec![],
-                },
-                FlightEndpoint {
-                    ticket: Some(Ticket {
-                        ticket: FetchResults {
-                            handle: String::from("part_2"),
-                        }
-                        .as_any()
-                        .encode_to_vec()
-                        .into(),
-                    }),
-                    location: vec![],
-                },
-            ],
-            total_records: batch.num_rows() as i64,
-            total_bytes: batch.get_array_memory_size() as i64,
-            ordered: false,
-        };
+        let info = FlightInfo::new()

Review Comment:
   Again, I think the new version is much more readable (it is still fine to use the old syntax)



##########
arrow-flight/src/lib.rs:
##########
@@ -433,24 +464,45 @@ impl FlightDescriptor {
 }
 
 impl FlightInfo {
-    /// Create a new [`FlightInfo`] that describes the access
-    /// coordinates for retrieval of a dataset.
-    pub fn new(

Review Comment:
   This is also technically a breaking change, but since all the fields on `FlightData` are `pub` anyways I don't think the old constructor was very useful (as above)



##########
arrow-flight/src/lib.rs:
##########
@@ -390,21 +391,51 @@ impl FlightData {
     /// See [`FlightDataEncoderBuilder`] for a higher level API to
     /// convert a stream of [`RecordBatch`]es to [`FlightData`]s
     ///
+    /// # Example:
+    ///
+    /// ```
+    /// # use bytes::Bytes;
+    /// # use arrow_flight::{FlightData, FlightDescriptor};
+    /// # fn encode_data() -> Bytes { Bytes::new() } // dummy data
+    /// // Get encoded Arrow IPC data:
+    /// let data_body: Bytes = encode_data();
+    /// // Create the FlightData message
+    /// let flight_data = FlightData::new()
+    ///   .with_descriptor(FlightDescriptor::new_cmd("the command"))
+    ///   .with_app_metadata("My apps metadata")
+    ///   .with_data_body(data_body)
+    /// ```
+    ///
     /// [`FlightDataEncoderBuilder`]: crate::encode::FlightDataEncoderBuilder
     /// [`RecordBatch`]: arrow_array::RecordBatch
-    pub fn new(
-        flight_descriptor: Option<FlightDescriptor>,
-        message: IpcMessage,
-        app_metadata: impl Into<Bytes>,
-        data_body: impl Into<Bytes>,
-    ) -> Self {
-        let IpcMessage(vals) = message;
-        FlightData {
-            flight_descriptor,
-            data_header: vals,
-            app_metadata: app_metadata.into(),
-            data_body: data_body.into(),
-        }
+    pub fn new() -> Self {

Review Comment:
   This is technically a breaking change, but since all the fields on `FlightData` are `pub` anyways I don't think the old constructor was very useful (it was not used in the arrow codebase, for what that is worth)



##########
arrow-flight/src/lib.rs:
##########
@@ -459,6 +511,51 @@ impl FlightInfo {
         let msg = IpcMessage(self.schema);
         msg.try_into()
     }
+
+    /// Specify the schema for the response.
+    ///
+    /// Note this takes the arrow [`Schema`] (not the IPC schema) and
+    /// encodes it using the default IPC options.
+    ///
+    /// Returns an error if `schema` can not be encoded into IPC form.
+    pub fn with_schema(mut self, schema: &Schema) -> ArrowResult<Self> {

Review Comment:
   This was one of the nastiest APIs to deal with (figuring out the right incantation to encode the schema)



##########
arrow-flight/src/lib.rs:
##########
@@ -390,21 +391,51 @@ impl FlightData {
     /// See [`FlightDataEncoderBuilder`] for a higher level API to
     /// convert a stream of [`RecordBatch`]es to [`FlightData`]s
     ///
+    /// # Example:
+    ///
+    /// ```
+    /// # use bytes::Bytes;
+    /// # use arrow_flight::{FlightData, FlightDescriptor};
+    /// # fn encode_data() -> Bytes { Bytes::new() } // dummy data
+    /// // Get encoded Arrow IPC data:
+    /// let data_body: Bytes = encode_data();
+    /// // Create the FlightData message
+    /// let flight_data = FlightData::new()
+    ///   .with_descriptor(FlightDescriptor::new_cmd("the command"))
+    ///   .with_app_metadata("My apps metadata")
+    ///   .with_data_body(data_body)
+    /// ```
+    ///
     /// [`FlightDataEncoderBuilder`]: crate::encode::FlightDataEncoderBuilder
     /// [`RecordBatch`]: arrow_array::RecordBatch
-    pub fn new(
-        flight_descriptor: Option<FlightDescriptor>,
-        message: IpcMessage,
-        app_metadata: impl Into<Bytes>,
-        data_body: impl Into<Bytes>,
-    ) -> Self {
-        let IpcMessage(vals) = message;
-        FlightData {
-            flight_descriptor,
-            data_header: vals,
-            app_metadata: app_metadata.into(),
-            data_body: data_body.into(),
-        }
+    pub fn new() -> Self {
+        Default::default()
+    }
+
+    /// Add a [`FlightDescriptor`] describing the data
+    pub fn with_descriptor(mut self, flight_descriptor: FlightDescriptor) -> Self {
+        self.flight_descriptor = Some(flight_descriptor);
+        self
+    }
+
+    /// Add a data header
+    pub fn with_data_header(mut self, data_header: impl Into<Bytes>) -> Self {
+        self.data_header = data_header.into();
+        self
+    }
+
+    /// Add a data body. See [`IpcDataGenerator`] to create this data.
+    ///
+    /// [`IpcDataGenerator`]: arrow_ipc::writer::IpcDataGenerator

Review Comment:
   Making a builder API gives a location to add some comments on what these fields should contain (the field description comes from the protobuf so is not Rust specific)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb merged pull request #4294: Add Builder style APIs and docs for `FlightData`,` FlightInfo`, `FlightEndpoint`, `Locaation` and `Ticket`

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb merged PR #4294:
URL: https://github.com/apache/arrow-rs/pull/4294


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #4294: Add Builder style APIs and docs for `FlightData`,` FlightInfo`, `FlightEndpoint`, `Locaation` and `Ticket`

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #4294:
URL: https://github.com/apache/arrow-rs/pull/4294#discussion_r1209248275


##########
arrow-flight/src/lib.rs:
##########
@@ -459,6 +511,51 @@ impl FlightInfo {
         let msg = IpcMessage(self.schema);
         msg.try_into()
     }
+
+    /// Specify the schema for the response.
+    ///
+    /// Note this takes the arrow [`Schema`] (not the IPC schema) and
+    /// encodes it using the default IPC options.
+    ///
+    /// Returns an error if `schema` can not be encoded into IPC form.
+    pub fn with_schema(mut self, schema: &Schema) -> ArrowResult<Self> {

Review Comment:
   That is a good idea -- I will make that change



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] roeap commented on a diff in pull request #4294: Add Builder style APIs and docs for `FlightData`,` FlightInfo`, `FlightEndpoint`, `Locaation` and `Ticket`

Posted by "roeap (via GitHub)" <gi...@apache.org>.
roeap commented on code in PR #4294:
URL: https://github.com/apache/arrow-rs/pull/4294#discussion_r1209206884


##########
arrow-flight/src/lib.rs:
##########
@@ -459,6 +511,51 @@ impl FlightInfo {
         let msg = IpcMessage(self.schema);
         msg.try_into()
     }
+
+    /// Specify the schema for the response.
+    ///
+    /// Note this takes the arrow [`Schema`] (not the IPC schema) and
+    /// encodes it using the default IPC options.
+    ///
+    /// Returns an error if `schema` can not be encoded into IPC form.
+    pub fn with_schema(mut self, schema: &Schema) -> ArrowResult<Self> {

Review Comment:
   Just minor, but personally, I like to make the the fact that its fallible explicit in the name.
   
   ```suggestion
       pub fn try_with_schema(mut self, schema: &Schema) -> ArrowResult<Self> {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on pull request #4294: Add Builder style APIs and docs for `FlightData`,` FlightInfo`, `FlightEndpoint`, `Locaation` and `Ticket`

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #4294:
URL: https://github.com/apache/arrow-rs/pull/4294#issuecomment-1566981601

   cc @avantgardnerio  and @roeap  -- I wonder if you have time to review this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on pull request #4294: Add Builder style APIs and docs for `FlightData`,` FlightInfo`, `FlightEndpoint`, `Locaation` and `Ticket`

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #4294:
URL: https://github.com/apache/arrow-rs/pull/4294#issuecomment-1568383036

   Thanks @roeap  and @avantgardnerio 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] avantgardnerio commented on a diff in pull request #4294: Add Builder style APIs and docs for `FlightData`,` FlightInfo`, `FlightEndpoint`, `Locaation` and `Ticket`

Posted by "avantgardnerio (via GitHub)" <gi...@apache.org>.
avantgardnerio commented on code in PR #4294:
URL: https://github.com/apache/arrow-rs/pull/4294#discussion_r1209439535


##########
arrow-flight/src/lib.rs:
##########
@@ -433,24 +464,45 @@ impl FlightDescriptor {
 }
 
 impl FlightInfo {
-    /// Create a new [`FlightInfo`] that describes the access
-    /// coordinates for retrieval of a dataset.
-    pub fn new(
-        message: IpcMessage,
-        flight_descriptor: Option<FlightDescriptor>,
-        endpoint: Vec<FlightEndpoint>,
-        total_records: i64,
-        total_bytes: i64,
-        ordered: bool,
-    ) -> Self {
-        let IpcMessage(vals) = message;
+    /// Create a new, empty `FlightInfo`, describing where to fetch flight data
+    ///
+    ///
+    /// # Example:
+    /// ```
+    /// # use arrow_flight::{FlightInfo, Ticket, FlightDescriptor, FlightEndpoint};
+    /// # use arrow_schema::{Schema, Field, DataType};
+    /// # fn get_schema() -> Schema {
+    /// #   Schema::new(vec![
+    /// #     Field::new("a", DataType::Utf8, false),
+    /// #   ])
+    /// # }
+    /// #
+    /// // Create a new FlightInfo
+    /// let flight_info = FlightInfo::new()
+    ///   // Encode the Arrow schema
+    ///   .with_schema(&get_schema())
+    ///   .expect("encoding failed")
+    ///   .with_descriptor(
+    ///      FlightDescriptor::new_cmd("a command")
+    ///   )
+    ///   .with_endpoint(
+    ///      FlightEndpoint::new()
+    ///        .with_ticket(Ticket::new("ticket contents")
+    ///      )
+    ///    )
+    ///   .with_descriptor(FlightDescriptor::new_cmd("RUN QUERY"));
+    /// ```
+    pub fn new() -> FlightInfo {
         FlightInfo {
-            schema: vals,
-            flight_descriptor,
-            endpoint,
-            total_records,
-            total_bytes,
-            ordered,
+            schema: Bytes::new(),
+            flight_descriptor: None,
+            endpoint: vec![],
+            ordered: false,
+            // Flight says "Set these to -1 if unknown."
+            //
+            // https://github.com/apache/arrow-rs/blob/17ca4d51d0490f9c65f5adde144f677dbc8300e7/format/Flight.proto#L287-L289

Review Comment:
   Love this doc link, thank you for that



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org