You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/24 19:44:06 UTC

[GitHub] [arrow-ballista] DaltonModlin opened a new pull request, #442: Add optional flag which advertises host for Arrow Flight SQL #418

DaltonModlin opened a new pull request, #442:
URL: https://github.com/apache/arrow-ballista/pull/442

   - Update scheduler_config_spec.toml to include new flag 'advertise_host'
   - Add advertise_host member variable to SchedulerServer
   - Add advertise_host argument to new, new_with_policy, new_with_builder, and new_with_state in order to propagate flag
   - Add None argument to relevant method calls
   
   Add optional flag which advertises host for Arrow Flight SQL #418
   
   - Update logic in job_to_fetch_part to use advertise-host flag when it exists
   - Remove default from advertise_host in scheduler_config_spec.toml
   - Wrap scheduler_server advertise_host variable in Option
   - Update scheduler's main.rs to reflect advertise_host being wrapped in Option
   
   Utilize executor IP for routing to flight results in job_to_fetch_part even when advertise-host flag is set.
   
   Add missing variable and ownership stuff
   
   Remove unnecessary output from do_get_fallback
   
   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #.
   
    # Rationale for this change
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   # What changes are included in this PR?
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are there any user-facing changes?
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-ballista] andygrove commented on pull request #442: Add optional flag which advertises host for Arrow Flight SQL #418

Posted by GitBox <gi...@apache.org>.
andygrove commented on PR #442:
URL: https://github.com/apache/arrow-ballista/pull/442#issuecomment-1292734782

   @DaltonModlin There is a clippy failure .. I will merge once that is resolved


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-ballista] DaltonModlin commented on pull request #442: Add optional flag which advertises host for Arrow Flight SQL #418

Posted by GitBox <gi...@apache.org>.
DaltonModlin commented on PR #442:
URL: https://github.com/apache/arrow-ballista/pull/442#issuecomment-1289600567

   I've validated my changes have the expected outcome via writing a test in the flight portion of the arrow repo.
   ```
   @Test
   public void testBallistaFlag() throws Exception {
       String url = "jdbc:arrow-flight://127.0.0.0:50050";
       java.util.Properties props = new java.util.Properties();
       props.setProperty("useEncryption", "false");
       props.setProperty("user", "admin");
       props.setProperty("password", "password");
       Connection con = DriverManager.getConnection(url, props);
       // statement
       java.sql.Statement stmt = con.createStatement();
       String sql = "create external table customer STORED AS CSV WITH HEADER ROW LOCATION 'arrow-datafusion/datafusion/core/tests/tpch-csv/customer.csv';\n";
       assertTrue(stmt.execute(sql));
   }
   ```
   Running this test in conjunction with a local scheduler (having passing it `--advertise-host="127.0.0.2:50050"`) and a local executor shows the request for flight results is proxied via the Scheduler rather than going directly to the executor IP.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-ballista] andygrove commented on a diff in pull request #442: Add optional flag which advertises host for Arrow Flight SQL #418

Posted by GitBox <gi...@apache.org>.
andygrove commented on code in PR #442:
URL: https://github.com/apache/arrow-ballista/pull/442#discussion_r1004511298


##########
ballista/scheduler/scheduler_config_spec.toml:
##########
@@ -24,6 +24,11 @@ conf_file_param = "config_file"
 name = "version"
 doc = "Print version of this executable"
 
+[[param]]
+name = "advertise_host"

Review Comment:
   How about `advertise_endpoint` ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-ballista] andygrove merged pull request #442: Add optional flag which advertises host for Arrow Flight SQL #418

Posted by GitBox <gi...@apache.org>.
andygrove merged PR #442:
URL: https://github.com/apache/arrow-ballista/pull/442


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-ballista] andygrove commented on pull request #442: Add optional flag which advertises host for Arrow Flight SQL #418

Posted by GitBox <gi...@apache.org>.
andygrove commented on PR #442:
URL: https://github.com/apache/arrow-ballista/pull/442#issuecomment-1297556723

   @DaltonModlin Looks like there is conflict?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-ballista] DaltonModlin commented on pull request #442: Add optional flag which advertises host for Arrow Flight SQL #418

Posted by GitBox <gi...@apache.org>.
DaltonModlin commented on PR #442:
URL: https://github.com/apache/arrow-ballista/pull/442#issuecomment-1293908731

   It looks like clippy is worried because there are 8 argument to the function `new_with_policy`. It might be worth using a Builder pattern to construct the SchedulerServer rather than the current pattern of many slightly differing constructors.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-ballista] andygrove commented on a diff in pull request #442: Add optional flag which advertises host for Arrow Flight SQL #418

Posted by GitBox <gi...@apache.org>.
andygrove commented on code in PR #442:
URL: https://github.com/apache/arrow-ballista/pull/442#discussion_r1005699055


##########
ballista/scheduler/src/flight_sql.rs:
##########
@@ -196,26 +196,48 @@ impl FlightSqlServiceImpl {
     ) -> Result<Vec<FlightEndpoint>, Status> {
         let mut fieps: Vec<_> = vec![];
         for loc in completed.partition_location.iter() {
-            let (host, port) = if let Some(ref md) = loc.executor_meta {
+            let (exec_host, exec_port) = if let Some(ref md) = loc.executor_meta {
                 (md.host.clone(), md.port)
             } else {
                 Err(Status::internal(
-                    "Invalid partition location, missing executor metadata".to_string(),
+                    "Invalid partition location, missing executor metadata and advertise_endpoint flag is undefined.".to_string(),
                 ))?
             };
+
+            let (host, port) = match self.server.advertise_endpoint {
+                Some(_) => {
+                    let advertise_endpoint_vec: Vec<&str> = self
+                        .server
+                        .advertise_endpoint
+                        .as_ref()
+                        .expect("Failed to parse advertise_endpoint value")

Review Comment:
   The endpoint is already available from the pattern matching so the `expect` code is unreachable. This can be simplified:
   
   ```suggestion
               let (host, port) = match &self.server.advertise_endpoint {
                   Some(endpoint) => {
                       let advertise_endpoint_vec: Vec<&str> = endpoint
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-ballista] avantgardnerio commented on a diff in pull request #442: Add optional flag which advertises host for Arrow Flight SQL #418

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #442:
URL: https://github.com/apache/arrow-ballista/pull/442#discussion_r1006225399


##########
ballista/scheduler/src/flight_sql.rs:
##########
@@ -196,26 +196,41 @@ impl FlightSqlServiceImpl {
     ) -> Result<Vec<FlightEndpoint>, Status> {
         let mut fieps: Vec<_> = vec![];
         for loc in completed.partition_location.iter() {
-            let (host, port) = if let Some(ref md) = loc.executor_meta {
+            let (exec_host, exec_port) = if let Some(ref md) = loc.executor_meta {
                 (md.host.clone(), md.port)
             } else {
                 Err(Status::internal(
-                    "Invalid partition location, missing executor metadata".to_string(),
+                    "Invalid partition location, missing executor metadata and advertise_endpoint flag is undefined.".to_string(),
                 ))?
             };
+
+            let (host, port) = match &self.server.advertise_endpoint {
+                Some(endpoint) => {
+                    let advertise_endpoint_vec: Vec<&str> = endpoint.split(":").collect();
+                    match advertise_endpoint_vec.as_slice() {
+                        [host_ip, port] => {

Review Comment:
   :heart: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-ballista] avantgardnerio commented on a diff in pull request #442: Add optional flag which advertises host for Arrow Flight SQL #418

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #442:
URL: https://github.com/apache/arrow-ballista/pull/442#discussion_r1003762368


##########
ballista/scheduler/src/flight_sql.rs:
##########
@@ -196,26 +196,41 @@ impl FlightSqlServiceImpl {
     ) -> Result<Vec<FlightEndpoint>, Status> {
         let mut fieps: Vec<_> = vec![];
         for loc in completed.partition_location.iter() {
-            let (host, port) = if let Some(ref md) = loc.executor_meta {
+
+            let (exec_host, exec_port) = if let Some(ref md) = loc.executor_meta {
                 (md.host.clone(), md.port)
             } else {
                 Err(Status::internal(
-                    "Invalid partition location, missing executor metadata".to_string(),
+                    "Invalid partition location, missing executor metadata and advertise_host flag is undefined.".to_string(),
                 ))?
             };
+
+            let (host, port) = match self.server.advertise_host {
+                Some(_) => {
+                    let advertise_host_flag: Vec<&str> = self
+                        .server
+                        .advertise_host
+                        .as_ref()
+                        .unwrap()

Review Comment:
   I think we want to avoid `unwrap()`s as a rule, but especially in functions that already return `Result`s...



##########
ballista/scheduler/scheduler_config_spec.toml:
##########
@@ -24,6 +24,11 @@ conf_file_param = "config_file"
 name = "version"
 doc = "Print version of this executable"
 
+[[param]]
+name = "advertise_host"

Review Comment:
   I love the use of the word `advertise` as that is what it is commonly called. I dislike the word `host`, because this is hostname & port. Sadly the term for that appears to be "socket", which I think no one uses outside OS libraries https://stackoverflow.com/questions/34955747/whats-the-combination-of-an-ip-address-and-port-number-called



##########
ballista/scheduler/src/flight_sql.rs:
##########
@@ -196,26 +196,41 @@ impl FlightSqlServiceImpl {
     ) -> Result<Vec<FlightEndpoint>, Status> {
         let mut fieps: Vec<_> = vec![];
         for loc in completed.partition_location.iter() {
-            let (host, port) = if let Some(ref md) = loc.executor_meta {
+
+            let (exec_host, exec_port) = if let Some(ref md) = loc.executor_meta {
                 (md.host.clone(), md.port)
             } else {
                 Err(Status::internal(
-                    "Invalid partition location, missing executor metadata".to_string(),
+                    "Invalid partition location, missing executor metadata and advertise_host flag is undefined.".to_string(),
                 ))?
             };
+
+            let (host, port) = match self.server.advertise_host {
+                Some(_) => {
+                    let advertise_host_flag: Vec<&str> = self
+                        .server
+                        .advertise_host
+                        .as_ref()
+                        .unwrap()
+                        .split(":")
+                        .collect();
+                    (advertise_host_flag[0].to_string(), advertise_host_flag[1].parse().unwrap())

Review Comment:
   Same... I think there is a safe way to index into the array: https://adventures.michaelfbryan.com/posts/daily/slice-patterns/



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-ballista] avantgardnerio commented on pull request #442: Add optional flag which advertises host for Arrow Flight SQL #418

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on PR #442:
URL: https://github.com/apache/arrow-ballista/pull/442#issuecomment-1289607978

   > flight results is proxied
   
   I'm actually a little surprised that query returns a flight since it is DML. 
   
   @andygrove I think we'll want to merge this PR because we presently rely on a bug in the FlightSQL JDBC driver where Ballista only works because it always re-uses the connection to the scheduler. This PR should make it work even after that issue is fixed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-ballista] andygrove commented on pull request #442: Add optional flag which advertises host for Arrow Flight SQL #418

Posted by GitBox <gi...@apache.org>.
andygrove commented on PR #442:
URL: https://github.com/apache/arrow-ballista/pull/442#issuecomment-1290593104

   Approach LGTM. Agree on the unwraps. I would like to test this out as well before approving. I'll try and get that to that today.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-ballista] andygrove commented on a diff in pull request #442: Add optional flag which advertises host for Arrow Flight SQL #418

Posted by GitBox <gi...@apache.org>.
andygrove commented on code in PR #442:
URL: https://github.com/apache/arrow-ballista/pull/442#discussion_r1005699055


##########
ballista/scheduler/src/flight_sql.rs:
##########
@@ -196,26 +196,48 @@ impl FlightSqlServiceImpl {
     ) -> Result<Vec<FlightEndpoint>, Status> {
         let mut fieps: Vec<_> = vec![];
         for loc in completed.partition_location.iter() {
-            let (host, port) = if let Some(ref md) = loc.executor_meta {
+            let (exec_host, exec_port) = if let Some(ref md) = loc.executor_meta {
                 (md.host.clone(), md.port)
             } else {
                 Err(Status::internal(
-                    "Invalid partition location, missing executor metadata".to_string(),
+                    "Invalid partition location, missing executor metadata and advertise_endpoint flag is undefined.".to_string(),
                 ))?
             };
+
+            let (host, port) = match self.server.advertise_endpoint {
+                Some(_) => {
+                    let advertise_endpoint_vec: Vec<&str> = self
+                        .server
+                        .advertise_endpoint
+                        .as_ref()
+                        .expect("Failed to parse advertise_endpoint value")

Review Comment:
   The endpoint is already available from the pattern matching and the `expect` code is unreachable. This can be simplified:
   
   ```suggestion
               let (host, port) = match &self.server.advertise_endpoint {
                   Some(endpoint) => {
                       let advertise_endpoint_vec: Vec<&str> = endpoint
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org