Posted to issues@arrow.apache.org by "aiguofer (via GitHub)" <gi...@apache.org> on 2023/03/31 17:17:03 UTC

[GitHub] [arrow] aiguofer opened a new issue, #34829: [Java] Ability to use a path based host when using the JDBC driver

aiguofer opened a new issue, #34829:
URL: https://github.com/apache/arrow/issues/34829

   ### Describe the enhancement requested
   
   From an infrastructure perspective, it would be easier for us to do path-based routing for our Flight server instead of subdomain-based routing. For example, `https://mydomain.com/grpc/flight`. A model like this would allow us to create routing rules for all gRPC services we expose within our company using shared architecture.
   
   Looking around at `getUrlsArgs`, there seems to be no use for the `path` part of the JDBC URI, so I see a few ways this could be implemented:
   
   ```
   jdbc:arrow-flight-sql://mydomain.com/grpc/flight:443
   jdbc:arrow-flight-sql://https://mydomain.com/grpc/flight:443
   jdbc:arrow-flight-sql://mydomain.com:443/grpc/flight
   jdbc:arrow-flight-sql://https://mydomain.com:443/grpc/flight
   ```
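
As a rough check of how these forms behave under a generic parser, the sketch below feeds two of them to `java.net.URI` after dropping the `jdbc:` prefix. It assumes nothing about the driver's actual parsing code; the class and method names (`JdbcUriForms`, `parse`) are made up for illustration.

```java
import java.net.URI;

final class JdbcUriForms {
    // Hypothetical helper: drop the leading "jdbc:" so that java.net.URI
    // sees an ordinary hierarchical URI (scheme://host:port/path).
    static URI parse(String jdbcUrl) {
        return URI.create(jdbcUrl.substring("jdbc:".length()));
    }

    public static void main(String[] args) {
        String[] candidates = {
            "jdbc:arrow-flight-sql://mydomain.com/grpc/flight:443",
            "jdbc:arrow-flight-sql://mydomain.com:443/grpc/flight",
        };
        for (String candidate : candidates) {
            URI uri = parse(candidate);
            System.out.println(candidate);
            // getPort() returns -1 when the authority carries no port.
            System.out.println("  host=" + uri.getHost()
                + " port=" + uri.getPort()
                + " path=" + uri.getPath());
        }
    }
}
```

With the port placed after the path, `java.net.URI` treats `:443` as part of the path and reports the port as undefined (`-1`), so that form would need custom parsing. The `host:port/path` form parses as expected.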
   
   This seems to be fairly uncommon, but there's at least one other driver that does this. For example, the Simba BigQuery connection looks like this:
   
   ```
   jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=<Your Project ID>;OAuthType=0;OAuthServiceAcctEmail=<Service Email Address>;OAuthPvtKeyPath=<Path to Key File>;
   ```
   
   I'm not sure if this is already supported by the `Avatica` driver, so it may also need some changes there.
   
   ### Component(s)
   
   Java


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on issue #34829: [Java] Ability to use a path based host when using the JDBC driver

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #34829:
URL: https://github.com/apache/arrow/issues/34829#issuecomment-1512168860

   > I think this still leaves the option of something like:
   > 
   > ```
   > jdbc:arrow-flight-sql://mydomain.com/grpc/flight_service:443/catalog/schema/hierarchy
   > ```
   > 
   > which is similar to how the BigQuery driver does it.
   
   Is that a valid URI? At least, I gave it a quick spin with JS and Python and neither can parse out the actual port, host, etc. anymore.
   
   The other thing is that this is explicitly not something gRPC wants to support, though it appears that gRPC Java at least happens to be OK with it right now (c.f. https://github.com/grpc/grpc-dotnet/issues/110 and https://github.com/grpc/grpc/blob/master/doc/PROTOCOL-HTTP2.md#requests)




[GitHub] [arrow] lidavidm commented on issue #34829: [Java] Ability to use a path based host when using the JDBC driver

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #34829:
URL: https://github.com/apache/arrow/issues/34829#issuecomment-1492401288

   Interesting, as far as I was aware, gRPC wouldn't pass on the path to the server - I suppose it does then? (So connecting to `grpc://foo:1234/bar/baz` and calling `HelloWorldService.HelloWorld` would result in the gRPC server at `grpc://foo:1234` getting - and rejecting - a request for `/bar/baz/HelloWorldService/HelloWorld`, unless you have a proxy to rewrite the URL?)
   
   The catalog support would be client-side from my understanding of the proposal, so it wouldn't interact well with that indeed. Possibly we should use query parameters for that instead?
   
   




[GitHub] [arrow] aiguofer commented on issue #34829: [Java] Ability to use a path based host when using the JDBC driver

Posted by "aiguofer (via GitHub)" <gi...@apache.org>.
aiguofer commented on issue #34829:
URL: https://github.com/apache/arrow/issues/34829#issuecomment-1492385325

   Ahhh I figured using the path for catalog/schema/database might be a thing in the future.
   
   In our case we're using `nginx` to do our routing. Using path-based routing allows us to set up centralized rules for HTTP/2 and the various gRPC services that we're exposing.




[GitHub] [arrow] lidavidm commented on issue #34829: [Java] Ability to use a path based host when using the JDBC driver

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #34829:
URL: https://github.com/apache/arrow/issues/34829#issuecomment-1513879797

   Well, from your description, the server side wouldn't matter because it never sees the extra path in the URI, right? It's only the client side.




[GitHub] [arrow] lidavidm commented on issue #34829: [Java] Ability to use a path based host when using the JDBC driver

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #34829:
URL: https://github.com/apache/arrow/issues/34829#issuecomment-1492402290

   You may want to chime in here: https://lists.apache.org/thread/fd6r1n7vt91sg2c7fr35wcrsqz6x4645




[GitHub] [arrow] lidavidm commented on issue #34829: [Java] Ability to use a path based host when using the JDBC driver

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #34829:
URL: https://github.com/apache/arrow/issues/34829#issuecomment-1492319382

   There is a proposal to use the path to specify the active catalog/schema on connection, so that may not work in the future unfortunately. That is interesting though, I wonder if there's a good way to let both work or to give you something to route on. 
   
   gRPC doesn't really use the path in its connections, so I'm guessing there needs to be something client-side rewriting the URLs anyway?




[GitHub] [arrow] aiguofer commented on issue #34829: [Java] Ability to use a path based host when using the JDBC driver

Posted by "aiguofer (via GitHub)" <gi...@apache.org>.
aiguofer commented on issue #34829:
URL: https://github.com/apache/arrow/issues/34829#issuecomment-1512018271

   Hey, sorry, I've been pulled in other directions recently. In our case, we're looking to have a reverse proxy in front of the gRPC server which would simply use the path-based approach to know which gRPC server to forward the request to. For example, calling `arrow.flight.protocol.FlightService.Handshake` on `mydomain.com/grpc/flight_service` would forward the request to our `FlightServer`, while a `MessageService.SendMessage` request to `mydomain.com/grpc/messaging` would be forwarded to our Messaging service. The path would have no meaning other than letting the reverse proxy know what service to hit.
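
The routing rule described above can be sketched roughly as follows. This only illustrates the prefix-matching logic, not how `nginx` is configured. It assumes the proxy strips the routing prefix before forwarding, so the backend sees the usual gRPC `/{service}/{method}` path. `PrefixRouter` and the backend addresses are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PrefixRouter {
    // Routing prefix -> backend address (both hypothetical placeholders).
    private final Map<String, String> backends = new LinkedHashMap<>();

    PrefixRouter route(String prefix, String backend) {
        backends.put(prefix, backend);
        return this;
    }

    // Returns the backend address followed by the request path with the
    // routing prefix stripped, or null when no prefix matches.
    String resolve(String requestPath) {
        for (Map.Entry<String, String> entry : backends.entrySet()) {
            String prefix = entry.getKey();
            if (requestPath.startsWith(prefix + "/")) {
                return entry.getValue() + requestPath.substring(prefix.length());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        PrefixRouter router = new PrefixRouter()
            .route("/grpc/flight_service", "flight-backend:8815")
            .route("/grpc/messaging", "messaging-backend:9000");
        System.out.println(router.resolve(
            "/grpc/flight_service/arrow.flight.protocol.FlightService/Handshake"));
        System.out.println(router.resolve(
            "/grpc/messaging/MessageService/SendMessage"));
    }
}
```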
   
   I looked over that discussion and it seems like it's leaning towards something like:
   
   ```
   jdbc:arrow-flight-sql://mydomain.com:443/catalog/schema/hierarchy
   ```
   
   I think this still leaves the option of something like:
   
   ```
   jdbc:arrow-flight-sql://mydomain.com/grpc/flight_service:443/catalog/schema/hierarchy
   ```
   
   which is similar to how the BigQuery driver does it.
   
   Something like this might also be possible:
   ```
   jdbc:arrow-flight-sql://mydomain.com:443/catalog/schema/hierarchy?urlPath=/grpc/flight_service
   ```
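
A minimal sketch of reading such a parameter with `java.net.URI`, assuming the `jdbc:` prefix has already been removed. The `urlPath` parameter is only the proposal above, not an existing driver option, and `UrlPathParam` and `queryParam` are hypothetical names.

```java
import java.net.URI;

final class UrlPathParam {
    // Returns the value of the named query parameter, or null if absent.
    // Values are returned as decoded by URI.getQuery().
    static String queryParam(URI uri, String name) {
        String query = uri.getQuery();
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(name)) {
                return pair.substring(eq + 1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        URI uri = URI.create(
            "arrow-flight-sql://mydomain.com:443/catalog/schema/hierarchy"
                + "?urlPath=/grpc/flight_service");
        System.out.println("host=" + uri.getHost() + " port=" + uri.getPort());
        System.out.println("catalog path=" + uri.getPath());
        System.out.println("urlPath=" + queryParam(uri, "urlPath"));
    }
}
```

Because the routing path lives in the query string, the whole URL stays a valid URI: host, port, and the catalog/schema path are still available from a standard parser.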
   




[GitHub] [arrow] aiguofer commented on issue #34829: [Java] Ability to use a path based host when using the JDBC driver

Posted by "aiguofer (via GitHub)" <gi...@apache.org>.
aiguofer commented on issue #34829:
URL: https://github.com/apache/arrow/issues/34829#issuecomment-1513641415

   > Is that a valid URI? At least, I gave it a quick spin with JS and Python and neither can parse out the actual port, host, etc. anymore.
   
   It is not. The "host" is a valid URI, but the entire string is not. In order for that to work, the current parsing in the JDBC driver would have to change to first extract the host and then parse the rest of the URI.
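
To make the "extract the host first" idea concrete, here is a rough sketch of a two-stage parse using a regular expression. The grammar (host, optional routing path, port, then the remainder) is an assumption for illustration only. It is not the driver's current behavior, and `PathHostUrl` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PathHostUrl {
    // Assumed grammar: host, optional routing path, ":port", optional rest.
    private static final Pattern FORM = Pattern.compile(
        "^jdbc:arrow-flight-sql://([^/:]+)(/[^:]*)?:(\\d+)(/.*)?$");

    final String host;
    final String routingPath; // empty when the URL has no routing path
    final int port;
    final String rest;        // e.g. the catalog/schema part; may be empty

    private PathHostUrl(String host, String routingPath, int port, String rest) {
        this.host = host;
        this.routingPath = routingPath;
        this.port = port;
        this.rest = rest;
    }

    static PathHostUrl parse(String url) {
        Matcher m = FORM.matcher(url);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized URL: " + url);
        }
        return new PathHostUrl(
            m.group(1),
            m.group(2) == null ? "" : m.group(2),
            Integer.parseInt(m.group(3)),
            m.group(4) == null ? "" : m.group(4));
    }

    public static void main(String[] args) {
        PathHostUrl parsed = parse("jdbc:arrow-flight-sql://mydomain.com"
            + "/grpc/flight_service:443/catalog/schema/hierarchy");
        System.out.println("host=" + parsed.host
            + " routingPath=" + parsed.routingPath
            + " port=" + parsed.port
            + " rest=" + parsed.rest);
    }
}
```

The same pattern also accepts the conventional `host:port/path` form, in which case `routingPath` is empty.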
   
   > The other thing is that this is explicitly not something gRPC wants to support, though it appears that gRPC Java at least happens to be OK with it right now (c.f. https://github.com/grpc/grpc-dotnet/issues/110 and https://github.com/grpc/grpc/blob/master/doc/PROTOCOL-HTTP2.md#requests)
   
   That's interesting... so it seems like they don't want to support this on the client side? As I understand it, it's currently doable on the server side but not on the client side? That's definitely a blocker for us if they don't plan to support that in gRPC clients. 




[GitHub] [arrow] aiguofer commented on issue #34829: [Java] Ability to use a path based host when using the JDBC driver

Posted by "aiguofer (via GitHub)" <gi...@apache.org>.
aiguofer commented on issue #34829:
URL: https://github.com/apache/arrow/issues/34829#issuecomment-1514124319

   Yeah exactly. It seems like they wanted to do something similar to what we hoped to do in https://github.com/grpc/grpc-dotnet/issues/110#issuecomment-1039219134 (in our case, http and grpc services are on different hosts, but we want to simplify routing rules for different services). However, if this isn't well supported on the client side then it's probably not a good path forward. Ideally, people could implement their own FlightClient, or use ADBC to interact with our service in the future if they can't, or don't want to, use the JDBC driver.
   
   It probably doesn't make sense to add this functionality to the JDBC driver if it can't be added to all the other options.

