You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "ravjotbrar (via GitHub)" <gi...@apache.org> on 2023/02/03 01:11:26 UTC

[GitHub] [arrow] ravjotbrar opened a new issue, #34016: [Python] Add client CookieMiddleware to pyarrow

ravjotbrar opened a new issue, #34016:
URL: https://github.com/apache/arrow/issues/34016

   ### Describe the enhancement requested
   
   We currently implement CookieMiddleware in our own Python Arrow Flight client to connect to a Dremio Flight endpoint. I plan to move this middleware into pyarrow, similar to the [client cookie middleware for java ](https://github.com/apache/arrow/blob/2ad89525173bbe3e03814eb4310b6e58fad5b989/java/flight/flight-core/src/main/java/org/apache/arrow/flight/client/ClientCookieMiddleware.java#L44). 
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on issue #34016: [Python] Add client CookieMiddleware to pyarrow

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #34016:
URL: https://github.com/apache/arrow/issues/34016#issuecomment-1415955065

   Ideally we would just provide Python bindings to the C++ implementation instead of having the work duplicated 3x


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on issue #34016: [Python] Add client CookieMiddleware to pyarrow

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #34016:
URL: https://github.com/apache/arrow/issues/34016#issuecomment-1453639238

   If it's completely application layer: then it probably doesn't belong in the core library at all.
   
   Most/all of the Flight implementation is already in C++, so there's not really a difference there.
   
   There's not really much maintainer time to take on multiple implementations of everything. That is my primary worry. If Dremio is willing to help with maintenance here, then I think having the Python implementation is fine. If it goes unmaintained and unresolved issues arise, then it may get deprecated instead. (Though I suppose the middleware should be fairly trivial, since you can take advantage of the standard library, so that may not be a worry in the end.)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on issue #34016: [Python] Add client CookieMiddleware to pyarrow

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #34016:
URL: https://github.com/apache/arrow/issues/34016#issuecomment-1423312385

   Just a quick response, but the Python middleware is the one that incurs FFI overhead on every call, not the C++ one. 
   
   On Wed, Feb 8, 2023, at 17:09, Paul Nienaber wrote:
   > 
   > 
   > @lidavidm <https://github.com/lidavidm> I've been working with @ravjotbrar <https://github.com/ravjotbrar> and I can't see that implementing the middleware callback contract across an FFI is a good option here. My thinking here:
   > 
   >  • It's an application state management helper, and managing the state behind an FFI other than performance reasons doesn't make a lot of sense—and crossing the FFI extra times is more likely a performance hit than a benefit.
   >  • It would save duplication of the core implementation code, however the FFI definitions would increase complexity of maintenance.
   >  • The middleware callback interface is already in Python (if there's ever a performance argument it might make sense to allow two types of middleware objects where the Flight implementation detects whether there's a pure-C++ one and dynamically wires in the callbacks that way), so using C++ code for this small component would make extension or modification by Python app authors much more difficult.
   >  • Even without the goal of modifying the included middleware implementation, Python app authors could easily want to simply look at the code to understand its behaviour in more detail, and reusing the C++ implementation would again present a significant barrier to users here.
   >  • Arrow is a multi-language set of bindings/SDK. Normally all of the bindings would be written in native languages, which would mean code duplication, but as it stands some interpreted languages are wrapped around the C++ implementation for performance reasons alone. But expecting to reduce code/implementation duplication across multiple supported languages as a goal in and of itself is not usually something to expect in such a multi-language library implementation.
   > Perhaps I am missing something further about why it would in fact be for the best to use the C++ impl anyways?
   > 
   > 
   > —
   > Reply to this email directly, view it on GitHub <https://github.com/apache/arrow/issues/34016#issuecomment-1423303637>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AACQB33FSP7D34VMWT2QFG3WWQKRBANCNFSM6AAAAAAUPWGM2I>.
   > You are receiving this because you were mentioned.Message ID: ***@***.***>
   > 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on issue #34016: [Python] Add client CookieMiddleware to pyarrow

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #34016:
URL: https://github.com/apache/arrow/issues/34016#issuecomment-1423315187

   > The middleware callback interface is already in Python (if there's ever a performance argument it might make sense to allow two types of middleware objects where the Flight implementation detects whether there's a pure-C++ one and dynamically wires in the callbacks that way), so using C++ code for this small component would make extension or modification by Python app authors much more difficult.
   
   Basically, I'm confused here. The middleware interface is in C++. Python middleware is actually a special C++ middleware that makes FFI calls back into Python. What I'm saying is you can mostly avoid that dance: just allow directly adding the C++ cookie middleware to the client from Python.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] indigophox commented on issue #34016: [Python] Add client CookieMiddleware to pyarrow

Posted by "indigophox (via GitHub)" <gi...@apache.org>.
indigophox commented on issue #34016:
URL: https://github.com/apache/arrow/issues/34016#issuecomment-1453050734

   The middleware is part of the application-level connection state logic (albeit a helper).  Arrow is explicitly the transport below this, so the middleware logic is part of the application.  Accordingly a Python app developer should have ownership of this data and behaviour, but if we wrap the C++ black box and save the FFI hop, then:  How will a Python developer understand (reading the code), debug, extend, learn from, etc. the provided middleware class?
   
   This does not seem to me to provide a good developer experience for Python folks, while also creating more complicated maintenance around a dummy Python/Cython object that purports to implement the Python middleware class (which is already a defined interface) but then simply injects a native C++ class in a fairly nonstandard way.
   
   As before I can see providing this type of native-code middleware binding as an option if it's seen to be a performance bottleneck for some high TPS Arrow users' use cases, but in any other circumstance (and as a default implementation) it seems to be a detriment to the Python app developer's experience.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on issue #34016: [Python] Add client CookieMiddleware to pyarrow

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #34016:
URL: https://github.com/apache/arrow/issues/34016#issuecomment-1423313445

   Also: if Dremio is looking for a Python client, I thought 1) Dremio is using Flight SQL and 2) there is a Flight SQL client for Python now: https://github.com/apache/arrow-adbc/tree/main/python/adbc_driver_flightsql


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ravjotbrar commented on issue #34016: [Python] Add client CookieMiddleware to pyarrow

Posted by "ravjotbrar (via GitHub)" <gi...@apache.org>.
ravjotbrar commented on issue #34016:
URL: https://github.com/apache/arrow/issues/34016#issuecomment-1416525279

   That's a fair point. Will look into it. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] indigophox commented on issue #34016: [Python] Add client CookieMiddleware to pyarrow

Posted by "indigophox (via GitHub)" <gi...@apache.org>.
indigophox commented on issue #34016:
URL: https://github.com/apache/arrow/issues/34016#issuecomment-1423303637

   @lidavidm I've been working with @ravjotbrar and I can't see that implementing the middleware callback contract across an FFI is a good option here.  My thinking here:
   
   * It's an application state management helper, and managing the state behind an FFI other than performance reasons doesn't make a lot of sense—and crossing the FFI extra times is more likely a performance hit than a benefit.
   * It would save duplication of the core implementation code, however the FFI definitions would increase complexity of maintenance.
   * The middleware callback interface is already in Python (if there's ever a performance argument it might make sense to allow two types of middleware objects where the Flight implementation detects whether there's a pure-C++ one and dynamically wires in the callbacks that way), so using C++ code for this small component would make extension or modification by Python app authors much more difficult.
   * Even without the goal of modifying the included middleware implementation, Python app authors could easily want to simply look at the code to understand its behaviour in more detail, and reusing the C++ implementation would again present a significant barrier to users here.
   * Arrow is a multi-language set of bindings/SDK.  Normally all of the bindings would be written in native languages, which would mean code duplication, but as it stands some interpreted languages are wrapped around the C++ implementation for performance reasons alone.  But expecting to reduce code/implementation duplication across multiple supported languages as a goal in and of itself is not usually something to expect in such a multi-language library implementation.
   
   Perhaps I am missing something further about why it would in fact be for the best to use the C++ impl anyways?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on issue #34016: [Python] Add client CookieMiddleware to pyarrow

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #34016:
URL: https://github.com/apache/arrow/issues/34016#issuecomment-1423313713

   I've tested this against dremio (OSS), would that solve your higher-level goal?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] aiguofer commented on issue #34016: [Python] Add client CookieMiddleware to pyarrow

Posted by "aiguofer (via GitHub)" <gi...@apache.org>.
aiguofer commented on issue #34016:
URL: https://github.com/apache/arrow/issues/34016#issuecomment-1567766045

   I've been playing around with the Python ADBC driver and noticed the cookies the the server was trying to set weren't being set by the client. I kind of assumed all Flight clients respected cookies, and would love to have that be the case. 
   
   The Python ADBC driver is interesting because it's not actually using the Python FlightClient, but instead it's using bindings into the Go ADBC driver, which uses the Go FlightClient. It seems neither of those currently have a Client Cookie Middleware. What's a good path forward to track this work? It seems like the Client cookie middleware implementations should live in this repo (like the java one does), and the Flight Clients created by the ADBC drivers should make sure to use them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org