Posted to issues@arrow.apache.org by "nitesh-sinha (via GitHub)" <gi...@apache.org> on 2023/09/28 07:21:58 UTC

[GitHub] [arrow] nitesh-sinha opened a new issue, #37925: [Python]: Arrow Flight SQL server communication issue with JDBC Arrow FlightSQL driver

nitesh-sinha opened a new issue, #37925:
URL: https://github.com/apache/arrow/issues/37925

   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   Hello,
   
   I'm trying to build an Arrow Flight SQL server (which wraps DuckDB querying Parquet files) in Python. I've implemented the handler methods defined in the pyarrow [FlightServerBase class](https://arrow.apache.org/docs/python/generated/pyarrow.flight.FlightServerBase.html#pyarrow.flight.FlightServerBase) and am testing it with a DBeaver client (loaded with the [JDBC driver for Arrow Flight SQL](https://www.dremio.com/drivers/jdbc/)). However, even though the client connects successfully to the server, it is unable to read any of the data sent back from the server. I suspect it might be due to the RecordBatch structure. After a lot of reading through the docs, I've tried various ways of creating the RecordBatch, with no luck.
   
   For debugging simplicity, I hand-wrote the following RecordBatch to be sent for a DoGet RPC call (with the CommandGetSqlInfo command) in the Ticket. Can someone help point out any errors in this?
   
   ```
   def do_get_sql_info(self, context: flight.ServerCallContext, cmd: sqlPb.CommandGetSqlInfo) -> flight.FlightDataStream:
           sql_info_metadata = [
               {"info_name": "0", "value": "db_name"},
               {"info_name": "1", "value": "duckdb"},
           ]
   
           schema = pa.schema([
               pa.field("info_name", pa.uint32()),
               pa.field("value", pa.dense_union([
                   pa.field("string_value", pa.string()),
                   pa.field("bool_value", pa.bool_()),
                   pa.field("bigint_value", pa.int64()),
                   pa.field("int32_bitmask", pa.int32()),
                   pa.field("string_list", pa.list_(pa.string())),
                   pa.field("int32_to_int32_list_map", pa.map_(pa.int32(), pa.list_(pa.int32())))
               ]))
           ])
           batch = pa.RecordBatch.from_pandas(pd.DataFrame(sql_info_metadata), schema=schema)
           return flight.FlightDataStream(batch)
   ```
   
   The client is unable to read the DB name as `duckdb`; instead, it just prints `??`.
   
   Note: 
   - I'm using the [C++ Flight SQL server](https://github.com/apache/arrow/blob/15a8ac3ce4e3ac31f9f361770ad4a38c69102aa1/cpp/src/arrow/flight/sql/server.cc#L956) as a reference. It seems to use Builders to build the SqlInfoResult, but I could not find the equivalent in PyArrow.
   - I have checked the Arrow Flight Python example server [here](https://github.com/apache/arrow/blob/aca1d3eeed3775c2f02e9f5d59d62478267950b1/python/examples/flight/server.py), but it feels too simplistic and does not cover the Flight SQL use case.
   - I also tried to check what the client driver code expects [here](https://github.com/apache/arrow/blob/aca1d3eeed3775c2f02e9f5d59d62478267950b1/java/flight/flight-sql-jdbc-core/src/main/java/org/apache/arrow/driver/jdbc/client/ArrowFlightSqlClientHandler.java#L98), but it's not too clear to me.
   
   I'd appreciate some pointers on this. Thanks!
   
   ### Component(s)
   
   FlightRPC, Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [Python]: Arrow Flight SQL server communication issue with JDBC Arrow FlightSQL driver [arrow]

Posted by "jduo (via GitHub)" <gi...@apache.org>.
jduo commented on issue #37925:
URL: https://github.com/apache/arrow/issues/37925#issuecomment-1779966754

   Hi @nitesh-sinha ,
   The info_name field is being misused. It should be an integer, rather than a string. The integer should have the same value as an entry in the SqlInfo enum (https://github.com/apache/arrow/blob/e8360615adf6c5a9bb76b81267d08388c7cfc3a9/format/FlightSql.proto#L76).
   
   A tool such as DBeaver may require many more SqlInfo properties to be implemented, since it can call getDatabaseMetaData().




Re: [I] [Python]: Arrow Flight SQL server communication issue with JDBC Arrow FlightSQL driver [arrow]

Posted by "shivarajugowda (via GitHub)" <gi...@apache.org>.
shivarajugowda commented on issue #37925:
URL: https://github.com/apache/arrow/issues/37925#issuecomment-1899094517

   Similar request: https://github.com/apache/arrow/issues/37700




Re: [I] [Python]: Arrow Flight SQL server communication issue with JDBC Arrow FlightSQL driver [arrow]

Posted by "shivarajugowda (via GitHub)" <gi...@apache.org>.
shivarajugowda commented on issue #37925:
URL: https://github.com/apache/arrow/issues/37925#issuecomment-1899077371

   Maybe a reference Flight SQL server implementation in Python, like the ones available in Java and C++, would help. Is there any such thing on the roadmap?




Re: [I] [Python]: Arrow Flight SQL server communication issue with JDBC Arrow FlightSQL driver [arrow]

Posted by "nitesh-sinha (via GitHub)" <gi...@apache.org>.
nitesh-sinha commented on issue #37925:
URL: https://github.com/apache/arrow/issues/37925#issuecomment-1815787320

   Sorry for the late response, James, and thanks for your input. Arrow Flight
   work is currently paused due to other priorities. I will
   validate your observations when I pick it up again.
   
   Thanks
   Nitesh
   
   On Thu, Oct 26, 2023 at 1:31 AM James Duong ***@***.***>
   wrote:
   
   > Hi @nitesh-sinha <https://github.com/nitesh-sinha> ,
   > The info_name field is being misused. It should be an integer, rather than
   > a string. The integer should have the same value as an entry in the SqlInfo
   > enum (
   > https://github.com/apache/arrow/blob/e8360615adf6c5a9bb76b81267d08388c7cfc3a9/format/FlightSql.proto#L76
   > ).
   >
   > A tool such as DBeaver may require many more SqlInfo properties to be
   > implemented since it can call getDatabaseMetaData()
   

