You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@arrow.apache.org by "Cai-Yao (via GitHub)" <gi...@apache.org> on 2023/05/26 07:04:48 UTC

[GitHub] [arrow] Cai-Yao opened a new issue, #35777: [Flight-SQL] use Flight_SQL_JDBC to connect server failed

Cai-Yao opened a new issue, #35777:
URL: https://github.com/apache/arrow/issues/35777

   ### Describe the usage question you have. Please include as many useful details as  possible.
   
   
   In the server
   ```
   public class FlightsqlServer {
       private static final Logger LOG = LogManager.getLogger(FlightsqlServer.class);
       private final FlightServer flightServer;
       private volatile boolean running;
   
       public FlightsqlServer(int port, ConnectScheduler connectScheduler) {
           BufferAllocator allocator = new RootAllocator(100000);
           Location location = Location.forGrpcInsecure("0.0.0.0", port);
           FlightsqlProducer producer = new FlightsqlProducer();
           flightServer = FlightServer.builder(allocator, location, producer).build();
       }
   
       // start Flightsql protocol service
       // return true if success, otherwise false
       public boolean start() {
           try {
               flightServer.start();
               LOG.info("Flightsql network service is started.");
           } catch (IOException e) {
               LOG.warn("Open Flightsql network service failed.", e);
               return false;
           }
           return true;
       }
   
       public void stop() {
           if (running) {
               running = false;
               // close server channel, make accept throw exception
               try {
                   flightServer.close();
               } catch (InterruptedException e) {
                   LOG.warn("close server channel failed.", e);
               }
           }
       }
   }
   ```
   The client code:
   ```
   public class FlightSQLJDBC {
      // JDBC driver name and database URL
      static final String DB_URL = "jdbc:arrow-flight-sql://localhost:9040/clickbench?useServerPrepStmts=false&useSSL=false";
   
      //  Database credentials
      static final String USER = "root";
      static final String PASS = "";
   
      public static void main(String[] args) {
      Connection conn = null;
      Statement stmt = null;
      PreparedStatement pre_stmt = null;
      try{
   
         //STEP 3: Open a connection
         System.out.println("Connecting to database...");
         conn = DriverManager.getConnection(DB_URL, USER, PASS);
   
         //STEP 4: Execute a query
         System.out.println("Creating statement...");
         String sql = "select * from hits limit 500000";
         stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery(sql);
         int rowNum = rs.getRow();
         System.out.println(rowNum);
         //STEP 6: Clean-up environment
         rs.close();
         stmt.close();
         conn.close();
   ......
   ```
   The connect error:
   ```
   java.sql.SQLException: cfjd.org.apache.arrow.flight.FlightRuntimeException: UNAVAILABLE: io exception
   Channel Pipeline: [SslHandler#0, ProtocolNegotiators$ClientTlsHandler#0, WriteBufferingAndExceptionHandler#0, DefaultChannelPipeline$TailContext#0]
   	at org.apache.arrow.driver.jdbc.client.ArrowFlightSqlClientHandler$Builder.build(ArrowFlightSqlClientHandler.java:586)
   	at org.apache.arrow.driver.jdbc.ArrowFlightConnection.createNewClientHandler(ArrowFlightConnection.java:109)
   	at org.apache.arrow.driver.jdbc.ArrowFlightConnection.createNewConnection(ArrowFlightConnection.java:88)
   	at org.apache.arrow.driver.jdbc.ArrowFlightJdbcDriver.connect(ArrowFlightJdbcDriver.java:85)
   	at org.apache.arrow.driver.jdbc.ArrowFlightJdbcDriver.connect(ArrowFlightJdbcDriver.java:49)
   	at java.sql.DriverManager.getConnection(DriverManager.java:664)
   	at java.sql.DriverManager.getConnection(DriverManager.java:247)
   	at FlightSQLJDBC.main(FlightSQLJDBC.java:22)
   Caused by: cfjd.org.apache.arrow.flight.FlightRuntimeException: UNAVAILABLE: io exception
   Channel Pipeline: [SslHandler#0, ProtocolNegotiators$ClientTlsHandler#0, WriteBufferingAndExceptionHandler#0, DefaultChannelPipeline$TailContext#0]
   	at cfjd.org.apache.arrow.flight.CallStatus.toRuntimeException(CallStatus.java:131)
   	at cfjd.org.apache.arrow.flight.grpc.StatusUtils.fromGrpcRuntimeException(StatusUtils.java:164)
   	at cfjd.org.apache.arrow.flight.grpc.StatusUtils.fromThrowable(StatusUtils.java:185)
   	at cfjd.org.apache.arrow.flight.auth2.ClientHandshakeWrapper.doClientHandshake(ClientHandshakeWrapper.java:59)
   	at cfjd.org.apache.arrow.flight.FlightClient.handshake(FlightClient.java:210)
   	at org.apache.arrow.driver.jdbc.client.utils.ClientAuthenticationUtils.getAuthenticate(ClientAuthenticationUtils.java:105)
   	at org.apache.arrow.driver.jdbc.client.utils.ClientAuthenticationUtils.getAuthenticate(ClientAuthenticationUtils.java:93)
   	at org.apache.arrow.driver.jdbc.client.ArrowFlightSqlClientHandler$Builder.build(ArrowFlightSqlClientHandler.java:575)
   	... 7 more
   Caused by: cfjd.io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 00001204000000000000037fffffff000400100000000600002000000004080000000000000f0001
   	at cfjd.io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1215)
   	at cfjd.io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1285)
   	at cfjd.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:519)
   	at cfjd.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:458)
   	at cfjd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:280)
   	at cfjd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
   	at cfjd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
   	at cfjd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
   	at cfjd.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
   	at cfjd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
   	at cfjd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
   	at cfjd.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
   	at cfjd.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
   	at cfjd.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
   	at cfjd.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
   	at cfjd.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
   	at cfjd.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
   	at cfjd.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
   	at cfjd.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
   	at cfjd.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
   	at java.lang.Thread.run(Thread.java:748)
   ```
   How should I solve it? Is there something wrong with the server?
   
   ### Component(s)
   
   FlightRPC, Java


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] xinyiZzz commented on issue #35777: [Flight-SQL] use Flight_SQL_JDBC to connect server failed

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on issue #35777:
URL: https://github.com/apache/arrow/issues/35777#issuecomment-1573379172

   @lidavidm thanks
   We have successfully used ADBC in Python to connect to the flight sql server in Apache Doris, and got better performance.
   
   fetchall and convert to Pandas DataFrame, 10x faster than `pymysql` and `pandas.read_sql`.
   
   Next, I will continue to explore how to make Apache Doris better support ADBC.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] xinyiZzz commented on issue #35777: [Flight-SQL] use Flight_SQL_JDBC to connect server failed

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on issue #35777:
URL: https://github.com/apache/arrow/issues/35777#issuecomment-1719061018

   > FWIW, ADBC exposes a direct `fetch_df` method so you can avoid having to fetch and then convert. (Under the hood it will use PyArrow to convert from Arrow to Pandas, and PyArrow has heavily optimized this path over the years.)
   
   Cool! , this avoids the cost of `fetchall` and `to_pandas`, in my tests, the performance increased to 5 times
   
   `select URL from clickbench.hits where URL!=''  limit 10000000`
   - flightsql, 19.0s
     -  execute SQL cost, 3.6s
     - fetchall cost, 13.1s
     - to_pandas cost, 2.2s
   - flightsql-fetch_df, 3.9s
     -  execute SQL cost, 3.6s
     - fetch_df cost, 0.3s


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] Cai-Yao commented on issue #35777: [Flight-SQL] use Flight_SQL_JDBC to connect server failed

Posted by "Cai-Yao (via GitHub)" <gi...@apache.org>.
Cai-Yao commented on issue #35777:
URL: https://github.com/apache/arrow/issues/35777#issuecomment-1573416854

   > It's not really written as an example, but the tests use a server based on Apache Derby: https://github.com/apache/arrow/blob/main/java/flight/flight-sql/src/test/java/org/apache/arrow/flight/sql/example/FlightSqlExample.java
   > 
   > For Python: the Python code will try to create a prepared statement (this is to fit DBAPI, IIRC) and if the server raises any error except for NotImplemented it will fail. (If it gets NotImplemented, it will ignore and execute without a prepared statement.)
   
   Thank you very much, the example you gave was very helpful.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on issue #35777: [Flight-SQL] use Flight_SQL_JDBC to connect server failed

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #35777:
URL: https://github.com/apache/arrow/issues/35777#issuecomment-1567132597

   It's not really written as an example, but the tests use a server based on Apache Derby: https://github.com/apache/arrow/blob/main/java/flight/flight-sql/src/test/java/org/apache/arrow/flight/sql/example/FlightSqlExample.java
   
   For Python: the Python code will try to create a prepared statement (this is to fit DBAPI, IIRC) and if the server raises any error except for NotImplemented it will fail. (If it gets NotImplemented, it will ignore and execute without a prepared statement.)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on issue #35777: [Flight-SQL] use Flight_SQL_JDBC to connect server failed

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #35777:
URL: https://github.com/apache/arrow/issues/35777#issuecomment-1573635961

   FWIW, ADBC exposes a direct `fetch_df` method so you can avoid having to fetch and then convert. (Under the hood it will use PyArrow to convert from Arrow to Pandas, and PyArrow has heavily optimized this path over the years.)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] Cai-Yao commented on issue #35777: [Flight-SQL] use Flight_SQL_JDBC to connect server failed

Posted by "Cai-Yao (via GitHub)" <gi...@apache.org>.
Cai-Yao commented on issue #35777:
URL: https://github.com/apache/arrow/issues/35777#issuecomment-1566432425

   > Can you try [`useEncryption=false`](https://arrow.apache.org/docs/java/flight_sql_jdbc_driver.html)? It seems the docs are wrong on the default value
   
   Thx, it's right.
   
   I would like to ask if there is Java sample code for flight-sql server. I only find the arrow flight service in Doc.
   
   I use python `adbc_driver_flightsql.dbapi` client connection to execute sql `select 1`, it will report error
   
   ```
   Traceback (most recent call last):
     File "/app/test.python", line 15, in <module>
       cursor.execute("SELECT 1, 2.0, 'Hello, world!'")
     File "/opt/bitnami/python/lib/python3.9/site-packages/adbc_driver_manager/dbapi.py", line 603, in execute
       self._prepare_execute(operation, parameters)
     File "/opt/bitnami/python/lib/python3.9/site-packages/adbc_driver_manager/dbapi.py", line 572, in _prepare_execute
       self._stmt.prepare()
     File "adbc_driver_manager/_lib.pyx", line 1090, in adbc_driver_manager._lib.AdbcStatement.prepare
     File "adbc_driver_manager/_lib.pyx", line 385, in adbc_driver_manager._lib.check_error
   adbc_driver_manager._lib.InternalError: ADBC_STATUS_INTERNAL (9): [FlightSQL] rpc error: code = Internal desc = There was an error servicing your request.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] xinyiZzz commented on issue #35777: [Flight-SQL] use Flight_SQL_JDBC to connect server failed

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on issue #35777:
URL: https://github.com/apache/arrow/issues/35777#issuecomment-1719115860

   > Great! I'd be interested in seeing how this develops.
   
   Thanks for your attention @lidavidm 
   
   I'm trying to implement Arrow Flight SQL server in Apache Doris to support ADBC.
   
   I will release the design document to the Doris community soon. @
   
   
   brief explanation:
   
   Motivation:
   Current demand is to speed up the return of Doris query results to Python. The scenario is data science/machine learning. In the future, it may be able to replace the interface of other systems such as Spark to read Doris.
   
   Doris is also a column-stored database. It is very expensive to convert the `column data` in Doris into `row data` and then back to `column data` through Mysql.
   
   Currently using python's mysql-client to read data from Doris is 10-20 times slower than ClickHouse. Previously I tried to use JDBC to take over the query in Doris Arrow Flight Server, compared with mysql-client, the performance improved by 4-10 times, but it was still 1 times slower than ClickHouse.
   
   Implementation:
   I referenced Arrow Flight Example and Dremio.
   Doris also has two roles: `Frontend` and `Backend`. `Frontend` is responsible for generate query plan and scheduling, and `Backend` is responsible for query execution.
   ADBC connection process:
   - ADBC Client connects to `Frontend`, `Frontend` sends the query plan to `Backend`, and returns the endpoint of the result `Backend` to the ADBC Client.
   - ADBC Client connects to `Backend` in the endpoint to pull data.
   
   After a simple test, the performance has been several times faster than ClickHouse. I will continue to develop it and look forward to the final effect.
   
   Related PRs:
   https://github.com/apache/doris/pull/23765
   https://github.com/apache/doris/pull/24314


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on issue #35777: [Flight-SQL] use Flight_SQL_JDBC to connect server failed

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #35777:
URL: https://github.com/apache/arrow/issues/35777#issuecomment-1573634646

   Great! I'd be interested in seeing how this develops.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] Cai-Yao closed issue #35777: [Flight-SQL] use Flight_SQL_JDBC to connect server failed

Posted by "Cai-Yao (via GitHub)" <gi...@apache.org>.
Cai-Yao closed issue #35777: [Flight-SQL] use Flight_SQL_JDBC to connect server failed
URL: https://github.com/apache/arrow/issues/35777


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on issue #35777: [Flight-SQL] use Flight_SQL_JDBC to connect server failed

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #35777:
URL: https://github.com/apache/arrow/issues/35777#issuecomment-1564251184

   Can you try [`useEncryption=false`](https://arrow.apache.org/docs/java/flight_sql_jdbc_driver.html)? It seems the docs are wrong on the default value


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org