
Posted to notifications@kyuubi.apache.org by "link3280 (via GitHub)" <gi...@apache.org> on 2023/06/24 03:46:10 UTC

[GitHub] [kyuubi] link3280 commented on issue #4806: [FEATURE] Incremental result fetching for Flink engine

link3280 commented on issue #4806:
URL: https://github.com/apache/kyuubi/issues/4806#issuecomment-1605252144

   Currently, the beeline incremental fetching works on the client side.  The Kyuubi server pulls all result rows from the engine and returns them to the client in a micro-batch manner. 
   
   However, this could be problematic for the Flink engine or other streaming scenarios, where the operation is still running and producing more records after the Kyuubi operation has been executed synchronously and is considered result-available.
   
   Consider a select-from-Kafka use case: if we simply forward the fetch-result requests to the Flink engine, there would be 3 situations:
   
   - the Kafka topic is temporarily empty, we get an empty rowset
   - the Kafka topic has 10 records, so we get a rowset with 10 rows (less than `kyuubi.session.engine.flink.max.rows`)
   - the Kafka topic has lots of records, so we get a rowset with `${kyuubi.session.engine.flink.max.rows}` rows 
   
   In the first two situations, the Flink engine should tell the Kyuubi server to fetch again later but has no way to do so, so the current workaround is that the Flink engine keeps polling the result set until the number of rows reaches `${kyuubi.session.engine.flink.max.rows}`.
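
To make the workaround concrete, here is a rough sketch of the polling loop in Python. It is not the actual Flink engine code: `poll_until_full`, `fetch_once`, `max_polls`, and `poll_interval_s` are hypothetical names, and the `max_polls` cap is an assumption added only so the example terminates.

```python
import time
from typing import Callable, List, Tuple

Row = Tuple[object, ...]


def poll_until_full(
    fetch_once: Callable[[int], List[Row]],
    max_rows: int,
    max_polls: int,
    poll_interval_s: float = 0.0,
) -> List[Row]:
    """Keep fetching until max_rows rows are collected or max_polls is hit.

    fetch_once(limit) stands in for one fetch against the running job and
    returns at most `limit` rows, possibly none.
    """
    rows: List[Row] = []
    for _ in range(max_polls):
        if len(rows) >= max_rows:
            break
        # An empty or short batch does not mean the stream has ended,
        # so the loop simply tries again on the next iteration.
        rows.extend(fetch_once(max_rows - len(rows)))
        if len(rows) < max_rows and poll_interval_s > 0:
            time.sleep(poll_interval_s)
    return rows
```

With an empty or slow topic this loop blocks the fetch call until enough rows arrive, which is the limitation the proposal in this comment tries to remove.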
   
   We could fix this by adding a `hasMoreResults` field to the result fetching response (AKA `TRowSet`), but this would have a large footprint, as it touches the thrift protocol. So I'm thinking of injecting metadata columns prefixed with `__KYUUBI_` into `TRowSet` to allow the engines to tell the server there are more rows to fetch. The server would need to drop these metadata columns before returning the `TRowSet` to the clients.
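
A minimal sketch of the server-side step, in Python for brevity. Everything here is illustrative: `RowSet` is a simplified stand-in keyed by column name rather than the thrift `TRowSet`, the column name `__KYUUBI_HAS_MORE_RESULTS` is hypothetical (only the `__KYUUBI_` prefix comes from the proposal), and the sketch assumes the metadata column carries a single flag value even when the data columns are empty.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

METADATA_PREFIX = "__KYUUBI_"
# Hypothetical column name; only the prefix is part of the proposal.
HAS_MORE_COLUMN = METADATA_PREFIX + "HAS_MORE_RESULTS"


@dataclass
class RowSet:
    """Simplified columnar row set: column name -> list of values."""
    columns: Dict[str, List[object]]


def strip_metadata(rowset: RowSet) -> Tuple[RowSet, bool]:
    """Return the client-facing row set and the engine's has-more flag."""
    flags = rowset.columns.get(HAS_MORE_COLUMN, [])
    # A missing metadata column is treated as "no more results".
    has_more = bool(flags) and bool(flags[0])
    client_columns = {
        name: values
        for name, values in rowset.columns.items()
        if not name.startswith(METADATA_PREFIX)
    }
    return RowSet(client_columns), has_more


# Usage: an empty batch from a job that is still running.
engine_rows = RowSet({"id": [], "msg": [], HAS_MORE_COLUMN: [True]})
client_rows, has_more = strip_metadata(engine_rows)
```

In this sketch the server would return `client_rows` to the client and use `has_more` to decide whether to fetch from the engine again later.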
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

