Posted to issues@calcite.apache.org by "Julian Hyde (Jira)" <ji...@apache.org> on 2019/09/09 19:59:00 UTC

[jira] [Commented] (CALCITE-3333) Add time-based ResultSet frame size limiting

    [ https://issues.apache.org/jira/browse/CALCITE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926061#comment-16926061 ] 

Julian Hyde commented on CALCITE-3333:
--------------------------------------

I like the idea of this, but it's just at an API level. Can you surface it (or just part of the functionality) through the connect string parameters so that people can use it out of the box?

When you talk about limiting based on "time", is that the time spent on the server, or the round-trip time? Unless the server is very slow (e.g. reading a continuous query sourced from Kafka), the RPC latency is much larger than the time spent on the server.

I'd love it if there were a way to say "10,000 rows, or 1 million bytes, as long as you can do it in 100 ms on the server and 1 s RPC round trip" as a connect string parameter.
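
To make the suggestion concrete, here is a minimal sketch of how such limits might be read from an Avatica-style connect string of semicolon-separated key=value pairs. The parameter names (max_frame_rows, max_frame_bytes, max_frame_server_millis, max_frame_rpc_millis) and the FrameLimitConfig class are hypothetical and are not existing Avatica properties. The default of 100 rows is taken from the issue description, and the other defaults simply mean "no limit".

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical holder for frame limits; not part of Avatica. */
final class FrameLimitConfig {
  final int maxRows;
  final long maxBytes;
  final long maxServerMillis;
  final long maxRpcMillis;

  FrameLimitConfig(int maxRows, long maxBytes, long maxServerMillis, long maxRpcMillis) {
    this.maxRows = maxRows;
    this.maxBytes = maxBytes;
    this.maxServerMillis = maxServerMillis;
    this.maxRpcMillis = maxRpcMillis;
  }

  /** Parses the key=value pairs that follow the "jdbc:avatica:remote:" prefix. */
  static FrameLimitConfig parse(String url) {
    final String prefix = "jdbc:avatica:remote:";
    if (!url.startsWith(prefix)) {
      throw new IllegalArgumentException("Not an Avatica remote URL: " + url);
    }
    Map<String, String> props = new LinkedHashMap<>();
    for (String part : url.substring(prefix.length()).split(";")) {
      int eq = part.indexOf('=');  // split on the first '=' only
      if (eq > 0) {
        props.put(part.substring(0, eq).trim(), part.substring(eq + 1).trim());
      }
    }
    String unlimited = String.valueOf(Long.MAX_VALUE);
    // All four property names below are assumptions made for this sketch.
    return new FrameLimitConfig(
        Integer.parseInt(props.getOrDefault("max_frame_rows", "100")),
        Long.parseLong(props.getOrDefault("max_frame_bytes", unlimited)),
        Long.parseLong(props.getOrDefault("max_frame_server_millis", unlimited)),
        Long.parseLong(props.getOrDefault("max_frame_rpc_millis", unlimited)));
  }
}
```

With this sketch, the example above would be written as "jdbc:avatica:remote:url=http://localhost:8765;max_frame_rows=10000;max_frame_bytes=1000000;max_frame_server_millis=100;max_frame_rpc_millis=1000". Note that a round-trip limit could not be enforced by the server alone; it would presumably need the client to measure latency and adjust its requests.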

> Add time-based ResultSet frame size limiting
> --------------------------------------------
>
>                 Key: CALCITE-3333
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3333
>             Project: Calcite
>          Issue Type: New Feature
>          Components: avatica
>            Reporter: Gabriel Reid
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The size of a single JDBC ResultSet frame returned in a single {{prepareAndExecute}} or {{fetch}} invocation is currently 100 rows, meaning that each retrieval of a portion of a ResultSet sends up to 100 rows over the wire. This frame size may be too big in some situations and too small in others.
> If the underlying data source being queried can provide thousands of (small) records per second, then reading them only 100 at a time per RPC call will be unnecessarily slow.
> On the other hand, if the underlying data source is only providing records at a rate of 1 per second, then it will take 100 seconds for each RPC call to return, which can lead to timeouts (particularly if the Avatica server is sitting behind a proxy that has a strict request timeout).
> The main factors to take into account when finding an ideal size of frame to return for each RPC call are:
> * make the frames small enough that they don't overload either Avatica server or the client with overly large amounts of data at one time
> * make the frames large enough so that the percentage of total query time that is spent only on RPC overhead is minimized
> The general idea of this ticket is to add a pluggable "frame size limiting" mechanism so that frame size limiting can be done based on the number of rows, number of bytes, amount of time spent building a frame, or any other property or combination of properties.
> Note that CALCITE-2322 contains some work to allow configuring the size of a single frame on a Connection or Statement (via the {{setFetchSize}} method), although it's not yet merged in. That ticket would also be useful, and does not conflict with the general intent of this ticket.
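
As an illustration of what "pluggable" frame size limiting could look like, the following is a minimal sketch and not the API proposed in the pull request. All names here (FrameLimiter, RowCountLimiter, ByteLimiter, TimeLimiter, AnyOfLimiter, Frames) are hypothetical. A limiter is consulted after each row is added to the frame, and a combining limiter closes the frame as soon as any of its parts says so.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.ToLongFunction;

/** Hypothetical plug-in point; not an Avatica interface. */
interface FrameLimiter {
  /** Called once before the first row of a new frame is read. */
  void start();
  /** Called after each row is added; true means the frame should be closed. */
  boolean isFull(int rowsSoFar, long bytesSoFar);
}

final class RowCountLimiter implements FrameLimiter {
  private final int maxRows;
  RowCountLimiter(int maxRows) { this.maxRows = maxRows; }
  public void start() {}
  public boolean isFull(int rows, long bytes) { return rows >= maxRows; }
}

final class ByteLimiter implements FrameLimiter {
  private final long maxBytes;
  ByteLimiter(long maxBytes) { this.maxBytes = maxBytes; }
  public void start() {}
  public boolean isFull(int rows, long bytes) { return bytes >= maxBytes; }
}

final class TimeLimiter implements FrameLimiter {
  private final long maxNanos;
  private final LongSupplier nanoClock;  // injectable, e.g. System::nanoTime
  private long startNanos;
  TimeLimiter(long maxMillis, LongSupplier nanoClock) {
    this.maxNanos = maxMillis * 1_000_000L;
    this.nanoClock = nanoClock;
  }
  public void start() { startNanos = nanoClock.getAsLong(); }
  public boolean isFull(int rows, long bytes) {
    return nanoClock.getAsLong() - startNanos >= maxNanos;
  }
}

/** Closes the frame as soon as any one of the wrapped limiters is full. */
final class AnyOfLimiter implements FrameLimiter {
  private final List<FrameLimiter> limiters;
  AnyOfLimiter(FrameLimiter... limiters) { this.limiters = Arrays.asList(limiters); }
  public void start() { limiters.forEach(FrameLimiter::start); }
  public boolean isFull(int rows, long bytes) {
    for (FrameLimiter limiter : limiters) {
      if (limiter.isFull(rows, bytes)) {
        return true;
      }
    }
    return false;
  }
}

final class Frames {
  /** Reads rows until the limiter reports the frame is full or the rows run out. */
  static <T> List<T> nextFrame(Iterator<T> rows, FrameLimiter limiter,
      ToLongFunction<T> sizeOf) {
    List<T> frame = new ArrayList<>();
    long bytes = 0;
    limiter.start();
    while (rows.hasNext()) {
      T row = rows.next();
      frame.add(row);
      bytes += sizeOf.applyAsLong(row);
      if (limiter.isFull(frame.size(), bytes)) {
        break;
      }
    }
    return frame;
  }
}
```

One limitation of this sketch: the time limit is only checked after a row arrives, so a source that blocks for a long time inside next() can still overrun it. Handling that case would need a timed read from the underlying source.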



--
This message was sent by Atlassian Jira
(v8.3.2#803003)