Posted to dev@calcite.apache.org by Jan Van Besien <ja...@ngdata.com> on 2015/09/21 15:35:37 UTC

avatica streaming

Hi,

I was looking into Avatica to create a "thin" JDBC client for our
existing "thick" JDBC client implemented with Calcite. I got something
working very quickly, very similar to what Apache Phoenix has
done.

However, I immediately noticed that there is no streaming between
client and server for large result sets. In other words, if I execute
a query which results in a large result set, the client has to wait a
long time without any feedback, and if the result set is large enough
the server goes OOM.

I am wondering if this functionality is simply missing from Avatica or
whether there is some extra work required on my end to make it work.

Thanks
Jan

Re: avatica streaming

Posted by Jan Van Besien <ja...@ngdata.com>.
"Streaming" might have been a bad choice of words; batches of rows is
actually what I meant.

I looked into it a bit more and noticed that it actually works with
1.4.0-incubating but not with 1.3.0-incubating. There is no real
reason why I was still on 1.3.0, so it is no longer an issue for me.

I am using JdbcMeta with the out-of-the-box
org.apache.calcite.avatica.server.Main.

Thanks,
Jan

Re: avatica streaming

Posted by Julian Hyde <jh...@apache.org>.
A truism is that if you look sufficiently closely at any stream it is revealed to be batches. The Avatica API supports incremental fetches[1] as batches of rows (called “frames”) and so do both its JSON and Protobuf formats.

There used to be a limitation for metadata result sets (e.g. getTables) that all of the rows were squashed into the first frame, however many there were. (See class MetaResultSet.) I don’t recall whether that limitation still holds.

For other result sets, you can implement the “fetch” method and start returning rows as soon as you have some.
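
A minimal sketch of the idea of returning rows frame by frame, written against plain Java collections rather than Avatica itself. The names RowFrame, FrameCursor and FrameSketch are hypothetical stand-ins invented for illustration and are not Avatica's classes or method signatures; the frame size of 100 is likewise just an assumption.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.IntStream;

/** Hypothetical stand-in for a frame: one batch of rows plus paging state. */
final class RowFrame {
  final long offset;          // zero-based index of the first row in this frame
  final boolean done;         // true when no rows remain after this frame
  final List<Object[]> rows;  // the rows carried by this frame

  RowFrame(long offset, boolean done, List<Object[]> rows) {
    this.offset = offset;
    this.done = done;
    this.rows = rows;
  }
}

/** Hypothetical server-side cursor that hands out rows one frame at a time. */
final class FrameCursor {
  private final Iterator<Object[]> source;
  private long position = 0;

  FrameCursor(Iterator<Object[]> source) {
    this.source = source;
  }

  /** Pulls at most maxRows rows from the source; the rest stay unread. */
  RowFrame fetch(int maxRows) {
    List<Object[]> rows = new ArrayList<>();
    long start = position;
    while (rows.size() < maxRows && source.hasNext()) {
      rows.add(source.next());
      position++;
    }
    return new RowFrame(start, !source.hasNext(), rows);
  }
}

final class FrameSketch {
  public static void main(String[] args) {
    // A lazily generated "result set" of 250 single-column rows.
    Iterator<Object[]> rows =
        IntStream.range(0, 250).mapToObj(i -> new Object[] {i}).iterator();
    FrameCursor cursor = new FrameCursor(rows);
    RowFrame frame;
    do {
      frame = cursor.fetch(100);
      System.out.println("offset=" + frame.offset
          + " rows=" + frame.rows.size() + " done=" + frame.done);
    } while (!frame.done);
  }
}
```

Because the cursor only advances the underlying iterator when a frame is requested, the memory held at any moment is bounded by one frame rather than by the whole result set.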

If there is a particular implementation of Avatica server that buffers so many rows that it runs out of memory then that is a bug and it would be helpful to see the stack trace.

Julian

[1] http://calcite.incubator.apache.org/apidocs/org/apache/calcite/avatica/Meta.html#fetch-org.apache.calcite.avatica.Meta.StatementHandle-java.util.List-long-int-


Re: avatica streaming

Posted by Josh Elser <jo...@gmail.com>.
Hi Jan,

Cool! I hope you're having success using Avatica.

I would call it missing functionality in Avatica itself. IIRC, you'll 
get bundles of 100 messages that come back in one HTTP response from the 
Avatica server.
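
A rough sketch of what that bundling looks like from the client side, assuming one request per bundle. BundledRows, BundleSketch and fakeServer are hypothetical names made up for illustration; nothing here is Avatica's client code, and the bundle size of 100 is only the figure recalled above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.LongFunction;

/** Hypothetical client-side iterator that requests one bundle per round trip. */
final class BundledRows implements Iterator<String> {
  // Maps a row offset to the next bundle; an empty bundle signals the end.
  private final LongFunction<List<String>> requestBundle;
  private Iterator<String> current = Collections.emptyIterator();
  private long offset = 0;
  private boolean exhausted = false;
  int roundTrips = 0;  // counts simulated requests, for illustration only

  BundledRows(LongFunction<List<String>> requestBundle) {
    this.requestBundle = requestBundle;
  }

  @Override
  public boolean hasNext() {
    // Ask for another bundle only once the current one is used up.
    while (!current.hasNext() && !exhausted) {
      List<String> bundle = requestBundle.apply(offset);
      roundTrips++;
      offset += bundle.size();
      exhausted = bundle.isEmpty();
      current = bundle.iterator();
    }
    return current.hasNext();
  }

  @Override
  public String next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return current.next();
  }
}

final class BundleSketch {
  static final int BUNDLE_SIZE = 100;  // assumed bundle size
  static final int TOTAL_ROWS = 250;   // size of the fake result set

  /** Fake "server": returns up to BUNDLE_SIZE rows starting at offset. */
  static List<String> fakeServer(long offset) {
    List<String> bundle = new ArrayList<>();
    long end = Math.min(offset + BUNDLE_SIZE, TOTAL_ROWS);
    for (long i = offset; i < end; i++) {
      bundle.add("row-" + i);
    }
    return bundle;
  }

  public static void main(String[] args) {
    BundledRows rows = new BundledRows(BundleSketch::fakeServer);
    int count = 0;
    while (rows.hasNext()) {
      rows.next();
      count++;
    }
    System.out.println(count + " rows in " + rows.roundTrips + " round trips");
  }
}
```

The client sees nothing until a whole bundle arrives, which is the difference from true streaming: the first row is delayed by the time it takes the server to fill the first bundle.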

It's been a while since I've looked at state-of-the-art web tech, but I
remember seeing some neat natives built into gRPC that support streaming
on top of HTTP/2. I'm not sure how best (in terms of client
compatibility) to support streaming results back instead of bundling and
sending once a certain size is reached, but it's definitely an area that
could be improved! Would love to have a discussion on the matter.

- Josh
