Posted to user@phoenix.apache.org by Sumit Nigam <su...@yahoo.com> on 2015/10/06 17:38:17 UTC

ResultSet size

Hi,
Does Phoenix buffer the result set internally? When I run a large skip scan with an IN clause, the data returned may be too large to hold in memory, so ideally I'd like to stream it through the resultset.next() method. So my question is: does Phoenix really stream results?
And if so, is there a way to control how much it loads at one time on the client side before next() fetches the next batch of data from the region servers?
Best regards,
Sumit

Re: ResultSet size

Posted by Jesse Yates <je...@gmail.com>.
Correct. So you have to make sure that you have enough memory to handle the
fetchSize * concurrent requests.
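
A rough way to put numbers on that rule, as a minimal sketch: the fetch size counts rows, so a byte figure also needs an average row size. The class name, the method name, and the avgRowBytes parameter below are assumptions made for illustration and do not come from Phoenix or HBase. How many requests are really in flight depends on your workload.

```java
final class ScanMemoryEstimate {

    /**
     * Approximate number of bytes buffered on the client when every
     * in-flight request holds one full fetch of rows.
     */
    static long estimateBufferedBytes(int fetchSize, long avgRowBytes, int concurrentRequests) {
        // rows per fetch * bytes per row * requests in flight;
        // multiplyExact throws ArithmeticException on overflow instead of wrapping
        return Math.multiplyExact(
                Math.multiplyExact((long) fetchSize, avgRowBytes),
                (long) concurrentRequests);
    }

    public static void main(String[] args) {
        // Hypothetical workload: 1,000 rows per fetch, about 2 KB per row, 20 requests in flight
        long bytes = estimateBufferedBytes(1_000, 2_048, 20);
        System.out.println(bytes + " bytes buffered (approximate)");
    }
}
```

With those hypothetical numbers the estimate is 40,960,000 bytes, roughly 39 MiB, before any JVM object overhead.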


On Tue, Oct 6, 2015 at 10:34 AM Sumit Nigam <su...@yahoo.com> wrote:

> Thanks Samarth and Jesse.
>
> So, in effect, setting the batch size (say, via stmt.setFetchSize()) ensures
> that only that much data is copied over the wire en masse? And
> behind the scenes, the Phoenix driver fetches the next batch as each fetch
> size is exhausted?
>
> Thanks,
> Sumit

Re: ResultSet size

Posted by Sumit Nigam <su...@yahoo.com>.
Thanks Samarth and Jesse.
So, in effect, setting the batch size (say, via stmt.setFetchSize()) ensures that only that much data is copied over the wire en masse? And behind the scenes, the Phoenix driver fetches the next batch as each fetch size is exhausted?
Thanks,
Sumit

Re: ResultSet size

Posted by Samarth Jain <sa...@gmail.com>.
To add to what Jesse said, you can override the default scanner fetch size
programmatically via Phoenix by calling statement.setFetchSize(int).
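
For illustration, the call could be used as in the sketch below. It relies only on the standard java.sql interfaces; the class and method names are made up for this example, and the caller is assumed to supply an open Phoenix connection and a query.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class FetchSizeExample {

    /**
     * Runs the query with the given fetch size and counts the rows by
     * paging through them with ResultSet.next().
     */
    static long countRows(Connection conn, String sql, int fetchSize) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            // Standard JDBC call; per the thread, Phoenix uses it to override
            // the default scanner fetch size for this statement
            stmt.setFetchSize(fetchSize);
            long rows = 0;
            try (ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    rows++; // handle the current row here instead of collecting all rows
                }
            }
            return rows;
        }
    }
}
```

Handling each row inside the loop, rather than collecting rows into a list, keeps client memory bounded by what the driver buffers per fetch.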

Re: ResultSet size

Posted by Jesse Yates <je...@gmail.com>.
So HBase (and by extension, Phoenix) does not do true "streaming" of rows:
rows are copied into memory from the HFiles and then eventually copied
en masse onto the wire. On the client they are pulled off in chunks and
paged through by the client scanner. You can control the batch size (the
number of rows in each 'page') via the usual HBase client configurations.
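
The paging described above can be pictured with a toy model. The sketch below is not the HBase client scanner; PagedScanner and everything in it are invented for illustration, with an in-memory list standing in for the region server. In HBase itself, the rows-per-page setting is, as far as I know, the scanner caching value (the hbase.client.scanner.caching property or Scan.setCaching(int)); that mapping is my reading and is not spelled out in the thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Toy model of a paging client scanner; NOT the real HBase implementation. */
final class PagedScanner implements Iterator<String> {
    private final List<String> serverRows;               // stands in for rows on a region server
    private final int pageSize;                          // rows pulled per simulated round trip
    private final List<String> page = new ArrayList<>(); // the only rows held on the client
    private int serverPos = 0;                           // next server row to fetch
    private int pagePos = 0;                             // next row to hand out from the page
    private int roundTrips = 0;

    PagedScanner(List<String> serverRows, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.serverRows = serverRows;
        this.pageSize = pageSize;
    }

    /** Replaces the current page with the next chunk of rows from the "server". */
    private void fetchNextPage() {
        page.clear();
        pagePos = 0;
        int end = Math.min(serverPos + pageSize, serverRows.size());
        page.addAll(serverRows.subList(serverPos, end));
        serverPos = end;
        roundTrips++;
    }

    @Override
    public boolean hasNext() {
        if (pagePos < page.size()) {
            return true;                 // rows still buffered locally
        }
        if (serverPos >= serverRows.size()) {
            return false;                // nothing left to fetch
        }
        fetchNextPage();                 // page exhausted: go back for another chunk
        return pagePos < page.size();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return page.get(pagePos++);
    }

    int roundTrips() {
        return roundTrips;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add("row-" + i);
        }
        PagedScanner scanner = new PagedScanner(rows, 4);
        int seen = 0;
        while (scanner.hasNext()) {
            scanner.next();
            seen++;
        }
        System.out.println(seen + " rows in " + scanner.roundTrips() + " round trips");
    }
}
```

In this model ten rows with a page size of four take three fetches (4, 4, and 2 rows), and the client never holds more than one page at a time. The real client may behave differently at the end of a scan or across region boundaries.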
