Posted to user@phoenix.apache.org by Abe Weinograd <ab...@flonet.com> on 2014/07/24 23:24:32 UTC

query client performance

Hello,

One of our main use cases is to extract a subset of our data in an ETL tool
(usually in the 10 million row range) from our tables in Phoenix.  The
behavior I am seeing is that all rows are streamed to the machine running
the Phoenix Client and then processed before the JDBC driver gets the next
row.

We have tuned the scanner cache to 1000 rows, but the extract still takes a
while.  I imagine that all the rows are being sorted before they are streamed
out to the result set.  Is this something we can change?  What other things
can I tune for this access pattern?

Thanks!
Abe

Re: query client performance

Posted by James Taylor <ja...@apache.org>.
Yes, we're planning on cutting an RC early next week.
Thanks,
James

On Fri, Jul 25, 2014 at 12:36 PM, Abe Weinograd <ab...@flonet.com> wrote:
> Thanks James.  That's very helpful.
>
> 4.1 is being released soon?
>
> Thanks,
> Abe

Re: query client performance

Posted by Abe Weinograd <ab...@flonet.com>.
Thanks James.  That's very helpful.

4.1 is being released soon?

Thanks,
Abe



Re: query client performance

Posted by James Taylor <ja...@apache.org>.
Hi Abe,
FWIW, there's an improvement in place
(https://issues.apache.org/jira/browse/PHOENIX-539) for our upcoming
release, so that the first next row call no longer pulls over
everything. Instead, it is done in chunks.

As far as what you can do now, I'd recommend putting a LIMIT clause on
your queries as this will bound the number of rows that get pulled
over. You can also page through the results as described here:
http://phoenix.apache.org/paged.html and elaborated on in this email
thread: http://s.apache.org/588

Thanks,
James
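
For readers who want to try the LIMIT and paging suggestion above, here is a minimal sketch, not a definitive implementation. It assumes a hypothetical table EVENTS whose primary key is (HOST, EVENT_TIME); those names, the class name and the page size are illustrative only. The row value constructor form follows the pattern described on the paged queries page linked above. Each query is bounded by LIMIT and resumes after the last primary key seen, so no single query has to cover the whole extract.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PagedQuery {

    /** Builds the SQL for one page: rows after the last-seen key, in key order, capped by LIMIT. */
    public static String buildPageQuery(String table, List<String> pkColumns,
                                        boolean firstPage, int pageSize) {
        String pk = String.join(", ", pkColumns);
        StringBuilder sql = new StringBuilder("SELECT * FROM ").append(table);
        if (!firstPage) {
            // Row value constructor: compare the whole key tuple with the last key seen.
            String marks = String.join(", ", Collections.nCopies(pkColumns.size(), "?"));
            sql.append(" WHERE (").append(pk).append(") > (").append(marks).append(")");
        }
        sql.append(" ORDER BY ").append(pk).append(" LIMIT ").append(pageSize);
        return sql.toString();
    }

    /** Pages through the table over an already-open JDBC connection; returns the row count. */
    public static long extractAll(Connection conn, String table, List<String> pkColumns,
                                  int pageSize) throws SQLException {
        Object[] lastKey = null; // null until the first page has been read
        long total = 0;
        while (true) {
            int rows = 0;
            String sql = buildPageQuery(table, pkColumns, lastKey == null, pageSize);
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                if (lastKey != null) {
                    for (int i = 0; i < lastKey.length; i++) {
                        ps.setObject(i + 1, lastKey[i]);
                    }
                }
                try (ResultSet rs = ps.executeQuery()) {
                    Object[] key = new Object[pkColumns.size()];
                    while (rs.next()) {
                        for (int i = 0; i < key.length; i++) {
                            key[i] = rs.getObject(pkColumns.get(i));
                        }
                        rows++;
                        // Hand the current row to the ETL step here.
                    }
                    if (rows > 0) {
                        lastKey = key; // key of the last row on this page
                    }
                }
            }
            total += rows;
            if (rows < pageSize) {
                return total; // a short page means the data is exhausted
            }
        }
    }

    public static void main(String[] args) {
        List<String> pk = Arrays.asList("HOST", "EVENT_TIME");
        System.out.println(buildPageQuery("EVENTS", pk, true, 1000));
        System.out.println(buildPageQuery("EVENTS", pk, false, 1000));
    }
}
```

Running main only prints the two query shapes and needs no cluster. extractAll expects a connection obtained from the Phoenix JDBC driver and has not been tuned; the page size is a knob to experiment with.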


Re: query client performance

Posted by Nicolas Maillard <nm...@hortonworks.com>.
Hello Abe

You are right: currently the Phoenix client is the final step, hence some
processing can happen there.
One option is to actually put the client on the cluster to avoid long and
suboptimal network paths.
Another might be a service standing in the cluster, in front of the client,
to do the last steps and even pagination/compression.



