Posted to user@hbase.apache.org by Doug Meil <do...@explorysmedical.com> on 2013/12/02 19:58:55 UTC

Re: Online/Realtime query with filter and join?

You are going to want to figure out a rowkey (or a set of tables with
rowkeys) to restrict the number of I/Os. If you just slap Impala in front
of HBase (or even Phoenix, for that matter) you could write SQL against it,
but if it winds up doing a full scan of an HBase table underneath you
won't get your < 100ms response time.

Note:  I'm not saying you can't do this with Impala or Phoenix, I'm just
saying start with the rowkeys first so that you limit the I/O.  Then start
adding frameworks as needed (and/or build a schema with Phoenix in the
same rowkey exercise).

Such response-time requirements make me think that this is for application
support, so why the requirement for SQL? Might want to start writing it as
a Java program first.
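
A minimal sketch of the rowkey-first idea above, in plain Java. It does
not use the HBase client: a sorted TreeMap stands in for an HBase table
(which also keeps rows sorted by rowkey), and the key layout, the "|"
separator, and all class and method names are assumptions made for
illustration only. It shows why a key that leads with the filtered field
turns a query into a narrow range read instead of a full scan.

```java
import java.util.Locale;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch only: a TreeMap stands in for an HBase table, which likewise
// stores rows sorted by rowkey. All names here are hypothetical.
class RowKeyScanSketch {

    // Composite rowkey: the most frequently filtered field goes first,
    // followed by a zero-padded timestamp so keys sort chronologically.
    static String rowKey(String appId, long epochSeconds) {
        return appId + "|" + String.format(Locale.ROOT, "%010d", epochSeconds);
    }

    // Range scan: only rows whose key starts with "<appId>|" are touched.
    // The exclusive stop key is the prefix with its last character bumped by one.
    static NavigableMap<String, Long> scanByApp(NavigableMap<String, Long> table,
                                                String appId) {
        String start = appId + "|";
        String stop = appId + (char) ('|' + 1);
        return table.subMap(start, true, stop, false);
    }

    // Three hypothetical apps with 1000 rows each.
    static NavigableMap<String, Long> sampleTable() {
        NavigableMap<String, Long> table = new TreeMap<>();
        for (int app = 0; app < 3; app++) {
            for (int i = 0; i < 1000; i++) {
                table.put(rowKey("app-" + app, 1385700000L + i), (long) i);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        NavigableMap<String, Long> table = sampleTable();
        NavigableMap<String, Long> hits = scanByApp(table, "app-1");
        // A full scan would read every row; the range scan reads one slice.
        System.out.println("rows in table: " + table.size());
        System.out.println("rows touched by range scan: " + hits.size());
    }
}
```

With a real cluster the same start/stop pair would be handed to a scan
on the table, so the read stays proportional to the matching rows rather
than to the table size.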

On 11/29/13 4:32 PM, "Mourad K" <mo...@gmail.com> wrote:

>You might want to consider something like Impala or Phoenix. I presume
>you are trying to do some report query for a dashboard or UI?
>MapReduce is certainly not adequate as there is too much latency on
>startup. If you want to give this a try, CDH4 and Impala are a good start.
>
>Mouradk
>
>On 29 Nov 2013, at 10:33, Ramon Wang <ra...@appannie.com> wrote:
>
>> The general performance requirement for each query is less than 100 ms;
>> that's the average level. Sounds crazy, but yes, we need to find a way
>> to do it.
>> 
>> Thanks
>> Ramon
>> 
>> 
>> On Fri, Nov 29, 2013 at 5:01 PM, yonghu <yo...@gmail.com> wrote:
>> 
>>> The question is what you mean by "real-time". What is your performance
>>> requirement? In my opinion, MapReduce is not suitable for real-time
>>> data processing.
>>> 
>>> 
>>> On Fri, Nov 29, 2013 at 9:55 AM, Azuryy Yu <az...@gmail.com> wrote:
>>> 
>>>> You can try Phoenix.
>>>> On 2013-11-29 3:44 PM, "Ramon Wang" <ra...@appannie.com> wrote:
>>>> 
>>>>> Hi Folks
>>>>> 
>>>>> It seems to be impossible, but I still want to check if there is a way
>>>>> we can do "complex" queries on HBase with "Order By", "JOIN", etc. like
>>>>> we have with a normal RDBMS. We are asked to provide such a solution;
>>>>> any ideas? Thanks for your help.
>>>>> 
>>>>> BTW, I think maybe Impala from CDH would be a way to go, but I haven't
>>>>> had time to check it yet.
>>>>> 
>>>>> Thanks
>>>>> Ramon
>>>

Re: Online/Realtime query with filter and join?

Posted by Viral Bajaria <vi...@gmail.com>.

Pradeep, correct me if I am wrong, but prestodb has not released the HBase
plugin yet, or did they and I missed the announcement?

I agree with what Doug is saying here: you can't achieve < 100ms on every
kind of query on HBase unless you design the rowkey in a way that helps
you reduce your I/O. A full scan of a table with billions of rows and
columns can take forever, but good indexing (via rowkey or secondary
indexes) could help speed things up.

Thanks,
Viral


On Mon, Dec 2, 2013 at 11:01 AM, Pradeep Gollakota <pr...@gmail.com> wrote:

> In addition to Impala and Phoenix, I'm going to throw PrestoDB into the
> mix. :)
>
> http://prestodb.io/

Re: Online/Realtime query with filter and join?

Posted by Pradeep Gollakota <pr...@gmail.com>.

@Viral I'm not sure... I just know that they mentioned on the front page
that PrestoDB can query HBase tables.


On Mon, Dec 2, 2013 at 11:07 AM, James Taylor <jt...@salesforce.com> wrote:

> I agree with Doug Meil's advice. Start with your row key design. In
> Phoenix, your PRIMARY KEY CONSTRAINT defines your row key. You should lead
> with the columns that you'll filter against most frequently. Then, take a
> look at adding secondary indexes to speed up queries against other columns.
>
> Thanks,
> James

Re: Online/Realtime query with filter and join?

Posted by James Taylor <jt...@salesforce.com>.

I agree with Doug Meil's advice. Start with your row key design. In
Phoenix, your PRIMARY KEY CONSTRAINT defines your row key. You should lead
with the columns that you'll filter against most frequently. Then, take a
look at adding secondary indexes to speed up queries against other columns.
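
As a hedged illustration of that advice, the sketch below holds Phoenix
DDL as Java string constants. The table, column, and index names are
made up for the example, and nothing is executed here; on a real cluster
the statements would be run through the Phoenix JDBC driver (a
connection URL of the form jdbc:phoenix:<zookeeper quorum>), which is
left out so the sketch runs without one. The PRIMARY KEY constraint
leads with the column assumed to be filtered most often, and the index
covers a query on a non-leading column.

```java
// Sketch only: table, column, and index names are hypothetical.
class PhoenixSchemaSketch {

    // The PRIMARY KEY constraint becomes the HBase rowkey, so the column
    // filtered most often (app_id here) leads, followed by the date.
    static final String CREATE_TABLE =
        "CREATE TABLE IF NOT EXISTS app_stats (\n"
      + "    app_id VARCHAR NOT NULL,\n"
      + "    stat_date DATE NOT NULL,\n"
      + "    country VARCHAR,\n"
      + "    downloads BIGINT\n"
      + "    CONSTRAINT pk PRIMARY KEY (app_id, stat_date)\n"
      + ")";

    // Secondary index for queries that filter on a non-leading column.
    static final String CREATE_INDEX =
        "CREATE INDEX idx_app_stats_country\n"
      + "    ON app_stats (country) INCLUDE (downloads)";

    // A query whose WHERE clause matches the leading key column.
    static final String QUERY =
        "SELECT stat_date, downloads FROM app_stats\n"
      + "    WHERE app_id = ? AND stat_date >= ?";

    public static void main(String[] args) {
        // Print the statements; running them needs a Phoenix connection.
        System.out.println(CREATE_TABLE + ";\n");
        System.out.println(CREATE_INDEX + ";\n");
        System.out.println(QUERY + ";");
    }
}
```

Whether a given query actually becomes a range scan over the key is
worth confirming with Phoenix's EXPLAIN before relying on it.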

Thanks,
James


On Mon, Dec 2, 2013 at 11:01 AM, Pradeep Gollakota <pr...@gmail.com> wrote:

> In addition to Impala and Phoenix, I'm going to throw PrestoDB into the
> mix. :)
>
> http://prestodb.io/

Re: Online/Realtime query with filter and join?

Posted by Pradeep Gollakota <pr...@gmail.com>.

In addition to Impala and Phoenix, I'm going to throw PrestoDB into the
mix. :)

http://prestodb.io/


On Mon, Dec 2, 2013 at 10:58 AM, Doug Meil <do...@explorysmedical.com> wrote:

>
> You are going to want to figure out a rowkey (or a set of tables with
> rowkeys) to restrict the number of I/Os. If you just slap Impala in front
> of HBase (or even Phoenix, for that matter) you could write SQL against it,
> but if it winds up doing a full scan of an HBase table underneath you
> won't get your < 100ms response time.
>
> Note:  I'm not saying you can't do this with Impala or Phoenix, I'm just
> saying start with the rowkeys first so that you limit the I/O.  Then start
> adding frameworks as needed (and/or build a schema with Phoenix in the
> same rowkey exercise).
>
> Such response-time requirements make me think that this is for application
> support, so why the requirement for SQL? Might want to start writing it as
> a Java program first.