Posted to user@hbase.apache.org by Jilal Oussama <ji...@gmail.com> on 2013/11/22 16:25:27 UTC

FTS performance

Hi all,

I am looking for some performance suggestions.

I would like to get all the keys of a table (which contains ~100 million rows).

Currently, I am doing a full table scan (FTS) with FirstKeyOnlyFilter and
KeyOnlyFilter using Thrift from a Python script, and I find it very slow ...

Any suggestions would be appreciated.

HBase : 0.94.13
Hadoop : 1.2.1

Thanks in advance.
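
For reference, a minimal sketch of the kind of key-only scan loop described above, assuming the Python bindings generated from HBase's Thrift1 IDL. The calls `scannerOpenWithScan`, `scannerGetList`, and `scannerClose`, and the `row` field of each result, come from that IDL, but their exact signatures differ between HBase versions, so check the generated `Hbase.py` for 0.94.13. The helper name `iter_row_keys`, its `batch_rows` parameter, and the `scan` argument are assumptions for illustration: `scan` stands for a `TScan` built by the caller, for example with `filterString` set to `FirstKeyOnlyFilter() AND KeyOnlyFilter()`. Fetching many rows per call instead of one reduces the number of client round trips.

```python
def iter_row_keys(client, table, scan, batch_rows=1000, attributes=None):
    """Yield every row key returned by the scan, fetching batch_rows per call.

    client     -- a connected HBase Thrift1 client (assumed, see lead-in)
    scan       -- a TScan prepared by the caller (filters, caching, columns)
    attributes -- optional scan attributes; some older IDL versions do not
                  take this argument at all
    """
    scanner_id = client.scannerOpenWithScan(table, scan, attributes or {})
    try:
        while True:
            # One round trip returns up to batch_rows results.
            rows = client.scannerGetList(scanner_id, batch_rows)
            if not rows:
                break  # an empty list means the scanner is exhausted
            for result in rows:
                yield result.row
    finally:
        # Always release the server-side scanner, even on errors.
        client.scannerClose(scanner_id)
```

The generator keeps memory use flat on the client, since only one batch of keys is held at a time.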

Re: FTS performance

Posted by Oussama Jilal <ji...@gmail.com>.

Hmm ... yes I think I should test this with native Java but I can't get 
rid of Thrift ... too many other applications using it.

On 11/23/2013 03:45 AM, Haosong Huang wrote:
> Thrift may be the bottleneck.

Re: FTS performance

Posted by Haosong Huang <ha...@gmail.com>.

Thrift may be the bottleneck.


On Sat, Nov 23, 2013 at 6:09 AM, Oussama Jilal <ji...@gmail.com> wrote:

> Of course yes, I guess my only option for now is MR ...


-- 
Best Regards,
Haosdent Huang

Re: FTS performance

Posted by Oussama Jilal <ji...@gmail.com>.

Of course yes, I guess my only option for now is MR ...

On 11/22/2013 09:34 PM, Jean-Marc Spaggiari wrote:
> But even so. For FTS you will most probably want to benefit from
> parallelism to be able to scale.

Re: FTS performance

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.

But even so. For FTS you will most probably want to benefit from
parallelism to be able to scale.
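
One way to get some of that parallelism from a client, sketched under assumptions: split the key space into ranges (for example at region boundaries) and scan the ranges concurrently. Here `scan_range` is a hypothetical callable supplied by the caller that returns the row keys in `[start, stop)`; in a Thrift setup it would open its own connection and scanner for each range. The split points and the worker count are illustrative. All keys still pass through the one client, so this may help less than a MapReduce job that runs next to the data.

```python
from concurrent.futures import ThreadPoolExecutor


def key_ranges(split_points):
    """Turn sorted split points into (start, stop) pairs covering the table.

    An empty start means "from the first row"; an empty stop means
    "to the last row".
    """
    bounds = [b""] + list(split_points) + [b""]
    return list(zip(bounds[:-1], bounds[1:]))


def parallel_row_keys(scan_range, split_points, workers=4):
    """Scan each key range in its own thread and return all keys in order."""
    ranges = key_ranges(split_points)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the ranges, so the combined
        # result stays sorted if each range is returned sorted.
        chunks = pool.map(lambda r: list(scan_range(*r)), ranges)
        return [key for chunk in chunks for key in chunk]
```

Threads are enough here because the work is network-bound; each worker spends most of its time waiting on the server.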


2013/11/22 Oussama Jilal <ji...@gmail.com>

> I am only loading the keys (using FirstKeyOnlyFilter and KeyOnlyFilter),
> not the entire rows.

Re: FTS performance

Posted by Oussama Jilal <ji...@gmail.com>.

I am only loading the keys (using FirstKeyOnlyFilter and KeyOnlyFilter), 
not the entire rows.

On 11/22/2013 07:03 PM, Vladimir Rodionov wrote:
> Loading 100M rows over the network from the HBase server to an HBase client
> is not the right approach for someone looking for speed.

RE: FTS performance

Posted by Vladimir Rodionov <vr...@carrieriq.com>.

Loading 100M rows over the network from the HBase server to an HBase client
is not the right approach for someone looking for speed.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com
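
A rough way to see the cost: with a scanner that returns a fixed number of rows per call, the number of client round trips is the row count divided by the batch size, rounded up. The sketch below multiplies that by a per-call latency. The function name and any latency figure passed to it are assumptions for illustration, not measurements from this cluster, and the model ignores per-row serialization and transfer cost.

```python
import math


def estimate_scan_seconds(total_rows, rows_per_call, seconds_per_call):
    """Estimate the time spent on round trips alone for a client-side scan."""
    calls = math.ceil(total_rows / rows_per_call)
    return calls * seconds_per_call
```

For example, `estimate_scan_seconds(100_000_000, 1000, 0.005)` models 100,000 calls at an assumed 5 ms each, which comes to about 500 seconds before any per-row cost is counted.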

________________________________________
From: Jilal Oussama [jilal.oussama@gmail.com]
Sent: Friday, November 22, 2013 8:50 AM
To: Mailing List Apache HBase
Subject: Re: FTS performance

What I have been avoiding all along ... MR & CoProcessors ... thanks all


Re: FTS performance

Posted by Jilal Oussama <ji...@gmail.com>.

What I have been avoiding all along ... MR & CoProcessors ... thanks all


2013/11/22 Asaf Mesika <as...@gmail.com>

> You're right; just bear in mind that the response time won't match an online
> query, if that is what you are aiming at.

Re: FTS performance

Posted by Asaf Mesika <as...@gmail.com>.

You're right; just bear in mind that the response time won't match an online
query, if that is what you are aiming at.

On Friday, November 22, 2013, Jean-Marc Spaggiari wrote:

> You can also simply run an MR job, with no coprocessors or Phoenix
> required....

Re: FTS performance

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.

You can also simply run an MR job, with no coprocessors or Phoenix
required....


2013/11/22 Asaf Mesika <as...@gmail.com>

> The best way is to go parallel with coprocessors. Try Phoenix, which
> has this built in, or write your own.

Re: FTS performance

Posted by Asaf Mesika <as...@gmail.com>.

The best way is to go parallel with coprocessors. Try Phoenix, which
has this built in, or write your own.

On Friday, November 22, 2013, Jilal Oussama wrote:

> Hi all,
>
> I am looking for some performance suggestions.
>
> I would like to get all the keys of a table (which contains ~100 million
> rows).
>
> Currently, I am doing a full table scan (FTS) with FirstKeyOnlyFilter and
> KeyOnlyFilter using Thrift from a Python script, and I find it very slow ...
>
> Any suggestions would be appreciated.
>
> HBase : 0.94.13
> Hadoop : 1.2.1
>
> Thanks in advance.
>