Posted to user@hbase.apache.org by Jean-Marc Spaggiari <je...@spaggiari.org> on 2013/01/04 16:24:16 UTC

Fastest way to find is a row exist?

Hi,

What's the fastest way to know if a row exists?

Today I'm doing this:

Get get_entry_exist = new Get(key).addColumn(CF_DATA, C_DATA);
Result entry_exist = table_entry.get(get_entry_exist);

But would this be faster?
Get get_entry_exist = new Get(key);
Result entry_exist = table_entry.get(get_entry_exist);

There is only one column family (CF) and one column (C) in my table.

Or is there an even faster way?

Also, is there a way to make that even faster? I think BloomFilters
can help, right?

Thanks,

JM

Re: Fastest way to find is a row exist?

Posted by Anton Lyska <an...@wildec.com>.
Hi,

Using KeyOnlyFilter will prevent the value from being sent over the network.


Re: Fastest way to find is a row exist?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
In the end, I looked at how exists(Get) is implemented and built
exists(List<Get>) (HBASE-7503).

I will run some benchmarks to see which is faster: batch(List<Get>) or
exists(List<Get>). I built it for 0.94 too and will deploy the
updated build on my cluster.

2013/1/6, Asaf Mesika <as...@gmail.com>:
> Why not write your own filter class, which you can initialize with a
> set of keys to search for?
> The HTable on the client side will split the keys based on row keys so
> they will be sent to the right regions. There, your filter can use the
> SEEK_NEXT_USING_HINT return code to seek efficiently over that set of
> keys.
> This will ensure you do the search in one RPC call.
> Your filter can also transform the KeyValue so that only the row keys
> are returned.
>
> Sent from my iPad
>
> On 6 Jan 2013, at 05:46, Mohamed Ibrahim <m0...@gmail.com> wrote:
>
>> Sorry, I didn't notice your email about packing 500 operations before.
>>
>> You might actually benefit from checking with a batch of Gets vs
>> individual exists.
>>
>> Best,
>> Mohamed
>>
>>
>> On Sat, Jan 5, 2013 at 8:29 AM, Jean-Marc Spaggiari
>> <jean-marc@spaggiari.org> wrote:
>>
>>> Hum, very interesting!
>>>
>>> Now, what's the best option? An array of Gets, which will retrieve more
>>> information? Or multiple HTable.exists calls, one by one?
>>>
>>> The best would have been to pass an array of Gets to exists()...
>>> I will see how much work it is to add that...
>>>
>>> JM
>>>
>>> 2013/1/4, Mohamed Ibrahim <m0...@gmail.com>:
>>>> What about HTable.exists ??
>>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#exists(org.apache.hadoop.hbase.client.Get)
>>>>
>>>> I think that should work if the Get has only the row key.
>>>>
>>>> Mohamed
>>>>
>>>>
>>>> On Fri, Jan 4, 2013 at 3:17 PM, Adrien Mogenet
>>>> <ad...@gmail.com> wrote:
>>>>
>>>>> On every Get, the Bloom filter acts as a filter (!) on top of each
>>>>> HFile and makes it possible to check whether a key is absent from
>>>>> that HFile. So yes, you will benefit from these filters.
>>>>>
>>>>>
>>>>> On Fri, Jan 4, 2013 at 8:58 PM, Jean-Marc Spaggiari
>>>>> <jean-marc@spaggiari.org> wrote:
>>>>>
>>>>>> Is KeyOnlyFilter using the Bloom filters too?
>>>>>>
>>>>>> Here is, in more detail, what I'm doing.
>>>>>>
>>>>>> A few questions:
>>>>>> - Can I create one single KeyOnlyFilter and give the same filter to
>>>>>> all the Gets?
>>>>>> - Will Bloom filters help in such a scenario? My key is small. Let's
>>>>>> say 128 bytes on average.
>>>>>>
>>>>>> The goal here is to check about 500 entries at a time to validate
>>>>>> whether they already exist or not.
>>>>>>
>>>>>> In my MR job, I start when I have more than 100K lines to handle,
>>>>>> and each line can have up to 1K entries. So it can result in up to
>>>>>> 100M gets... The job initially took 500 minutes to complete. I have
>>>>>> added a few pretty good nodes and it's now taking less than 300
>>>>>> minutes. But I would like to get under 100 minutes if I can...
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> JM
>>>>>>
>>>>>>        // Build one Get per entry; KeyOnlyFilter strips the values.
>>>>>>        Vector<Get> gets_entry_exist = new Vector<Get>();
>>>>>>        for (Entry entry : entries.getEntries())
>>>>>>        {
>>>>>>                Get entry_exist = new Get(entry.toKey());
>>>>>>                entry_exist.setFilter(new KeyOnlyFilter());
>>>>>>                gets_entry_exist.add(entry_exist);
>>>>>>        }
>>>>>>
>>>>>>        // Send all the Gets in one batch.
>>>>>>        Result[] result_entry_exist = table_entry.get(gets_entry_exist);
>>>>>>
>>>>>>        // This relies on results coming back in the same order as the Gets.
>>>>>>        int index = 0;
>>>>>>        for (Entry entry : entries.getEntries())
>>>>>>        {
>>>>>>                boolean isEmpty = result_entry_exist[index++].isEmpty();
>>>>>>                if (isEmpty)
>>>>>>                {
>>>>>>                        // Process here
>>>>>>                }
>>>>>>        }
>>>>>>
>>>>>>
>>>>>> 2013/1/4, Damien Hardy <dh...@viadeoteam.com>:
>>>>>>> Hello Jean-Marc,
>>>>>>>
>>>>>>> Bloom filters are designed for just that.
>>>>>>>
>>>>>>> But they only tell you when a row doesn't exist, based on a hash of
>>>>>>> the key (not the opposite: two row keys could have the same hash
>>>>>>> result).
>>>>>>>
>>>>>>> If you want to be sure the row key exists, you have to search for it
>>>>>>> in the HFile (the whole mechanism is transparent with get()).
>>>>>>>
>>>>>>> There is also a KeyOnlyFilter
>>>>>>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/KeyOnlyFilter.html
>>>>>>> which avoids returning the full column contents of an existing key
>>>>>>> (which could be heavy).
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> --
>>>>>>> Damien
>>>>>>>
>>>>>>>

Re: Fastest way to find is a row exist?

Posted by Asaf Mesika <as...@gmail.com>.
Why not write your own filter class, which you can initialize with a
set of keys to search for?
The HTable on the client side will split the keys based on row keys so
they will be sent to the right regions. There, your filter can use the
SEEK_NEXT_USING_HINT return code to seek efficiently over that set of
keys.
This will ensure you do the search in one RPC call.
Your filter can also transform the KeyValue so that only the row keys
are returned.
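
The seek-hint idea above can be illustrated without HBase. The sketch below is a plain-Java stand-in and not an HBase Filter: the class name SeekHintSketch and its methods are hypothetical, and row keys are modeled as Strings rather than byte arrays. It only shows the decision such a filter would make: given the row the scan is currently on, either report a match or name the next wanted key to jump to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for the logic of a "set of row keys" filter.
// It is NOT an HBase Filter; it only models the seek-hint decision.
class SeekHintSketch {
    private final TreeSet<String> wanted;

    SeekHintSketch(List<String> keys) {
        this.wanted = new TreeSet<>(keys);
    }

    // True when the current row is one of the keys we are looking for.
    boolean matches(String currentRow) {
        return wanted.contains(currentRow);
    }

    // Smallest wanted key strictly greater than the current row, or null
    // when no wanted key remains (the scan could stop there).
    String nextHint(String currentRow) {
        return wanted.higher(currentRow);
    }

    // Walks the stored rows in sorted order, jumping with hints instead of
    // visiting every row, and returns the wanted keys that exist.
    static List<String> findExisting(List<String> storedRows, List<String> keys) {
        SeekHintSketch sketch = new SeekHintSketch(keys);
        TreeSet<String> stored = new TreeSet<>(storedRows);
        List<String> found = new ArrayList<>();
        if (sketch.wanted.isEmpty()) {
            return found;
        }
        // Start at the first stored row at or after the smallest wanted key.
        String row = stored.ceiling(sketch.wanted.first());
        while (row != null) {
            if (sketch.matches(row)) {
                found.add(row);
            }
            String hint = sketch.nextHint(row);
            if (hint == null) {
                break; // no wanted keys left
            }
            // "Seek": jump to the first stored row at or after the hint.
            row = stored.ceiling(hint);
        }
        return found;
    }
}
```

With stored rows a to f and wanted keys b, e and x, the walk touches only b and e before running out of rows, which is the saving a seek hint is meant to give over reading every row.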


Re: Fastest way to find is a row exist?

Posted by Mohamed Ibrahim <m0...@gmail.com>.
Sorry, I didn't notice your email about packing 500 operations before.

You might actually benefit from checking with a batch of Gets vs individual
exists.

Best,
Mohamed
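
To make the trade-off concrete, the sketch below splits a list of row keys into batches of a fixed size, so that each batch could be sent as one multi-get instead of one call per key. It is plain Java with no HBase dependency; BatchSketch, partition and callsNeeded are hypothetical names, and the batch size of 500 is simply the figure mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: groups keys so each group can go out in one call.
class BatchSketch {
    // Splits keys into consecutive sublists of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, keys.size());
            // Copy so each batch is independent of the source list.
            batches.add(new ArrayList<>(keys.subList(start, end)));
        }
        return batches;
    }

    // Number of client calls needed when each call carries one batch.
    static int callsNeeded(int keyCount, int batchSize) {
        return (keyCount + batchSize - 1) / batchSize; // ceiling division
    }
}
```

For 1,200 keys and a batch size of 500 this gives three calls instead of 1,200. Whether a batch of Gets actually beats individual exists calls on a given cluster is still something to measure.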



Re: Fastest way to find is a row exist?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hum, very interesting!

Now, what's the best option? An array of Gets, which will retrieve more
information? Or multiple HTable.exists calls, one by one?

The best would have been to pass an array of Gets to exists()...
I will see how much work it is to add that...

JM


Re: Fastest way to find is a row exist?

Posted by Mohamed Ibrahim <m0...@gmail.com>.
What about HTable.exists?
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#exists(org.apache.hadoop.hbase.client.Get)

I think that should work if the Get has only the row key.

Mohamed



Re: Fastest way to find is a row exist?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
I want to remove it because I set it up on the wrong column family ;) I
should have used NAME => 'a' instead of NAME => '@' ;)

I have set up the KeyOnlyFilter in the code and redeployed. I have also
added the Bloom filter on the right column family. I will remove the
wrong one later.

As soon as the compaction is done I will restart my MR job and keep my
fingers crossed...
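
For a sense of scale, the figures quoted earlier in the thread (up to 100M gets, a job that went from 500 minutes to roughly 300, and a 100-minute target) can be turned into average lookup rates. This is only back-of-the-envelope arithmetic: ThroughputSketch is a hypothetical name, and the worst-case count of 100K lines times 1K entries is assumed.

```java
// Hypothetical helper for rough capacity arithmetic; not tied to HBase.
class ThroughputSketch {
    // Worst case from the thread: 100K lines x up to 1K entries each.
    static final long TOTAL_GETS = 100_000L * 1_000L;

    // Average lookups per second needed to finish totalGets in the given minutes.
    static double getsPerSecond(long totalGets, long minutes) {
        return (double) totalGets / (minutes * 60.0);
    }
}
```

Calling getsPerSecond(TOTAL_GETS, minutes) with 500, 300 and 100 gives roughly 3,300, 5,600 and 16,700 gets per second, so the 100-minute target needs about three times the rate of the 300-minute run.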

2013/1/4, Bryan Beaudreault <bb...@hubspot.com>:
> Why do you want to remove the bloom filter?  I think you should keep the
> bloom filter but also use the KeyOnlyFilter to cut down on data transferred
> over the wire.
>
>
> On Fri, Jan 4, 2013 at 3:28 PM, Jean-Marc Spaggiari
> <jean-marc@spaggiari.org> wrote:
>
>> Ok. I have activated them on 2 of my main tables and I will re-run the
>> job and see.
>>
>> 2 other questions then ;)
>>
>> 1) I activated them this way: alter 'work_proposed', NAME => '@',
>> BLOOMFILTER => 'ROW'. How can I remove them?
>> 2) Should I run major_compact to make sure all the hashes are stored?
>>
>> Thanks,
>>
>> JM
>>

Re: Fastest way to find is a row exist?

Posted by Bryan Beaudreault <bb...@hubspot.com>.
Why do you want to remove the bloom filter?  I think you should keep the
bloom filter but also use the KeyOnlyFilter to cut down on data transferred
over the wire.
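
As background on why the Bloom filter is worth keeping: a Bloom filter can answer "definitely absent" or "possibly present", but never "definitely present". The sketch below is a toy Bloom filter in plain Java that illustrates this behaviour. It is not HBase's implementation; the class name, the bit-array size and the hash mixing are all assumptions chosen for brevity.

```java
import java.util.BitSet;

// Toy Bloom filter for illustration only; not HBase's implementation.
class ToyBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    ToyBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derives the i-th bit position from two base hashes (double hashing).
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so the step is never zero
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(key, i));
        }
    }

    // false: the key was never added. true: the key MAY have been added.
    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }
}
```

A false from mightContain lets a read skip a file entirely, while a true still requires the real lookup because different keys can set the same bits. That is why the Bloom filter and KeyOnlyFilter are complementary: one avoids unnecessary file reads, the other trims what is sent back.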



Re: Fastest way to find is a row exist?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
OK. I have activated them on 2 of my main tables and I will re-run the
job and see.

Two other questions then ;)

1) I activated them this way: alter 'work_proposed', NAME => '@',
BLOOMFILTER => 'ROW'. How can I remove them?
2) Should I run major_compact to make sure all the hashes are stored?

Thanks,

JM


Re: Fastest way to find is a row exist?

Posted by Adrien Mogenet <ad...@gmail.com>.
On every Get, the Bloom filter acts as a filter (!) on top of each HFile
and lets HBase check whether a key is absent from that HFile. So yes, you
will benefit from these filters.
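
A minimal sketch of that per-file check, as a toy model only: the
StoreFileSketch and StoreSketch classes, the 1024-bit filter size and the
two index functions are assumptions made for illustration and are not
HBase's implementation. It only shows how a "definitely absent" answer
lets a lookup skip a file without searching it.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical stand-in for one store file with its own Bloom filter. */
final class StoreFileSketch {
    private static final int BITS = 1024;
    private final BitSet bloom = new BitSet(BITS);
    private final TreeSet<String> keys = new TreeSet<>();

    void add(String rowKey) {
        keys.add(rowKey);
        bloom.set(Math.floorMod(rowKey.hashCode(), BITS));
        bloom.set(Math.floorMod(rowKey.hashCode() * 31 + 17, BITS));
    }

    /** false means the key is definitely not in this file. */
    boolean mightContain(String rowKey) {
        return bloom.get(Math.floorMod(rowKey.hashCode(), BITS))
            && bloom.get(Math.floorMod(rowKey.hashCode() * 31 + 17, BITS));
    }

    /** Stands in for the expensive part: seeking into the file itself. */
    boolean search(String rowKey) {
        return keys.contains(rowKey);
    }
}

/** Hypothetical stand-in for a store made of several files. */
final class StoreSketch {
    private final List<StoreFileSketch> files = new ArrayList<>();
    int filesSearched = 0;

    void addFile(StoreFileSketch file) {
        files.add(file);
    }

    boolean exists(String rowKey) {
        for (StoreFileSketch file : files) {
            if (!file.mightContain(rowKey)) {
                continue; // filter says "absent": skip this file entirely
            }
            filesSearched++;
            if (file.search(rowKey)) {
                return true;
            }
        }
        return false;
    }
}
```

With one file holding "row-1" and another holding "row-2", a lookup for
"row-2" searches only the second file, and a lookup for a key that is in
neither file searches none of them.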





-- 
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me

Re: Fastest way to find is a row exist?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Does KeyOnlyFilter use the Bloom filters too?

Here is what I'm doing, in more detail.

A few questions:
- Can I create one single KeyOnlyFilter and give the same filter to
all the gets?
- Will Bloom filters help in such a scenario? My key is small, let's
say 128 bytes on average.

The goal here is to check about 500 entries at a time to validate
whether they already exist.

In my MR job, I start when I have more than 100K lines to handle, and
each line can have up to 1K entries, so it can result in up to 100M
gets... The job initially took 500 minutes to complete. I have added a
few pretty good nodes and it's now taking less than 300 minutes, but I
would like to get under 100 minutes if I can...

Thanks,

JM

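	// One Get per entry; KeyOnlyFilter drops the values on the server
	// side, so only the keys travel back over the network.
	Vector<Get> gets_entry_exist = new Vector<Get>();
	for (Entry entry : entries.getEntries())
	{
		Get entry_exist = new Get(entry.toKey());
		entry_exist.setFilter(new KeyOnlyFilter());
		gets_entry_exist.add(entry_exist);
	}

	// A single batched call; results come back in the same order as the gets.
	Result[] result_entry_exist = table_entry.get(gets_entry_exist);

	int index = 0;
	for (Entry entry : entries.getEntries())
	{
		boolean isEmpty = result_entry_exist[index++].isEmpty();
		if (isEmpty)
		{
			// The row does not exist yet: process here
		}
	}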
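
As a rough sketch of the arithmetic behind those figures: 100M gets in
300 minutes is about 5,556 gets per second, and getting under 100 minutes
needs about 16,667 gets per second. The GetBudgetSketch class and its
method names below are hypothetical helpers written for illustration, not
part of HBase; partition() just shows one way to cut a key list into
batches of about 500.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: throughput budget and batching for existence checks. */
final class GetBudgetSketch {

    /** Gets per second needed to finish totalGets within the given minutes. */
    static double requiredPerSecond(long totalGets, long minutes) {
        return totalGets / (minutes * 60.0);
    }

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        long totalGets = 100_000L * 1_000L; // 100K lines x up to 1K entries
        System.out.printf("at 300 min: %.0f gets/s%n", requiredPerSecond(totalGets, 300));
        System.out.printf("at 100 min: %.0f gets/s%n", requiredPerSecond(totalGets, 100));
    }
}
```

Each batch returned by partition() would then become one multi-get call
like the one above.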



Re: Fastest way to find is a row exist?

Posted by Damien Hardy <dh...@viadeoteam.com>.
Hello Jean-Marc,

Bloom filters are designed for exactly that.

But they can only tell you that a row doesn't exist, based on a hash of
the key (not the opposite: two row keys could have the same hash result).

If you want to be sure the row key exists, you have to search for it in
the HFile (the whole mechanism is transparent with get()).

There is also a KeyOnlyFilter
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/KeyOnlyFilter.html
which avoids returning all the column values of the existing key
(which could be heavy).
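
To make the "two row keys could have the same hash" point concrete, here
is a deliberately tiny sketch. TinyBloomSketch is a hypothetical class
that derives both bit positions from String.hashCode(); it is not how
HBase hashes keys. The strings "Aa" and "BB" are a well-known pair with
the same hashCode, so the filter cannot tell them apart.

```java
import java.util.BitSet;

/** Hypothetical toy Bloom filter with two bit positions per key. */
final class TinyBloomSketch {
    private final int bits;
    private final BitSet set;

    TinyBloomSketch(int bits) {
        this.bits = bits;
        this.set = new BitSet(bits);
    }

    private int first(String key) {
        return Math.floorMod(key.hashCode(), bits);
    }

    private int second(String key) {
        return Math.floorMod(key.hashCode() * 31 + 17, bits);
    }

    void add(String key) {
        set.set(first(key));
        set.set(second(key));
    }

    /** true means "maybe present"; false means "definitely absent". */
    boolean mightContain(String key) {
        return set.get(first(key)) && set.get(second(key));
    }
}
```

After adding only "Aa" to a 64-bit filter, mightContain("BB") also
answers true (a false positive that only a real lookup can resolve),
while mightContain("C") answers false, which is always trustworthy.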

Cheers,

-- 
Damien

