Posted to user@accumulo.apache.org by vaibhav thapliyal <va...@gmail.com> on 2016/10/10 10:09:46 UTC

Re: Indexing Column Values in Accumulo

An inverted index could serve your use case. You can store the
column family and column qualifier together in the row ID of the index
table, separated by a delimiter.

For example: cf|cq

Then query on the row ID alone, which keeps query time low.
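Here is a minimal sketch of the cf|cq index idea in Java. It does not use the Accumulo API. A java.util.TreeMap stands in for the sorted index table, and the names CfCqIndex, add and rowsWith are hypothetical. The set of row IDs under each key stands in for however the matching data-table rows are actually stored under that index row.

```java
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory model of a cf|cq inverted index; not the Accumulo API.
// TreeMap keeps keys sorted, like rows in an Accumulo table.
class CfCqIndex {
    // Delimiter between family and qualifier; assumes '|' never occurs in either.
    static final String SEP = "|";

    // Index row ID "cf|cq" -> data-table row IDs that contain column cf:cq.
    private final TreeMap<String, Set<String>> index = new TreeMap<>();

    // Call alongside every data-table write: Accumulo will not maintain this for you.
    void add(String rowId, String cf, String cq) {
        index.computeIfAbsent(cf + SEP + cq, k -> new TreeSet<>()).add(rowId);
    }

    // A query is one exact lookup on the index row ID; the data table is never scanned.
    Set<String> rowsWith(String cf, String cq) {
        Set<String> hits = index.get(cf + SEP + cq);
        return hits == null ? new TreeSet<String>() : new TreeSet<String>(hits);
    }

    public static void main(String[] args) {
        CfCqIndex idx = new CfCqIndex();
        idx.add("row1", "color", "red");
        idx.add("row2", "color", "blue");
        idx.add("row3", "color", "red");
        System.out.println(idx.rowsWith("color", "red")); // prints [row1, row3]
    }
}
```

The index row is the full cf|cq string, so a query is a single exact-row lookup rather than a scan. The '|' delimiter only works if it can never appear in a family or qualifier. If it can, a byte that cannot occur in your data is a safer separator.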

On 29 September 2016 at 11:03, Josh Elser <jo...@gmail.com> wrote:

> Hi Yamini,
>
> You're right that a filter would have to exhaustively search a table to
> find all rows that contain a certain family and qualifier. If you
> explicitly know the rows that you want to search, this is a fast operation.
>
> Have you considered creating an inverted index? This would be a table that
> you have to maintain on your own. Accumulo does not provide automatic index
> generation.
>
> - Josh
>
>
> Yamini Joshi wrote:
>
>> Hello everyone
>>
>> Is there a way to easily index column fields for efficient lookups in
>> Accumulo? My use case is to select the records containing a certain
>> column family and column qualifier from among a set of column
>> qualifiers (reverse lookup). Although this could be done using a custom
>> filter, I'm looking for an optimal solution (since a filter might scan the
>> entire database).
>>
>> Best regards,
>> Yamini Joshi
>>
>
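The following Java sketch illustrates Josh's point about filters versus known rows by counting how many entries each approach touches. A TreeMap keyed by row, family and qualifier (joined with a NUL character) stands in for a sorted Accumulo table, ignoring timestamps and visibilities. ScanCostSketch and its methods are hypothetical names, not Accumulo classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model: a TreeMap keyed by row + NUL + cf + NUL + cq stands in for a
// sorted Accumulo table (timestamps and visibilities ignored). Not the Accumulo API.
class ScanCostSketch {
    static final char SEP = '\0'; // assumes no row, cf or cq contains NUL
    final TreeMap<String, String> table = new TreeMap<>();
    int examined; // entries touched by the most recent query

    void put(String row, String cf, String cq, String value) {
        table.put(row + SEP + cf + SEP + cq, value);
    }

    // True if a stored key carries the requested family and qualifier.
    private static boolean hasColumn(String key, String cf, String cq) {
        String[] parts = key.split(String.valueOf(SEP), -1);
        return parts[1].equals(cf) && parts[2].equals(cq);
    }

    // Filter-style query: every entry in the table is examined.
    List<String> rowsWithColumnFullScan(String cf, String cq) {
        examined = 0;
        List<String> rows = new ArrayList<>();
        for (String key : table.keySet()) {
            examined++;
            if (hasColumn(key, cf, cq)) rows.add(key.substring(0, key.indexOf(SEP)));
        }
        return rows;
    }

    // Known rows: jump to each row's key range and examine only those entries.
    List<String> rowsWithColumnKnownRows(List<String> candidates, String cf, String cq) {
        examined = 0;
        List<String> rows = new ArrayList<>();
        for (String row : candidates) {
            // Every key of `row` sorts in [row + NUL, row + char 1).
            for (String key : table.subMap(row + SEP, row + (char) (SEP + 1)).keySet()) {
                examined++;
                if (hasColumn(key, cf, cq)) rows.add(row);
            }
        }
        return rows;
    }
}
```

Suppose the table has three rows with two columns each. A full scan for color:red examines all six entries, while a known-row lookup restricted to row3 examines only that row's two entries. The gap grows with table size, which is why an index that yields the candidate rows up front pays off.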

Re: Indexing Column Values in Accumulo

Posted by vaibhav thapliyal <va...@gmail.com>.
Based on your use case, you can put the CQ values in the row ID of the
index table and the corresponding data-table row IDs in the value. This
leaves the index table's column family and column qualifier free to hold
additional labels, which you can use to filter your data further.
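Here is a sketch of this layout under stated assumptions. The index row is the indexed CQ value. The index family and qualifier carry two hypothetical labels, here the source column family and a year tag. The value holds the data-table row ID. A TreeMap keyed by row, family and qualifier stands in for the index table, and ValueRowIndex, index and lookup are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the layout: row = indexed CQ value, cf and cq = extra labels,
// value = data-table row ID. One value per (row, cf, cq) key, as in an Accumulo table
// with default versioning; timestamps and visibilities are ignored. Not the Accumulo API.
class ValueRowIndex {
    static final char SEP = '\0'; // assumes no component contains NUL
    final TreeMap<String, String> entries = new TreeMap<>();

    void index(String cqValue, String labelCf, String labelCq, String dataRowId) {
        entries.put(cqValue + SEP + labelCf + SEP + labelCq, dataRowId);
    }

    // Row IDs stored under one indexed value, narrowed by the cf label (null = any label).
    List<String> lookup(String cqValue, String requiredCf) {
        List<String> rowIds = new ArrayList<>();
        // All index entries for cqValue sort in [cqValue + NUL, cqValue + char 1).
        for (Map.Entry<String, String> e
                : entries.subMap(cqValue + SEP, cqValue + (char) (SEP + 1)).entrySet()) {
            String cf = e.getKey().split(String.valueOf(SEP), -1)[1];
            if (requiredCf == null || requiredCf.equals(cf)) rowIds.add(e.getValue());
        }
        return rowIds;
    }
}
```

One consequence of keeping the row ID in the value involves records that share the same indexed value and the same labels. Such records map to the same key. The TreeMap keeps only the last value written, much as Accumulo's default versioning keeps only the newest version. If that can happen in your data, the row ID likely needs to be part of the key instead, for example in the column qualifier.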

Regards
Vaibhav

On 10 Oct 2016 8:32 p.m., "Yamini Joshi" <ya...@gmail.com> wrote:

> I guess there is no other way. Also, once I get the row IDs, I need to do
> further filtering. Do the filters parse an entire record? My use case is to
> select row IDs with a cf|cq value (given a list of values, i.e. CQs). In other
> words, the filter will have to access all the cf|cqs, right?
>
> Best regards,
> Yamini Joshi

Re: Indexing Column Values in Accumulo

Posted by Yamini Joshi <ya...@gmail.com>.
I guess there is no other way. Also, once I get the row IDs, I need to do
further filtering. Do the filters parse an entire record? My use case is to
select row IDs with a cf|cq value (given a list of values, i.e. CQs). In other
words, the filter will have to access all the cf|cqs, right?
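The index-side half of this query selects row IDs that have cf|cq for any qualifier in a given list. It could look roughly like the Java sketch below. It assumes the cf|cq inverted index suggested earlier in the thread, modeled with a TreeMap. MultiValueLookup and its methods are hypothetical. Each value costs one exact lookup, and the results are unioned.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model: select row IDs having cf|cq for any cq in a list, using one
// exact index lookup per value instead of a filter over the data table.
// The TreeMap stands in for a sorted index table; not the Accumulo API.
class MultiValueLookup {
    // Index row "cf|cq" -> matching data-table row IDs.
    final TreeMap<String, Set<String>> index = new TreeMap<>();

    void add(String rowId, String cf, String cq) {
        index.computeIfAbsent(cf + "|" + cq, k -> new TreeSet<>()).add(rowId);
    }

    // Union of the row IDs found under each cf|cq index row; missing values are skipped.
    Set<String> rowsWithAny(String cf, List<String> cqs) {
        Set<String> result = new TreeSet<>();
        for (String cq : cqs) {
            Set<String> hits = index.get(cf + "|" + cq);
            if (hits != null) result.addAll(hits);
        }
        return result;
    }

    public static void main(String[] args) {
        MultiValueLookup m = new MultiValueLookup();
        m.add("row1", "color", "red");
        m.add("row2", "color", "blue");
        m.add("row3", "color", "green");
        System.out.println(m.rowsWithAny("color", Arrays.asList("red", "green"))); // prints [row1, row3]
    }
}
```

Against a real Accumulo table, the per-value lookups would likely map onto a BatchScanner given one single-row Range per cf|cq value. Any further filtering would then operate on the returned row IDs.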

Best regards,
Yamini Joshi
