You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hbase.apache.org by Stas Maksimov <ma...@gmail.com> on 2013/02/15 23:51:34 UTC

storing lists in columns

Hi all,

I have a requirement to store lists in HBase columns like this:
"table", "rowid1", "f:list", "aa, bb, cc"
"table", "rowid2", "f:list", "aabb, cc"

There is a further requirement to be able to find rows where f:list
contains a particular item, e.g. when I need to find rows having item "aa"
only "rowid1" should match, and for item "cc" both "rowid1" and "rowid2"
should match.

For now I decided to use SingleColumnValueFilter with substring matching.
As using comma-separated list proved difficult to search through, I'm using
pipe symbols to separate items like this: "|aa|bb|cc|", so that I could
pass the search item surrounded by pipes into the filter:
SingleColumnValueFilter ('f', 'list', =, 'substring:|aa|')

This proved to work effectively enough, however I would prefer to use
something more standard for my list storage (e.g. serialised JSON), or
perhaps something even more optimised for a search - performance really
does matter here.

Any opinions on this solution and possible enhancements are much
appreciated.

Many thanks,
Stas

Re: storing lists in columns

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Stas,

Don't forget that you should always try to keep the number of columns
families lower than 3, else you might face some performances issues.

JM

2013/2/19, Stas Maksimov <ma...@gmail.com>:
> Hi Jean-Marc,
>
> I've validated this, it works perfectly. Very easy to implement and it's
> very fast!
>
> Thankfully in this project there isn't a lot of lists in each table, so I
> won't have to create too many column families. In other scenarios it could
> be a problem.
>
> Many thanks,
> Stas
>
>
> On 16 February 2013 02:29, Jean-Marc Spaggiari
> <je...@spaggiari.org>wrote:
>
>> Hi Stas,
>>
>> Few options are coming into my mind.
>>
>> Quickly:
>> 1) Why not storing the products in specif columns instead of in the
>> same one? Like:
>> table, rowid1, cf:list, c:aa, value:true
>> table, rowid1, cf:list, c:bb, value:true
>> table, rowid1, cf:list, c:cc, value:true
>> table, rowid2, cf:list, c:aabb, value:true
>> table, rowid2, cf:list, c:cc, value:true
>> That way when you do a search you query directly the right column for
>> the right row. And using "exist" call with also reduce the size of the
>> data transfered.
>>
>> 2) You can store the data in the oposite way. Like:
>> table, aa, cf:products, c:rowid1, value:true
>> table, aabb, cf:products, c:rowid2, value:true
>> table, bb, cf:products, c:rowid1, value:true
>> table, cc, cf:products, c:rowid1, value:true
>> table, cc, cf:products, c:rowid2, value:true
>> Here, you query by your product ID, and you search the column based on
>> your previous rowid.
>>
>>
>> I will say the 2 solutions are equivalent, but it will really depend
>> on your data pattern and you query pattern.
>>
>> JM
>>
>> 2013/2/15, Stas Maksimov <ma...@gmail.com>:
>> > Hi all,
>> >
>> > I have a requirement to store lists in HBase columns like this:
>> > "table", "rowid1", "f:list", "aa, bb, cc"
>> > "table", "rowid2", "f:list", "aabb, cc"
>> >
>> > There is a further requirement to be able to find rows where f:list
>> > contains a particular item, e.g. when I need to find rows having item
>> "aa"
>> > only "rowid1" should match, and for item "cc" both "rowid1" and
>> > "rowid2"
>> > should match.
>> >
>> > For now I decided to use SingleColumnValueFilter with substring
>> > matching.
>> > As using comma-separated list proved difficult to search through, I'm
>> using
>> > pipe symbols to separate items like this: "|aa|bb|cc|", so that I could
>> > pass the search item surrounded by pipes into the filter:
>> > SingleColumnValueFilter ('f', 'list', =, 'substring:|aa|')
>> >
>> > This proved to work effectively enough, however I would prefer to use
>> > something more standard for my list storage (e.g. serialised JSON), or
>> > perhaps something even more optimised for a search - performance really
>> > does matter here.
>> >
>> > Any opinions on this solution and possible enhancements are much
>> > appreciated.
>> >
>> > Many thanks,
>> > Stas
>> >
>>
>

Re: storing lists in columns

Posted by Stas Maksimov <ma...@gmail.com>.
Hi Jean-Marc,

I've validated this, it works perfectly. Very easy to implement and it's
very fast!

Thankfully in this project there isn't a lot of lists in each table, so I
won't have to create too many column families. In other scenarios it could
be a problem.

Many thanks,
Stas


On 16 February 2013 02:29, Jean-Marc Spaggiari <je...@spaggiari.org>wrote:

> Hi Stas,
>
> Few options are coming into my mind.
>
> Quickly:
> 1) Why not storing the products in specif columns instead of in the
> same one? Like:
> table, rowid1, cf:list, c:aa, value:true
> table, rowid1, cf:list, c:bb, value:true
> table, rowid1, cf:list, c:cc, value:true
> table, rowid2, cf:list, c:aabb, value:true
> table, rowid2, cf:list, c:cc, value:true
> That way when you do a search you query directly the right column for
> the right row. And using "exist" call with also reduce the size of the
> data transfered.
>
> 2) You can store the data in the oposite way. Like:
> table, aa, cf:products, c:rowid1, value:true
> table, aabb, cf:products, c:rowid2, value:true
> table, bb, cf:products, c:rowid1, value:true
> table, cc, cf:products, c:rowid1, value:true
> table, cc, cf:products, c:rowid2, value:true
> Here, you query by your product ID, and you search the column based on
> your previous rowid.
>
>
> I will say the 2 solutions are equivalent, but it will really depend
> on your data pattern and you query pattern.
>
> JM
>
> 2013/2/15, Stas Maksimov <ma...@gmail.com>:
> > Hi all,
> >
> > I have a requirement to store lists in HBase columns like this:
> > "table", "rowid1", "f:list", "aa, bb, cc"
> > "table", "rowid2", "f:list", "aabb, cc"
> >
> > There is a further requirement to be able to find rows where f:list
> > contains a particular item, e.g. when I need to find rows having item
> "aa"
> > only "rowid1" should match, and for item "cc" both "rowid1" and "rowid2"
> > should match.
> >
> > For now I decided to use SingleColumnValueFilter with substring matching.
> > As using comma-separated list proved difficult to search through, I'm
> using
> > pipe symbols to separate items like this: "|aa|bb|cc|", so that I could
> > pass the search item surrounded by pipes into the filter:
> > SingleColumnValueFilter ('f', 'list', =, 'substring:|aa|')
> >
> > This proved to work effectively enough, however I would prefer to use
> > something more standard for my list storage (e.g. serialised JSON), or
> > perhaps something even more optimised for a search - performance really
> > does matter here.
> >
> > Any opinions on this solution and possible enhancements are much
> > appreciated.
> >
> > Many thanks,
> > Stas
> >
>

Re: storing lists in columns

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Stas,

Few options are coming into my mind.

Quickly:
1) Why not storing the products in specif columns instead of in the
same one? Like:
table, rowid1, cf:list, c:aa, value:true
table, rowid1, cf:list, c:bb, value:true
table, rowid1, cf:list, c:cc, value:true
table, rowid2, cf:list, c:aabb, value:true
table, rowid2, cf:list, c:cc, value:true
That way when you do a search you query directly the right column for
the right row. And using "exist" call with also reduce the size of the
data transfered.

2) You can store the data in the oposite way. Like:
table, aa, cf:products, c:rowid1, value:true
table, aabb, cf:products, c:rowid2, value:true
table, bb, cf:products, c:rowid1, value:true
table, cc, cf:products, c:rowid1, value:true
table, cc, cf:products, c:rowid2, value:true
Here, you query by your product ID, and you search the column based on
your previous rowid.


I will say the 2 solutions are equivalent, but it will really depend
on your data pattern and you query pattern.

JM

2013/2/15, Stas Maksimov <ma...@gmail.com>:
> Hi all,
>
> I have a requirement to store lists in HBase columns like this:
> "table", "rowid1", "f:list", "aa, bb, cc"
> "table", "rowid2", "f:list", "aabb, cc"
>
> There is a further requirement to be able to find rows where f:list
> contains a particular item, e.g. when I need to find rows having item "aa"
> only "rowid1" should match, and for item "cc" both "rowid1" and "rowid2"
> should match.
>
> For now I decided to use SingleColumnValueFilter with substring matching.
> As using comma-separated list proved difficult to search through, I'm using
> pipe symbols to separate items like this: "|aa|bb|cc|", so that I could
> pass the search item surrounded by pipes into the filter:
> SingleColumnValueFilter ('f', 'list', =, 'substring:|aa|')
>
> This proved to work effectively enough, however I would prefer to use
> something more standard for my list storage (e.g. serialised JSON), or
> perhaps something even more optimised for a search - performance really
> does matter here.
>
> Any opinions on this solution and possible enhancements are much
> appreciated.
>
> Many thanks,
> Stas
>