Posted to user@hbase.apache.org by Marko Dinic <ha...@gmail.com> on 2015/11/29 23:19:11 UTC

Rowkey design

Hello, everyone!

I'm new to HBase and I need help designing rowkeys for a use case that looks
like this:

- Products are listed, where each product has a product id.
- Each product has a timestamp.
- Each product is created in a certain place (e.g. a city).
- Each product is created by some unit (e.g. a factory).

I would like to be able to scan products from a certain time period, from a
certain place, or from a certain unit.

I read about salting to avoid hot-spotting and I understand that rows are
sequential by rowkey. This will allow me to scan for a certain time period
using the following rowkey:

salt-productId-timestamp

And I can specify the period using STARTROW, ENDROW.
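
A minimal sketch of what this key layout implies, using a sorted Python list in place of an HBase table. The separator, the field widths, the modulo salt, and the helper names make_key and scan are assumptions for illustration, not HBase API. Because rows sort by the whole key, a start/stop pair bounds a time range only within a single salt-productId prefix:

```python
from bisect import bisect_left

def make_key(salt, product_id, ts):
    # Fixed-width, zero-padded fields so that string order matches numeric order.
    return f"{salt:02d}-{product_id:06d}-{ts:010d}"

def scan(sorted_keys, start_row, stop_row):
    # Mimics a range scan: start row inclusive, stop row exclusive.
    lo = bisect_left(sorted_keys, start_row)
    hi = bisect_left(sorted_keys, stop_row)
    return sorted_keys[lo:hi]

# Hypothetical salt: productId modulo 4 buckets.
rows = sorted(
    make_key(pid % 4, pid, ts)
    for pid in (1, 2, 5)
    for ts in (100, 200, 300)
)

# Time range [150, 350) for product 5 only: one contiguous slice.
hits = scan(rows, make_key(1, 5, 150), make_key(1, 5, 350))

# A range spanning products 1..5 in the same bucket also returns rows of
# product 5 that lie outside the time range (ts=100).
wide = scan(rows, make_key(1, 1, 150), make_key(1, 5, 350))
```

In this model the second scan returns product 5's row at ts=100, so with productId ahead of the timestamp a single start/stop pair does not express "all products in a time period".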

What confuses me is how to include the place (and maybe the unit) in the key
and still be able to select products from a certain place during a certain
time period.

If I limit myself to scanning by one of the above (time range OR place), I
have an idea to duplicate the data into two different tables, one with
(salt-productId-timestamp) keys and the other with (salt-productId-place)
keys. Is that recommended or not?

So, how to construct my keys?

I should emphasize that I need this data as input to a MapReduce job.

Any help is greatly appreciated.

-- 
Best regards,
Marko

Re: Rowkey design

Posted by Marko Dinic <ha...@gmail.com>.
Hi Tariq,

Thank you for your answer.

But won't that break the ordering of my rows by timestamp, making it
impossible to scan by time range using STARTROW and ENDROW?

Best regards,
Marko

On Monday, November 30, 2015, Mohammad Tariq <do...@gmail.com> wrote:

> Hi Marko,
>
> You could add the place (and the unit as well) to your key if that doesn't
> make it very long, and then use RowFilter with SubstringComparator to get
> the desired rows.


-- 
Marko Dinic

Re: Rowkey design

Posted by Mohammad Tariq <do...@gmail.com>.
Hi Marko,

You could add the place (and the unit as well) to your key if that doesn't
make it very long, and then use RowFilter with SubstringComparator to get
the desired rows.
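
As a rough illustration of this suggestion, the sketch below models a substring match on the row key over an in-memory list. It is plain Python, not the HBase client; the key layout salt-productId-timestamp-place-unit, the sample place and unit values, and the helper names are assumptions. In this model the match is tested against every row in the scanned range, so it selects rows but does not narrow the range being read:

```python
def make_key(salt, product_id, ts, place, unit):
    # Hypothetical layout: place and unit appended after the timestamp.
    return f"{salt:02d}-{product_id:06d}-{ts:010d}-{place}-{unit}"

def scan_with_substring(sorted_keys, needle):
    # Stand-in for a row filter with a substring comparator:
    # each key in the range is examined and matches are kept.
    examined = 0
    matched = []
    for key in sorted_keys:
        examined += 1
        if needle in key:
            matched.append(key)
    return matched, examined

# Made-up sample rows.
rows = sorted([
    make_key(1, 1, 100, "belgrade", "factoryA"),
    make_key(1, 5, 200, "novisad", "factoryB"),
    make_key(2, 2, 300, "belgrade", "factoryB"),
])

# Surrounding the value with separators avoids matching it inside another field.
matched, examined = scan_with_substring(rows, "-belgrade-")
```

Here two rows match but all three are examined, which is the cost to weigh when the table is large.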


Tariq, Mohammad
about.me/mti



Re: Rowkey design

Posted by Mohammad Tariq <do...@gmail.com>.
Hi Marko,

Scan expects complete start and end row keys, IIRC. The order would get
disturbed anyway, since you are salting your keys.
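
To make the effect of salting on ordering concrete, here is a small sketch in plain Python (not the HBase client). It assumes a different, hypothetical layout, salt-timestamp-productId, with a modulo salt over three buckets; all names are illustrative. Rows in the sorted "table" are no longer in time order, and a time range is served by one range scan per salt bucket followed by a merge. In this simulation the scan boundaries are key prefixes, which works because the comparison is purely lexicographic:

```python
from bisect import bisect_left

NUM_BUCKETS = 3  # hypothetical number of salt buckets

def make_key(product_id, ts):
    salt = product_id % NUM_BUCKETS  # assumed salting scheme
    return f"{salt:02d}-{ts:010d}-{product_id:06d}"

def scan(sorted_keys, start_row, stop_row):
    # Start row inclusive, stop row exclusive.
    lo = bisect_left(sorted_keys, start_row)
    hi = bisect_left(sorted_keys, stop_row)
    return sorted_keys[lo:hi]

def scan_time_range(sorted_keys, ts_start, ts_stop):
    # One range scan per salt bucket.
    out = []
    for salt in range(NUM_BUCKETS):
        start = f"{salt:02d}-{ts_start:010d}"
        stop = f"{salt:02d}-{ts_stop:010d}"
        out.extend(scan(sorted_keys, start, stop))
    # Re-sort on the timestamp field to restore time order across buckets.
    return sorted(out, key=lambda k: k.split("-")[1])

rows = sorted(
    make_key(pid, ts)
    for pid, ts in [(1, 100), (2, 150), (3, 200), (4, 250), (5, 400)]
)

# Timestamps in table (rowkey) order are interleaved by the salt.
timestamps_in_table_order = [int(k.split("-")[1]) for k in rows]

result = scan_time_range(rows, 150, 300)
```

The trade-off in this sketch is that every time-range query costs NUM_BUCKETS scans plus a client-side merge.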


Tariq, Mohammad
about.me/mti


On Mon, Nov 30, 2015 at 1:19 PM, Marko Dinic <ha...@gmail.com> wrote:

> Hi Ted,
>
> Thank you for that information. Do you have some other suggestion, perhaps?
>
> Best regards,
> Marko

Re: Rowkey design

Posted by Marko Dinic <ha...@gmail.com>.
Hi Ted,

Thank you for that information. Do you have some other suggestion, perhaps?

Best regards,
Marko

On Monday, November 30, 2015, Ted Yu <yu...@gmail.com> wrote:

> bq. duplicate data to two different tables, one with
> (salt-productId-timestamp)
> and other with (salt-productId-place) keys
>
> I suggest thinking twice about the above schema. It may become tricky to
> keep the data in the two tables in sync.
> That is, when an update to table1 succeeds but the update to table2 fails,
> you need to take additional action: either retry the write to table2 or
> roll back the update to table1.
>
> Cheers


-- 
Marko Dinic

Re: Rowkey design

Posted by Ted Yu <yu...@gmail.com>.
bq. duplicate data to two different tables, one with (salt-productId-timestamp)
and other with (salt-productId-place) keys

I suggest thinking twice about the above schema. It may become tricky to
keep the data in the two tables in sync.
That is, when an update to table1 succeeds but the update to table2 fails,
you need to take additional action: either retry the write to table2 or
roll back the update to table1.
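
The retry-or-roll-back handling described above could look roughly like the following sketch. It uses Python dicts as stand-ins for the two tables; put_both, WriteFailed, flaky_writer, and the retry count are hypothetical names and values, not part of any HBase API:

```python
class WriteFailed(Exception):
    pass

_MISSING = object()

def put_both(table1, table2, key1, key2, value, write2, retries=2):
    # Write to table1 first, remembering the previous value for rollback.
    previous = table1.get(key1, _MISSING)
    table1[key1] = value
    # Try table2, retrying a limited number of times.
    for _ in range(retries + 1):
        try:
            write2(table2, key2, value)
            return True
        except WriteFailed:
            continue
    # Every attempt on table2 failed: undo the table1 write so the tables agree.
    if previous is _MISSING:
        del table1[key1]
    else:
        table1[key1] = previous
    return False

def flaky_writer(fail_times):
    # Test helper: a writer that fails the first `fail_times` calls.
    state = {"left": fail_times}
    def write(table, key, value):
        if state["left"] > 0:
            state["left"] -= 1
            raise WriteFailed(key)
        table[key] = value
    return write

t1, t2 = {}, {}
ok = put_both(t1, t2, "k-ts", "k-place", "v", flaky_writer(1))        # succeeds on retry
failed = put_both(t1, t2, "k2-ts", "k2-place", "v", flaky_writer(5))  # rolled back
```

Even this sketch is not atomic: a client crash between the two writes would still leave the tables out of sync, which is the maintenance burden being pointed out.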

Cheers
