Posted to user@hbase.apache.org by Sam Seigal <se...@yahoo.com> on 2011/09/30 03:27:43 UTC

querying values by row

Hi,

I am wondering what the best way is to query a record when only the
leading and trailing parts of a row are known.

For example, if my row looks something like:

event_type-timestamp-eventid

If I know the event_type and eventid, but do not really care about the
timestamp, what is the most efficient way to get this record?

I know that the eventid and event_type combination will be unique.

I see that the RegExpRowFilter has been deprecated, so I cannot query
for something like event_type*eventid. What other options are there?

Thank you,

Sam
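
A sketch of the key layout from the question may make the problem concrete. It rests on these assumptions:
- Timestamps are non-negative epoch milliseconds, zero-padded to a fixed width.
- Neither event_type nor eventid contains '-'.
- The class name EventRowKey is hypothetical.

HBase keeps rows sorted by their key bytes. Rows therefore cluster by event_type and then by timestamp. As a result, the rows for one eventid do not form a single contiguous key range.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for the layout event_type-timestamp-eventid.
// Assumptions: no '-' inside event_type or eventid; timestamps are
// non-negative epoch milliseconds.
final class EventRowKey {
    // 13 digits holds any millisecond timestamp up to the year 2286.
    static final int TS_WIDTH = 13;

    private EventRowKey() {}

    static String build(String eventType, long timestampMillis, String eventId) {
        // Zero-padding makes lexicographic order match numeric order, which
        // matters because row keys are compared byte by byte.
        return eventType + "-" + String.format("%0" + TS_WIDTH + "d", timestampMillis) + "-" + eventId;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        keys.add(build("click", 10L, "e7"));
        keys.add(build("view", 5L, "e3"));
        keys.add(build("click", 9L, "e2"));
        keys.add(build("click", 11L, "e1"));
        Collections.sort(keys);
        // Rows group by event_type, then by time. The eventid plays no part
        // in placement, so "click-*-e1" is not one contiguous range.
        keys.forEach(System.out::println);
    }
}
```

Without the padding, "click-9-e2" would sort after "click-10-e7". Scans by time would then return rows out of order.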

Re: querying values by row

Posted by Ted Yu <yu...@gmail.com>.

On Fri, Sep 30, 2011 at 11:59 AM, Sam Seigal <se...@yahoo.com> wrote:
> Is there a way (even if non-performant) to do atomic operations
> between tables in HBase?

Not that I know of.
We're working on HBASE-2856, which targets multi-family ACID guarantees.

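HBase has no cross-table atomicity, so an application that keeps a second table has to tolerate the two tables drifting apart. The sketch below shows one common arrangement, which is an assumption here rather than something proposed in the thread:
- Write the main row before the index row.
- Periodically run a reconciliation pass of the kind Sam's M/R question has in mind.

The names are hypothetical, and two in-memory sorted maps stand in for the tables; this is not the HBase client API. Following Lars's suggestion, the index table maps event_type-eventid to the timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for two tables (assumption: not the HBase client API).
// Assumes no '-' inside event_type or eventid.
final class DualWriteSketch {
    // Main table: event_type-timestamp-eventid -> value
    final TreeMap<String, String> events = new TreeMap<>();
    // Index table: event_type-eventid -> zero-padded timestamp
    final TreeMap<String, String> index = new TreeMap<>();

    static String ts(long millis) {
        return String.format("%013d", millis);
    }

    // Two separate, non-atomic writes, main row first. If the client dies in
    // between, the main row exists without an index entry; repair() restores it.
    void put(String eventType, long millis, String eventId, String value) {
        events.put(eventType + "-" + ts(millis) + "-" + eventId, value);
        index.put(eventType + "-" + eventId, ts(millis));
    }

    // Batch reconciliation, the kind of pass an M/R job would run over the
    // full tables: add or correct index entries from the main table, then
    // drop index entries whose main row does not exist.
    // Returns the number of index entries changed or removed.
    int repair() {
        int fixes = 0;
        for (String key : events.keySet()) {
            String[] parts = key.split("-", 3); // type, timestamp, eventid
            String indexKey = parts[0] + "-" + parts[2];
            if (!parts[1].equals(index.get(indexKey))) {
                index.put(indexKey, parts[1]);
                fixes++;
            }
        }
        List<String> dangling = new ArrayList<>();
        for (Map.Entry<String, String> e : index.entrySet()) {
            String[] parts = e.getKey().split("-", 2); // type, eventid
            if (!events.containsKey(parts[0] + "-" + e.getValue() + "-" + parts[1])) {
                dangling.add(e.getKey());
            }
        }
        for (String k : dangling) {
            index.remove(k);
        }
        return fixes + dangling.size();
    }
}
```

Writing the main row first means a failed put can only leave a missing index entry. It can never leave an index entry pointing at a row that was not written. Dangling entries can still appear if main rows are deleted without their index rows, so the pass handles both cases.
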
Re: querying values by row

Posted by Sam Seigal <se...@yahoo.com>.

The reason the timestamp is part of the key is that I want to be able to do
range scans for a particular event_type.

Persisting the record with a different row key in another table is an option.

Since there are no transactions, is an M/R job the only way to fix data
that might be out of sync between the two tables?

I saw a TransactionManager class in the HBase 0.26 API, but it is not
available in the most recent releases.

Is there a way (even if non-performant) to do atomic operations
between tables in HBase?

Thank you.


On Thu, Sep 29, 2011 at 7:30 PM, lars hofhansl <lh...@yahoo.com> wrote:
> Why is the timestamp part of the key?

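Keeping the timestamp in the key does make time-range scans per event_type cheap. With a fixed-width timestamp, the rows for one type inside a time window form one contiguous key range. The sketch below rests on these assumptions:
- Keys use the zero-padded event_type-timestamp-eventid layout.
- The table is modeled as a TreeMap, not the HBase client API.
- The class name is hypothetical.

An HBase Scan similarly takes an inclusive start row and an exclusive stop row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Models a time-range scan over one event_type with an in-memory sorted map
// (assumption: a stand-in for an HBase table, not the client API).
final class TimeRangeScan {
    static String ts(long millis) {
        return String.format("%013d", millis); // same zero-padded layout as the row key
    }

    // Returns values for rows with fromMillis <= timestamp < toMillis, in key order.
    // The start row is inclusive and the stop row exclusive.
    static List<String> scan(TreeMap<String, String> table, String eventType, long fromMillis, long toMillis) {
        if (fromMillis >= toMillis) {
            return new ArrayList<>(); // empty window; subMap would reject start > stop
        }
        String startRow = eventType + "-" + ts(fromMillis);
        String stopRow = eventType + "-" + ts(toMillis);
        return new ArrayList<>(table.subMap(startRow, true, stopRow, false).values());
    }
}
```

A row stamped exactly at fromMillis sorts just after startRow, because its key extends startRow with "-eventid", so it is included. A row stamped at toMillis sorts just after stopRow, so it is excluded.
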
Re: querying values by row

Posted by lars hofhansl <lh...@yahoo.com>.

I assume that by "row" you mean the row key.


The only way to query this is to do a scan with just event_type as the start key (i.e. a scan starting with a prefix of the key).
That will be inefficient if there are many keys starting with the same event_type.

If that is a common query, you should consider populating a second table with event_type-eventid as the key and the timestamp as the value.
Why is the timestamp part of the key?


-- Lars


----- Original Message -----
From: Sam Seigal <se...@yahoo.com>
To: hbase-user@hadoop.apache.org
Cc: 
Sent: Thursday, September 29, 2011 6:27 PM
Subject: querying values by row

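Lars's two suggestions can be sketched side by side. The sketch rests on these assumptions:
- Keys use the event_type-timestamp-eventid layout, with no '-' inside the parts.
- In-memory sorted maps stand in for HBase tables.
- The class and method names are hypothetical.

In the HBase client, option 1 corresponds roughly to a Scan starting at the event_type prefix. The suffix test would be done on the client, or by a RowFilter with a RegexStringComparator, the usual replacement for the deprecated RegExpRowFilter. Either way, every row for that event_type is still read.

```java
import java.util.Map;
import java.util.TreeMap;

// Two ways to find a row by event_type and eventid when the timestamp in the
// middle of the key is unknown. Tables are modeled as in-memory sorted maps
// (assumption: not the HBase client API). Assumes no '-' inside the parts.
final class FindByTypeAndId {
    // Option 1: prefix scan. Start at "eventType-" and stop once keys no longer
    // share that prefix, testing each key's suffix on the way. Cost grows with
    // the number of rows for eventType.
    static String prefixScan(TreeMap<String, String> events, String eventType, String eventId) {
        String prefix = eventType + "-";
        String suffix = "-" + eventId;
        for (Map.Entry<String, String> e : events.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // past the last row for this event_type
            }
            if (e.getKey().endsWith(suffix)) {
                return e.getValue();
            }
        }
        return null;
    }

    // Option 2: a second table keyed event_type-eventid whose value is the
    // timestamp. Two point lookups, independent of how many rows share eventType.
    static String viaIndex(TreeMap<String, String> events, Map<String, String> index,
                           String eventType, String eventId) {
        String ts = index.get(eventType + "-" + eventId);
        return ts == null ? null : events.get(eventType + "-" + ts + "-" + eventId);
    }
}
```

Option 2 makes lookup cost independent of how many rows share the event_type. The price is an extra table and the consistency problem discussed in this thread.
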
Re: querying values by row

Posted by Sam Seigal <se...@yahoo.com>.

On another note, if I had to update the value stored under such a row key,
can this be done without two operations?

i.e. I don't know the exact timestamp of when the record was created,
but I do know the event_type and event_id.

The only way I see to update a value for a given event_type and
event_id is to first do a Get or a Scan to read the record, determine
the exact timestamp for the record, and then write the updated value.
Is there a better way to do this in one server call?

Thanks!

Sam
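
The thread leaves this question open. With this key design, the full row key (and so the timestamp) has to be learned before the row can be written. The update therefore remains a read followed by a write. The sketch below rests on these assumptions:
- Keys use the event_type-timestamp-eventid layout, with no '-' inside the parts.
- A ConcurrentSkipListMap stands in for the table.
- The names are hypothetical.

The write is made conditional on the value that was read, so a change between the two steps is detected rather than silently overwritten. For a single row, HBase's HTable.checkAndPut plays a similar role, but it too needs the full row key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Models the two-step update with a sorted concurrent map
// (assumption: a stand-in for an HBase table, not the client API).
final class UpdateWithoutTimestamp {
    // Step 1 (read): prefix-scan "eventType-" for the key ending in "-eventId",
    // which reveals the timestamp and the current value.
    // Step 2 (write): conditional replace that succeeds only if the row still
    // holds the value read in step 1.
    // Returns false if no row matched or the row changed in between.
    static boolean update(ConcurrentSkipListMap<String, String> events,
                          String eventType, String eventId, String newValue) {
        String prefix = eventType + "-";
        String suffix = "-" + eventId;
        for (Map.Entry<String, String> e : events.tailMap(prefix, true).entrySet()) {
            String key = e.getKey();
            if (!key.startsWith(prefix)) {
                break; // no row for this event_type/eventid pair
            }
            if (key.endsWith(suffix)) {
                return events.replace(key, e.getValue(), newValue);
            }
        }
        return false;
    }
}
```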