Posted to user@hbase.apache.org by Mark <st...@gmail.com> on 2011/08/13 20:00:39 UTC
Generic Schema Question
Hi all, I'm trying to wrap my head around HBase schema design and I am
having trouble modeling the following use case:
We store all our user behavior (clicks, searches, page views) in Hadoop,
and we would like to add it to HBase so we can interactively
"explore" what our users are doing. For example, given an IP address,
we would like to get back a list of all the searches, page views, clicks,
etc. that this user has made.
My initial thought was to create a table "Logs" with a column family
"Data" that has the qualifiers "Search", "Click" and "View", with one row
per IP address (the IP as the row key).
Is this along the right lines, or am I missing something? It sure feels
like I am. Could anyone please explain how to accomplish what I am
looking for?
Thanks
RE: Generic Schema Question
Posted by "Buttler, David" <bu...@llnl.gov>.
If you are interested in the most recent 100 transactions, use Long.MAX_VALUE - System.currentTimeMillis() as part of your key instead of currentTimeMillis(). That way new entries get put at the top. Then you can set the start row of your scan to "192.168.1.2", and the first result will be the most recent entry. You can then just scan 100 rows to get everything you want.
Dave
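A minimal sketch of the reverse-timestamp key in plain Java, with no HBase dependency: a TreeMap stands in for the table, since HBase also keeps rows sorted by row key. The names ReverseTimestampKeys, rowKey and latest are hypothetical, and the 19-digit zero padding is an assumption added here so that string order matches numeric order. The sketch also stops at the first key that no longer starts with "192.168.1.2/"; without that check (or a stop row), a 100-row scan could run on into rows for 192.168.1.20.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model: a TreeMap stands in for an HBase table, whose rows are also kept
// sorted by row key (for ASCII keys, byte order matches String order).
final class ReverseTimestampKeys {

    // Row key = ip + "/" + (Long.MAX_VALUE - timestamp), zero-padded to the
    // 19 digits of Long.MAX_VALUE so string order equals numeric order.
    static String rowKey(String ip, long timestampMillis) {
        return ip + "/" + String.format("%019d", Long.MAX_VALUE - timestampMillis);
    }

    // Like a scan starting at "ip/": newest rows come first; stop after
    // `limit` rows or at the first key outside the "ip/" prefix.
    static List<String> latest(NavigableMap<String, String> table, String ip, int limit) {
        String prefix = ip + "/";
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, String> e : table.tailMap(prefix, true).entrySet()) {
            if (values.size() == limit || !e.getKey().startsWith(prefix)) {
                break;
            }
            values.add(e.getValue());
        }
        return values;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put(rowKey("192.168.1.2", 1313280451000L), "/foo/bar");
        table.put(rowKey("192.168.1.2", 1313281242000L), "/foo/baz");
        table.put(rowKey("192.168.1.20", 1313290000000L), "/other");
        System.out.println(latest(table, "192.168.1.2", 100)); // prints [/foo/baz, /foo/bar]
    }
}
```

The newer event sorts first because a larger timestamp yields a smaller reversed value.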
-----Original Message-----
From: Mark [mailto:static.void.dev@gmail.com]
Sent: Saturday, August 13, 2011 5:16 PM
To: user@hbase.apache.org
Subject: Re: Generic Schema Question
OK, so something like this?
row                       cf:qual         value
--------------------------------------------------------------------------
192.168.1.2/1313280451    data:page       "/foo/bar"
192.168.1.2/1313280451    data:referrer   "google.com"
192.168.1.2/1313280451    data:session    "f306e5af69b48568323fdc3018e40e7e"
--------------------------------------------------------------------------
192.168.1.2/1313281242    data:page       "/foo/baz"
192.168.1.2/1313281242    data:referrer   ""
192.168.1.2/1313281242    data:session    "f306e5af69b48568323fdc3018e40e7e"
...
Will this allow me to query the last 100 rows for IP "192.168.1.2"? If
so, how? Will it be efficient? Also, would you mind explaining an
alternative way of accomplishing this? I'm still trying to figure out
all the possibilities.
Thanks again
On 8/13/11 4:53 PM, Blake Lemoine wrote:
> You need a row key made of the IP address, followed by a slash, followed
> by the time. Or some other way of getting multiple rows per IP.
> Then you could scan for the IP prefix. Of course, that's just one possible
> solution.
Re: Generic Schema Question
Posted by Li Pi <li...@cloudera.com>.
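You can do a range scan from 192.168.1.2/1313280451 to 192.168.1.2/1313281242.
To get 100 rows, set caching to 100 (Scan.setCaching) and stop after reading
100 results; setBatch limits the number of columns returned per row, not the
number of rows.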
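A sketch of the range scan under stated assumptions: row keys are "ip/epochSeconds" as in Mark's example, a TreeMap stands in for the table, and TimeRangeScan and scanRange are hypothetical names. As with an HBase Scan, the start row is inclusive and the stop row exclusive, so the sketch stops at toSec + 1 to keep the last second. The 100-row cap is applied by the caller: in the HBase client, Scan.setCaching sets how many rows are fetched per RPC and Scan.setBatch caps the cells per Result, so neither by itself ends the scan after 100 rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of a row-key range scan; a TreeMap stands in for the table.
final class TimeRangeScan {

    // "ip/epochSeconds"; 10-digit seconds keep a fixed width (2001 to 2286),
    // so string order equals time order.
    static String rowKey(String ip, long epochSeconds) {
        return ip + "/" + epochSeconds;
    }

    // Rows for `ip` with fromSec <= t <= toSec, oldest first, at most `limit`.
    static List<String> scanRange(NavigableMap<String, String> table, String ip,
                                  long fromSec, long toSec, int limit) {
        String startRow = rowKey(ip, fromSec);   // inclusive, like a Scan start row
        String stopRow = rowKey(ip, toSec + 1);  // exclusive, so toSec is still included
        List<String> out = new ArrayList<>();
        for (String value : table.subMap(startRow, true, stopRow, false).values()) {
            if (out.size() == limit) {
                break;                           // row limit enforced by the caller
            }
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put(rowKey("192.168.1.2", 1313280451L), "/foo/bar");
        table.put(rowKey("192.168.1.2", 1313281242L), "/foo/baz");
        table.put(rowKey("192.168.1.2", 1313290000L), "/later");
        // prints [/foo/bar, /foo/baz]
        System.out.println(scanRange(table, "192.168.1.2", 1313280451L, 1313281242L, 100));
    }
}
```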
Alternatively, you can use the IP alone as the key and let HBase keep track of
versions. Set the maximum number of versions to Integer.MAX_VALUE when creating
the column family, and then do a Get of 192.168.1.2 with
setMaxVersions(int maxVersions) <http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setMaxVersions(int)>
and maxVersions = 100.
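A toy model, not HBase itself, of the versions-based alternative: one cell keeps timestamped values newest first, capped by a maxVersions field that plays the role of the column family's VERSIONS setting, and get(n) mirrors a Get with setMaxVersions(n). VersionedCell is a hypothetical class. HBase removes surplus versions during compaction rather than on write, but reads never return more than the configured number, which is the behavior the sketch reproduces.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of one HBase cell (row "192.168.1.2", one column) holding versions.
final class VersionedCell {
    private final int maxVersions; // stands in for the column family's VERSIONS
    // Keyed by timestamp, newest first: the order in which HBase returns versions.
    private final NavigableMap<Long, String> versions =
            new TreeMap<>(Comparator.<Long>reverseOrder());

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long timestamp, String value) {
        versions.put(timestamp, value);   // same timestamp: the new value replaces the old
        while (versions.size() > maxVersions) {
            versions.pollLastEntry();     // drop the oldest version
        }
    }

    // Mirrors a Get with setMaxVersions(n): up to n values, newest first.
    List<String> get(int n) {
        List<String> out = new ArrayList<>();
        for (String value : versions.values()) {
            if (out.size() == n) {
                break;
            }
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        VersionedCell page = new VersionedCell(Integer.MAX_VALUE);
        page.put(1313280451000L, "/foo/bar");
        page.put(1313281242000L, "/foo/baz");
        System.out.println(page.get(100)); // prints [/foo/baz, /foo/bar]
    }
}
```

One trade-off: two events for the same IP written with the same timestamp land on the same version, so the second overwrites the first. Also, the whole history of an IP lives in one row, which HBase never splits across regions. The one-row-per-event key avoids both.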
Re: Generic Schema Question
Posted by Doug Meil <do...@explorysmedical.com>.
See this section in the HBase book:
11.6.3. Close ResultScanners
There is a snippet there showing how to use a Scan, which is what you'd want for that.
I just realized that there should be a better Scan example in the Data
Model chapter. I'll add it.
Doug Meil
Chief Software Architect, Explorys
doug.meil@explorys.com
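In the HBase client, ResultScanner is both Iterable<Result> and Closeable, so Java 7 try-with-resources is one way to make sure a scanner is closed even when you stop early after 100 rows. The sketch uses a hypothetical FakeScanner with the same shape so it runs without a cluster; with the real client you would put the ResultScanner returned by HTable.getScanner(scan) in the try header instead.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class CloseScanner {

    // Hypothetical stand-in with the same shape as ResultScanner
    // (Iterable plus Closeable), so the example runs without a cluster.
    static final class FakeScanner implements Iterable<String>, Closeable {
        private final List<String> rows;
        boolean closed = false;

        FakeScanner(List<String> rows) {
            this.rows = rows;
        }

        @Override
        public Iterator<String> iterator() {
            return rows.iterator();
        }

        @Override
        public void close() {
            closed = true; // a real scanner releases its server-side resources here
        }
    }

    // Reads at most `limit` rows; try-with-resources calls close() even when
    // the loop exits early or an exception is thrown.
    static List<String> firstRows(FakeScanner scanner, int limit) {
        List<String> out = new ArrayList<>();
        try (FakeScanner s = scanner) {
            for (String row : s) {
                if (out.size() == limit) {
                    break;
                }
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        FakeScanner scanner = new FakeScanner(List.of("row1", "row2", "row3"));
        // prints [row1, row2] closed=true
        System.out.println(firstRows(scanner, 2) + " closed=" + scanner.closed);
    }
}
```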
Re: Generic Schema Question
Posted by Blake Lemoine <ba...@gmail.com>.
You need a row key made of the IP address, followed by a slash, followed by the
time. Or some other way of getting multiple rows per IP.
Then you could scan for the IP prefix. Of course, that's just one possible
solution.
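A sketch of the prefix scan with "ip/time" keys, again using a TreeMap as a stand-in for the table and hypothetical names (PrefixScan, stopRowFor, scanPrefix). The exclusive stop row is the prefix with its last character incremented: for "192.168.1.2/" that is "192.168.1.20", because '0' follows '/' in ASCII, so rows for 192.168.1.20 fall outside the range. With forward timestamps the rows come back oldest first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of an "ip/time" row key and a prefix scan. Like an HBase Scan,
// the start row is inclusive and the stop row exclusive.
final class PrefixScan {

    // 10-digit epoch seconds (as in Mark's example) keep a fixed width.
    static String rowKey(String ip, long epochSeconds) {
        return ip + "/" + epochSeconds;
    }

    // Smallest key greater than every key starting with `prefix`:
    // bump the last character (assumes a non-empty ASCII prefix).
    static String stopRowFor(String prefix) {
        char last = prefix.charAt(prefix.length() - 1);
        return prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
    }

    static List<String> scanPrefix(NavigableMap<String, String> table, String ip) {
        String startRow = ip + "/";             // inclusive
        String stopRow = stopRowFor(startRow);  // exclusive
        return new ArrayList<>(table.subMap(startRow, true, stopRow, false).values());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put(rowKey("192.168.1.2", 1313280451L), "/foo/bar");
        table.put(rowKey("192.168.1.2", 1313281242L), "/foo/baz");
        table.put(rowKey("192.168.1.20", 1313280000L), "/other");
        System.out.println(stopRowFor("192.168.1.2/"));        // prints 192.168.1.20
        System.out.println(scanPrefix(table, "192.168.1.2")); // prints [/foo/bar, /foo/baz]
    }
}
```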