Posted to user@hbase.apache.org by Shrijeet Paliwal <sh...@rocketfuel.com> on 2012/09/25 18:34:35 UTC

[Schema] Put or Increment ?

Hi,
Suppose I am tracking user activity by storing the user's IP each time they hit
the web service. The row id will be the uid of the user and the column qualifiers
will be the IPs themselves. I am contemplating whether to use the Put or the
Increment API. The must-have requirement is the set of distinct IPs associated
with the user. It would be good to have a count of visits per IP, but not having
the count is OK too (if it's expensive). Please help me compare the performance
of Increment vs Put in this context. Will I see better throughput using one over
the other? Better space utilization? What else?

-Shrijeet

Re: [Schema] Put or Increment ?

Posted by lars hofhansl <lh...@yahoo.com>.
Increment is slightly more expensive, since the RegionServer executing the Increment needs to retrieve the old value(s) first (while holding the row lock).

-- Lars
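
To make the difference concrete, here is a toy in-memory model of a single user row. It is only a sketch: the class UserRowModel and its methods are hypothetical names, not the HBase client API, and it assumes one visible value per column qualifier (real HBase keeps timestamped versions). It shows that a blind put never touches the old value, while an increment has to read it under the lock before writing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of one row (one uid). Hypothetical; NOT the HBase API. */
final class UserRowModel {
    // Column qualifier (the IP) -> stored value.
    private final Map<String, Long> cells = new HashMap<>();
    // Number of times a write had to read the previous value first.
    private long oldValueReads = 0;

    /** Blind write: overwrites the cell with a marker, no read needed. */
    synchronized void put(String ip) {
        cells.put(ip, 1L);
    }

    /** Read-modify-write: reads the old value while holding the lock. */
    synchronized long increment(String ip) {
        oldValueReads++;
        long updated = cells.getOrDefault(ip, 0L) + 1L;
        cells.put(ip, updated);
        return updated;
    }

    /** Distinct IPs are simply the column qualifiers present in the row. */
    synchronized Set<String> distinctIps() {
        return new TreeSet<>(cells.keySet());
    }

    /** Current value of a cell, or 0 if the qualifier is absent. */
    synchronized long storedValue(String ip) {
        return cells.getOrDefault(ip, 0L);
    }

    synchronized long getOldValueReads() {
        return oldValueReads;
    }
}
```

In this model both approaches yield the same set of distinct IPs. Only increment also yields a per-IP count, and it pays one read of the old value per write to get it.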




Re: [Schema] Put or Increment ?

Posted by Shrijeet Paliwal <sh...@rocketfuel.com>.
On Tue, Sep 25, 2012 at 9:56 AM, Pamecha, Abhishek <ap...@x.com> wrote:

> Hi Shrijeet
>
> What's your use case? That should drive your decision. Put will overwrite
> the cell if the userid and IP address are the same. Increment would just
> bump up the counter.
>

#1 Keep a list of distinct IPs
#2 Counts per IP (only if it comes cheap)
#3 Do blind writes (instead of read-modify-write)

Given #3, overwrite is okay. My question is about #2: if the cost is
trivial, I will use Increment.


RE: [Schema] Put or Increment ?

Posted by "Pamecha, Abhishek" <ap...@x.com>.
Hi Shrijeet

What's your use case? That should drive your decision. Put will overwrite the cell if the userid and IP address are the same. Increment would just bump up the counter.

-abhishek

