Posted to common-user@hadoop.apache.org by Thomas Thevis <Th...@semgine.com> on 2008/04/16 13:22:29 UTC

ID Service with HBase?

Hello list readers,

I'd like to perform mass data operations resulting in several output 
files with cross-references between lines in different files. For this 
purpose, I want to use a kind of ID service and I wonder whether I could 
use HBase for this task.
However, until now I was not able to use the HBase locking mechanism in 
a way that newly created IDs are unique.

The setup:
- each Mapper has its own instance of an IDService implementation
- each IDService instance has its own reference to the ID table in HBase

The code snippet which is used to return and update IDs:
[code]
final String columnName = this.config.get(ID_COLUMN_ID);
final Text column = new Text(columnName);
final String tableName = this.config.get(ID_SERVICE_TABLE_ID);
final HTable table = new HTable(this.config, new Text(tableName));
final Text rowName = new Text(namespace);
final long startValue;

final long lockid = table.startUpdate(rowName);
final byte[] bytes = table.get(rowName, column);
if (bytes == null) {
     startValue = 0;
} else {
     final ByteArrayInputStream byteArrayInputStream
         = new ByteArrayInputStream(bytes);
     final LongWritable longWritable = new LongWritable();
     longWritable.readFields(new DataInputStream(byteArrayInputStream));
     startValue = longWritable.get();
}
final long stopValue = startValue + size;
table.put(lockid, column, new LongWritable(stopValue));
table.commit(lockid);
[/code]

As stated above, the resulting IDs are not unique: about a quarter of all 
created IDs appear several times.
Now my question: Do I use the locking mechanism the wrong way or is my 
approach to use HBase locking and synchronizing for this task completely 
wrong?

Thanks,

Thomas

Re: ID Service with HBase?

Posted by Bryan Duxbury <br...@rapleaf.com>.

HBASE-493 was created, and seems similar. It's a
write-if-not-modified-since.

I would guess that you probably don't want to use HBase to maintain a
distributed auto-increment. You need to think of some other approach
that produces unique ids across concurrent access, like a hash or GUID
or something like that.
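
A minimal sketch of the GUID option mentioned above, assuming random (version 4) UUIDs from the Java standard library are acceptable as IDs. The class name `UuidIdService` and the method `nextIds` are hypothetical and not part of HBase or Hadoop. Random UUIDs need no shared counter, so mappers do not have to coordinate; uniqueness is probabilistic rather than guaranteed, although collisions are extremely unlikely.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical helper for illustration; not part of HBase or Hadoop.
class UuidIdService {

    /** Returns `size` freshly generated random UUIDs in string form. */
    static String[] nextIds(int size) {
        String[] ids = new String[size];
        for (int i = 0; i < size; i++) {
            // Each mapper can call this locally without touching shared state.
            ids[i] = UUID.randomUUID().toString();
        }
        return ids;
    }

    public static void main(String[] args) {
        // Collect the IDs in a set to count how many distinct values we got.
        Set<String> distinct = new HashSet<>();
        for (String id : nextIds(1000)) {
            distinct.add(id);
        }
        System.out.println("distinct ids: " + distinct.size());
    }
}
```

The trade-off is that such IDs are 36-character strings rather than compact, ordered numbers, which may or may not suit the cross-references in the output files.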

-Bryan

On Apr 16, 2008, at 2:18 PM, Jim Kellerman wrote:

> Row locks do not apply to reads, only updates. They prevent two  
> applications from updating the same row simultaneously. There is no  
> other locking mechanism in HBase. (It follows Bigtable in this  
> regard. See http://labs.google.com/papers/bigtable.html )
>
> There has been some discussion about adding a conditional write
> (i.e. only completes successfully if the current value of the cell
> being updated has value x), but no one has thought it important
> enough to enter an enhancement request on the HBase Jira:
> https://issues.apache.org/jira/browse/HBASE
>
> By the way, you will get a more timely response to HBase questions
> if you address them to the hbase mailing list:
> hbase-user@hadoop.apache.org
> ---
> Jim Kellerman, Senior Engineer; Powerset

Re: ID Service with HBase?

Posted by Thomas Thevis <Th...@semgine.com>.

Hello Jim,

> Row locks do not apply to reads, only updates. They prevent two applications from updating the same row simultaneously. There is no other locking mechanism in HBase. (It follows Bigtable in this regard. See http://labs.google.com/papers/bigtable.html )
Thank you for the clarification. This was exactly the point I missed before.

> There has been some discussion about adding a conditional write (i.e. only completes successfully if the current value of the cell being updated has value x), but no one has thought it important enough to enter an enhancement request on the HBase Jira: https://issues.apache.org/jira/browse/HBASE
> 
> By the way, you will get a more timely response to HBase questions if you address them to the hbase mailing list: hbase-user@hadoop.apache.org
Sorry, you're right.

Thanks a lot!

Thomas

RE: ID Service with HBase?

Posted by Jim Kellerman <ji...@powerset.com>.

Row locks do not apply to reads, only updates. They prevent two applications from updating the same row simultaneously. There is no other locking mechanism in HBase. (It follows Bigtable in this regard. See http://labs.google.com/papers/bigtable.html )
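
To make the consequence for the posted snippet concrete, here is a toy sketch of the interleaving that produces duplicate IDs when only the write is serialized. It is an assumption-laden model, not the HBase client API: `LostUpdateDemo`, `read`, and `write` are hypothetical names, and the "table" is an in-memory map holding one counter cell per row.

```java
import java.util.concurrent.ConcurrentHashMap;

// Toy in-memory stand-in for a counter cell; NOT the HBase client API.
class LostUpdateDemo {
    private final ConcurrentHashMap<String, Long> cells = new ConcurrentHashMap<>();

    // Reads are never blocked by the row lock; a missing cell counts as 0.
    long read(String row) {
        return cells.getOrDefault(row, 0L);
    }

    // Only the write itself is serialized, mirroring "locks apply to updates".
    synchronized void write(String row, long value) {
        cells.put(row, value);
    }

    public static void main(String[] args) {
        LostUpdateDemo store = new LostUpdateDemo();
        long size = 100;

        // Both mappers read the counter before either one commits.
        long startA = store.read("ns");
        long startB = store.read("ns");

        // Each commit is individually safe, but both are based on a stale read.
        store.write("ns", startA + size);
        store.write("ns", startB + size);

        System.out.println("mapper A block starts at " + startA);   // 0
        System.out.println("mapper B block starts at " + startB);   // 0
        System.out.println("counter after both commits: " + store.read("ns")); // 100
    }
}
```

Both mappers end up with the block starting at 0, and the counter advances by only one block, which matches the duplicate IDs reported in the original question.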

There has been some discussion about adding a conditional write (i.e. only completes successfully if the current value of the cell being updated has value x), but no one has thought it important enough to enter an enhancement request on the HBase Jira: https://issues.apache.org/jira/browse/HBASE
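
As a rough illustration of how such a conditional write could be used for ID allocation, the sketch below retries a read-then-conditional-write loop until the write succeeds. Everything here is hypothetical: `ConditionalIdAllocator`, `writeIfEquals`, and `reserveBlock` are illustrative names, and the store is an in-memory map, not an HBase table.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy in-memory model of a conditional write; NOT an HBase API.
class ConditionalIdAllocator {
    private final ConcurrentHashMap<String, Long> cells = new ConcurrentHashMap<>();

    long read(String row) {
        return cells.getOrDefault(row, 0L);
    }

    /** Hypothetical conditional write: succeeds only if the cell still holds `expected`. */
    boolean writeIfEquals(String row, long expected, long update) {
        cells.putIfAbsent(row, 0L); // a missing cell counts as 0
        return cells.replace(row, expected, update);
    }

    /** Reserves the block [start, start + size) and returns start; retries on conflict. */
    long reserveBlock(String row, long size) {
        while (true) {
            long start = read(row);
            if (writeIfEquals(row, start, start + size)) {
                return start; // nobody changed the counter since our read
            }
            // Another client won the race; re-read and try again.
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ConditionalIdAllocator alloc = new ConditionalIdAllocator();
        Set<Long> starts = ConcurrentHashMap.newKeySet();
        Thread[] mappers = new Thread[8];
        for (int i = 0; i < mappers.length; i++) {
            mappers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    starts.add(alloc.reserveBlock("ns", 100));
                }
            });
            mappers[i].start();
        }
        for (Thread t : mappers) {
            t.join();
        }
        System.out.println("distinct block starts: " + starts.size());
        System.out.println("final counter: " + alloc.read("ns"));
    }
}
```

Because a stale read makes the conditional write fail instead of silently overwriting, every reserved block is distinct, at the cost of retries under contention.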

By the way, you will get a more timely response to HBase questions if you address them to the hbase mailing list: hbase-user@hadoop.apache.org

---
Jim Kellerman, Senior Engineer; Powerset


