Posted to user@hbase.apache.org by "Jim R. Wilson" <wi...@gmail.com> on 2008/05/07 20:12:57 UTC

hbase on ec2 with s3 anyone?

Hi all,

I'm about to embark on a mystical journey through hosted web-services
with my trusted friend hbase.  Here are some questions for my fellow
travelers:

1) Has anyone done this before? If so, what lifesaving tips can you offer?
2) Should I attempt to build an hdfs out of ec2 persistent storage, or
just use S3?
3) How many images will I need? Just one, or master/slave?
4) What version of hadoop/hbase should I use?  (The hadoop/ec2
instructions[1] seem to favor the unreleased 0.17, but there doesn't
seem to be a public image with 0.17 at the ready)

Thanks in advance for any advice, I'm gearing up for quite a trip :)

[1] http://wiki.apache.org/hadoop/AmazonEC2

-- Jim R. Wilson (jimbojw)

Re: Does HBase support single-row transaction?

Posted by Zhou Wei <zh...@mails.tsinghua.edu.cn>.
SongJing Zhang wrote:
> long lockId = table.startUpdate(new Text("myRow"));
> ...
> ...
> ....
> table.commit(lockId);  ||   table.abort(lockId);
>   
You're right, but I just wrote a simplified version of yours.

>
>   


Re: Does HBase support single-row transaction?

Posted by SongJing Zhang <zh...@gmail.com>.
long lockId = table.startUpdate(new Text("myRow"));
...
...
....
table.commit(lockId);  ||   table.abort(lockId);




On Thu, May 8, 2008 at 10:48 AM, Zhou Wei
<zh...@mails.tsinghua.edu.cn> wrote:
> Hi
>  Does HBase support single-row transaction as described in Bigtable paper?
>
>  "Bigtable supports single-row transactions, which can be
>  used to perform atomic read-modify-write sequences on
>  data stored under a single row key."  --Bigtable paper
>
>  If so, how can I define a transaction in HBase,
>  does it look like this:
>
>  lid=startUpdate
>  get(lid)
>  ..
>  put(lid)
>  ...
>  commit(lid)
>
>  Are these transactions isolated from each other?
>  If not, is there a way to achieve that?
>
>  Thanks
>
>  Zhou
>

Re: Does HBase support single-row transaction?

Posted by Clint Morgan <cl...@gmail.com>.
> "When the application creates an entity, it can assign another entity as the
> parent of the new entity. Assigning a parent to a new entity puts the new
> entity in the same entity group as the parent entity."
>
> I think I need to sign up for app engine and use it to see if I can figure
> how the above is done.

Was thinking this may be done with a row key prefix. So all members of
an entity group have the same prefix and are collocated. Then the
regions (or tablets or datastore nodes) must know not to split in the
middle of such a prefix.
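
A minimal sketch of that prefix idea (the key layout below is invented for
illustration; it is not an actual datastore or HBase convention):

public class EntityGroupKeys {
  // All entities in a group share the root entity's key as a row-key prefix,
  // so they sort next to each other and land in the same region, provided
  // the region never splits inside the prefix.
  static String groupRow(String rootEntityKey, String kind, String entityId) {
    return rootEntityKey + "/" + kind + "/" + entityId;
  }

  public static void main(String[] args) {
    System.out.println(groupRow("user123", "Order", "0001"));  // user123/Order/0001
    System.out.println(groupRow("user123", "Order", "0002"));  // same prefix, same group
  }
}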

Also, it would make sense that they have one table per app engine
user, and each table stores all the kinds (types) that the application
uses...

> We'd need to have HBASE-493 in place before building any kind of OCC.
I see the value of 493 for OCC with single-row transactions, but for
multi-row transactions I think it's not useful. Basically we would have
to hold off on all row puts if any relevant row has conflicts.
cheers,
-clint

Re: Does HBase support single-row transaction?

Posted by stack <st...@duboce.net>.
Clint Morgan wrote:
> So if we wrote all operations for a transaction first to ZooKeeper, we
> still need something like a Distributed Transaction Manager to
> orchestrate the commit process: Send BatchUpdates to each
> RegionServer, ask them to commit, then commit or rollback based on
> results from all participating RegionServers. 
Yes.

> Or is there some more
> clever way to use ZooKeeper? Maybe encoding a commit protocol into the
> Zookeeper nodes...
>
>   
This page has an interesting discussion of how you can build various 
cluster-wide primitives such as locks and two-phase commit using 
zookeeper: http://zookeeper.wiki.sourceforge.net/ZooKeeperRecipes.  
We would still need a transaction orchestrator of some sort.
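
A rough sketch of the lock recipe from that page, assuming the plain
ZooKeeper Java client and a pre-created lock directory (simplified to
polling instead of watches; a sketch only, not production code):

import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class RowLock {
  // Create an ephemeral sequential node under the lock directory and wait
  // (by polling here, for brevity) until ours has the lowest sequence number.
  public static String acquire(ZooKeeper zk, String lockDir) throws Exception {
    String myNode = zk.create(lockDir + "/lock-", new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
    while (true) {
      List<String> children = zk.getChildren(lockDir, false);
      Collections.sort(children);
      if (myNode.endsWith(children.get(0))) {
        return myNode;            // lowest sequence number owns the lock
      }
      Thread.sleep(50);           // the real recipe watches the next-lowest node
    }
  }

  public static void release(ZooKeeper zk, String myNode) throws Exception {
    zk.delete(myNode, -1);        // -1 = ignore node version
  }
}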

> Looks like google's datastore has a mechanism for keeping groups of
> rows (entity groups) together on the same server (datastore node).
>   
From 
http://code.google.com/appengine/docs/datastore/keysandentitygroups.html:

"When the application creates an entity, it can assign another entity as 
the parent of the new entity. Assigning a parent to a new entity puts 
the new entity in the same entity group as the parent entity."

I think I need to sign up for app engine and use it to see if I can 
figure how the above is done.
> Then they allow transactions only on rows in the same group. This way
> they don't have to worry about distributed transactions. Rather than
> locking, they use optimistic concurrency control. This means they do
> the transaction in a sandbox, then check for conflicts from other
> transactions before committing.
We'd need to have HBASE-493 in place before building any kind of OCC.

St.Ack



> -clint
>
> On Tue, May 27, 2008 at 2:13 PM, stack <st...@duboce.net> wrote:
>   
>> Clint Morgan wrote:
>>     
>>> Zookeeper makes good sense for distributed locking to get isolation.
>>> But we still need transaction start, commit, and rollback to get
>>> atomicity. I think this properly belongs in hbase.
>>>
>>>       
>> Since all clients are going via zookeeper anyways ('isolation'), maybe
>> it'd be better to just run the whole transaction management out of
>> zookeeper? Clients would open a transaction on zookeeper and put their
>> edits there so they were available for rollback and/or commit. If client
>> died midway, could ask zookeeper for outstanding transactions and pick up
>> wherever it'd left off. Otherwise, on success (or rollback), clean up
>> the transaction log.
>>
>> Alternatively, all clients would have to go via the hbase master so it
>> could orchestrate row access. Master would need to hold outstanding
>> transactions somewhere either in an in-memory transactions catalog table
>> or itself over in zookeeper.
>>
>> St.Ack
>>
>>     


Re: Does HBase support single-row transaction?

Posted by Clint Morgan <cl...@gmail.com>.
So if we wrote all operations for a transaction first to ZooKeeper, we
still need something like a Distributed Transaction Manager to
orchestrate the commit process: Send BatchUpdates to each
RegionServer, ask them to commit, then commit or rollback based on
results from all participating RegionServers. Or is there some more
clever way to use ZooKeeper? Maybe encoding a commit protocol into the
Zookeeper nodes...

Looks like google's datastore has a mechanism for keeping groups of
rows (entity groups) together on the same server (datastore node).
Then they allow transactions only on rows in the same group. This way
they don't have to worry about distributed transactions. Rather than
locking, they use optimistic concurrency control. This means they do
the transaction in a sandbox, then check for conflicts from other
transactions before committing.

-clint

On Tue, May 27, 2008 at 2:13 PM, stack <st...@duboce.net> wrote:
> Clint Morgan wrote:
>> Zookeeper makes good sense for distributed locking to get isolation.
>> But we still need transaction start, commit, and rollback to get
>> atomicity. I think this properly belongs in hbase.
>>
> Since all clients are going via zookeeper anyways ('isolation'), maybe
> it'd be better to just run the whole transaction management out of
> zookeeper? Clients would open a transaction on zookeeper and put their
> edits there so they were available for rollback and/or commit. If client
> died midway, could ask zookeeper for outstanding transactions and pick up
> wherever it'd left off. Otherwise, on success (or rollback), clean up
> the transaction log.
>
> Alternatively, all clients would have to go via the hbase master so it
> could orchestrate row access. Master would need to hold outstanding
> transactions somewhere either in an in-memory transactions catalog table
> or itself over in zookeeper.
>
> St.Ack
>

Re: Does HBase support single-row transaction?

Posted by stack <st...@duboce.net>.
Clint Morgan wrote:
> Zookeeper makes good sense for distributed locking to get isolation.
> But we still need transaction start, commit, and rollback to get
> atomicity. I think this properly belongs in hbase.
>   
Since all clients are going via zookeeper anyways ('isolation'), maybe
it'd be better to just run the whole transaction management out of
zookeeper? Clients would open a transaction on zookeeper and put their
edits there so they were available for rollback and/or commit. If client
died midway, could ask zookeeper for outstanding transactions and pick up
wherever it'd left off. Otherwise, on success (or rollback), clean up
the transaction log.

Alternatively, all clients would have to go via the hbase master so it
could orchestrate row access. Master would need to hold outstanding
transactions somewhere either in an in-memory transactions catalog table
or itself over in zookeeper.

St.Ack

Re: Does HBase support single-row transaction?

Posted by stack <st...@duboce.net>.
Clint Morgan wrote:
> Responses inline:
>
> 2008/5/27 Bryan Duxbury <br...@rapleaf.com>:
>   
>> It seems like if you wanted to do some manner of multi-row transactional
>> put, the only real way to manage it is with deletes. That is, if the first
>> put succeeds but the second fails, you can "invert" the first put into a
>> bunch of deletes.
>>     
>
> Yes, this is what I was thinking by using the timestamp/multiple
> versions. To roll back you delete everything you wrote and then we get
> back to the previous version. Alternatively you could save the
> original values before they are overwritten.
>   

Deletes would be the way to go I'd say (what to do if we can't insert 
the delete for the very reason the transaction is failing?).

We'd have to do a bit of work to support this case first though.  IIRC, 
deletes X-out cells of the same timestamp when getting, but when scanning, if 
we encounter a delete, it blocks being able to see what's behind the delete.

St.Ack

Re: Does HBase support single-row transaction?

Posted by Bryan Duxbury <br...@rapleaf.com>.
I see what you're saying. I need to think on this. Stack, care to  
weigh in?

-Bryan

On May 27, 2008, at 1:56 PM, Clint Morgan wrote:

> Responses inline:
>
> 2008/5/27 Bryan Duxbury <br...@rapleaf.com>:
>> It seems like if you wanted to do some manner of multi-row  
>> transactional
>> put, the only real way to manage it is with deletes. That is, if  
>> the first
>> put succeeds but the second fails, you can "invert" the first put  
>> into a
>> bunch of deletes.
>
> Yes, this is what I was thinking by using the timestamp/multiple
> versions. To roll back you delete everything you wrote and then we get
> back to the previous version. Alternatively you could save the
> original values before they are overwritten.
>
>> Trying to make the regions themselves maintain the transactional  
>> state seems
>> like a terrible idea. You'd have to not allow a region to get  
>> migrated to
>> another server if it's serving a transaction. This would introduce  
>> a lot of
>> potential performance problems, I think.
>
> I'm envisioning transactions being relatively short-lived: 100 ms to a
> few seconds. I don't see this getting in the way of eg region
> migration any more than scanners do. But maybe I'm missing something.
>
> So the transactional state for a region is (roughly) a transaction
> lease, and a collection of the corresponding BatchUpdates.
>
>> Can you help me understand why atomic transactions are needed?  
>> Can't the
>> atomicity problems be sort of resolved by the whole row versioning  
>> thing?
>
> Simply, we need to ensure that all updates happen together. Otherwise,
> the data is in an inconsistent state. Take the standard example of
> debiting one account and crediting another. If only one of these rows
> gets updated, then the resulting table is corrupted and will not make
> sense to the application. (Money has been created or destroyed)
>
> So that is why one needs atomicity: the application-level semantics  
> demand it.
>
> When we encounter an exception midway through the transaction, we can
> recover the old state of the modified row(s) by reverting to the
> previous version. So the question is who recognizes this and does the
> rollback? I'd like hbase to do it because it seems like a logical
> place to put the behavior. So if the client crashed halfway through
> the transaction, then when his transaction lease expires, hbase will
> revert the relevant BatchUpdates. And the integrity of our table is
> preserved!
>
>> Other databases that do transactions and rollbacks use versioning to
>> accomplish that, I think.
>
> I don't know much about this. But however other (R)DBMS implement it,
> it is provided as a primitive rather than implemented on top of
> underlying versioning functionality (by users). This way the database
> will maintain the consistency rather than the user having to recognize
> problems and revert the state itself.
>
> -clint


Re: Does HBase support single-row transaction?

Posted by Clint Morgan <cl...@gmail.com>.
Responses inline:

2008/5/27 Bryan Duxbury <br...@rapleaf.com>:
> It seems like if you wanted to do some manner of multi-row transactional
> put, the only real way to manage it is with deletes. That is, if the first
> put succeeds but the second fails, you can "invert" the first put into a
> bunch of deletes.

Yes, this is what I was thinking by using the timestamp/multiple
versions. To roll back you delete everything you wrote and then we get
back to the previous version. Alternatively you could save the
original values before they are overwritten.
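
As a sketch of that (the deleteAll call and the WrittenCell holder class are
assumptions about what a client-side rollback could look like, not something
verified against TRUNK):

// Fragment: all cells in the transaction are written at one agreed timestamp
// txTimestamp; rolling back deletes exactly that version of each cell, which
// exposes the previous version again.
void rollback(HTable table, List<WrittenCell> cells, long txTimestamp)
    throws IOException {
  for (WrittenCell cell : cells) {
    // assumed signature: delete only the version written by this transaction
    table.deleteAll(cell.row, cell.column, txTimestamp);
  }
}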

> Trying to make the regions themselves maintain the transactional state seems
> like a terrible idea. You'd have to not allow a region to get migrated to
> another server if it's serving a transaction. This would introduce a lot of
> potential performance problems, I think.

I'm envisioning transactions being relatively short-lived: 100 ms to a
few seconds. I don't see this getting in the way of eg region
migration any more than scanners do. But maybe I'm missing something.

So the transactional state for a region is (roughly) a transaction
lease, and a collection of the corresponding BatchUpdates.
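
Roughly, something like this (names are illustrative only, not actual HBase
classes):

// Fragment: per-region bookkeeping for one open transaction.
class RegionTransactionState {
  long transactionId;
  long leaseExpiresAtMillis;          // if the client disappears, the lease lapses
                                      // and the pending updates are discarded
  List<BatchUpdate> pendingUpdates;   // buffered here, applied only at commit
}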

> Can you help me understand why atomic transactions are needed? Can't the
> atomicity problems be sort of resolved by the whole row versioning thing?

Simply, we need to ensure that all updates happen together. Otherwise,
the data is in an inconsistent state. Take the standard example of
debiting one account and crediting another. If only one of these rows
gets updated, then the resulting table is corrupted and will not make
sense to the application. (Money has been created or destroyed)

So that is why one needs atomicity: the application-level semantics demand it.

When we encounter an exception midway through the transaction, we can
recover the old state of the modified row(s) by reverting to the
previous version. So the question is who recognizes this and does the
rollback? I'd like hbase to do it because it seems like a logical
place to put the behavior. So if the client crashed halfway through
the transaction, then when his transaction lease expires, hbase will
revert the relevant BatchUpdates. And the integrity of our table is
preserved!

> Other databases that do transactions and rollbacks use versioning to
> accomplish that, I think.

I don't know much about this. But however other (R)DBMS implement it,
it is provided as a primitive rather than implemented on top of
underlying versioning functionality (by users). This way the database
will maintain the consistency rather than the user having to recognize
problems and revert the state itself.

-clint

Re: Does HBase support single-row transaction?

Posted by Bryan Duxbury <br...@rapleaf.com>.
It seems like if you wanted to do some manner of multi-row  
transactional put, the only real way to manage it is with deletes.  
That is, if the first put succeeds but the second fails, you can  
"invert" the first put into a bunch of deletes.

Trying to make the regions themselves maintain the transactional  
state seems like a terrible idea. You'd have to not allow a region to  
get migrated to another server if it's serving a transaction. This  
would introduce a lot of potential performance problems, I think.

Can you help me understand why atomic transactions are needed? Can't  
the atomicity problems be sort of resolved by the whole row  
versioning thing? Other databases that do transactions and rollbacks  
use versioning to accomplish that, I think.

-Bryan

On May 27, 2008, at 12:29 PM, Clint Morgan wrote:

> Zookeeper makes good sense for distributed locking to get isolation.
> But we still need transaction start, commit, and rollback to get
> atomicity. I think this properly belongs in hbase.
>
> So suppose I want to read two rows, and then update them as an
> isolated, atomic action:
>
> try {
>   getZookeeperLock(table)
>   tranId = table.beginTransaction();
>   row1 = table.get() // Normal get, but isolated due to distributed  
> lock
>   row2 = table.get()
>   BatchUpdate b1 = new BatchUpdate(row1)
>   b1.put(...)
>   table.addUpdate(tranId, b1);
>   BatchUpdate b2 = new BatchUpdate(row2)
>   b2.put(...);
>   table.addUpdate(tranId, b2);
>   table.commit(tranId);
> } catch(Exception e) {
>   table.rollback(tranId);
> } finally {
>   releaseZookeeperLock(table)
> }
>
> So then on the hbase side we hold on to the batchUpdates until the
> table.commit is called. Then we roll through and apply the updates.
>
> I'm sure rollback()/commit() is tricky to implement, as the updates
> could be on different region servers, so we need a failure on one to
> trigger a rollback on others. We could use timestamp/old versions to
> implement rollback on batchUpdates we have already applied.
>
> Alternatively, this may all be implemented above hbase. The client
> keeps track of updates, and tries to roll back using timestamps.
> Problem here is that if the client dies midway through, we have half the
> transaction committed and lose atomicity/consistency.
>
> We will eventually want/need atomic transactions on hbase, so I'll
> look into this further. Any input would be appreciated. Would be
> interesting to know how/what google provides...
>
> cheers,
> -clint
>
>
> On Sun, May 11, 2008 at 7:48 AM, Bryan Duxbury <br...@rapleaf.com>  
> wrote:
>> Currently, it's not on our list of things to do. There are a  
>> number of
>> reasons why it would be better to use Zookeeper here than to try  
>> and build
>> it into HBase.
>>
>> That said, I think you could get everything you need if you tried  
>> Zookeeper,
>> using that to acquire locks on the row you need a transaction on.  
>> It's
>> supposedly very high performance and supports your use case  
>> precisely.
>>
>> -Bryan
>>
>> On May 10, 2008, at 11:52 PM, Zhou Wei wrote:
>>
>>> Bryan Duxbury wrote:
>>>>
>>>> startUpdate is deprecated in TRUNK. Also, it doesn't do what you  
>>>> are
>>>> thinking it does. Committing a BatchUpdate is atomic across the  
>>>> whole row,
>>>> however. There is currently no way to make a get and a commit  
>>>> transactional,
>>>> though there is an issue open for write-if-not-modified-since  
>>>> support. If
>>>> this is something you need we can talk about how it might be  
>>>> supported.
>>>
>>> Thanks for answering my questions.
>>>
>>> So currently HBase is not suitable for transactional web  
>>> applications.
>>> A simple counting transaction cannot work under concurrent access:
>>> transaction{
>>> get(x);
>>> x++;
>>> write(x);
>>> }
>>>
>>> In my opinion, "write-if-not-modified-since" support may not be  
>>> the best way to implement single-row transactions.
>>> Because if the write cannot be performed, the application has to try  
>>> again and again, or just return an error and leave the user to retry or abort.
>>> Probably locking, waiting, and scheduling at the region server might be
>>> preferable in this case.
>>> Is the single-row transaction feature currently on the roadmap of  
>>> HBase?
>>>
>>> Zhou
>>>>
>>>> -Bryan
>>>>
>>>> On May 7, 2008, at 7:48 PM, Zhou Wei wrote:
>>>>
>>>>> Hi
>>>>> Does HBase support single-row transaction as described in Bigtable
>>>>> paper?
>>>>>
>>>>> "Bigtable supports single-row transactions, which can be
>>>>> used to perform atomic read-modify-write sequences on
>>>>> data stored under a single row key." --Bigtable paper
>>>>>
>>>>> If so, how can I define a transaction in HBase,
>>>>> does it look like this:
>>>>>
>>>>> lid=startUpdate
>>>>> get(lid)
>>>>> ..
>>>>> put(lid)
>>>>> ...
>>>>> commit(lid)
>>>>>
>>>>> Are these transactions isolated from each other?
>>>>> If not, is there a way to achieve that?
>>>>>
>>>>> Thanks
>>>>>
>>>>> Zhou
>>>>
>>>>
>>>>
>>>
>>
>>


Re: Does HBase support single-row transaction?

Posted by Clint Morgan <cl...@gmail.com>.
Zookeeper makes good sense for distributed locking to get isolation.
But we still need transaction start, commit, and rollback to get
atomicity. I think this properly belongs in hbase.

So suppose I want to read two rows, and then update them as an
isolated, atomic action:

try {
  getZookeeperLock(table)
  tranId = table.beginTransaction();
  row1 = table.get() // Normal get, but isolated due to distributed lock
  row2 = table.get()
  BatchUpdate b1 = new BatchUpdate(row1)
  b1.put(...)
  table.addUpdate(tranId, b1);
  BatchUpdate b2 = new BatchUpdate(row2)
  b2.put(...);
  table.addUpdate(tranId, b2);
  table.commit(tranId);
} catch(Exception e) {
  table.rollback(tranId);
} finally {
  releaseZookeeperLock(table)
}

So then on the hbase side we hold on to the batchUpdates until the
table.commit is called. Then we roll through and apply the updates.

I'm sure rollback()/commit() is tricky to implement, as the updates
could be on different region servers, so we need a failure on one to
trigger a rollback on others. We could use timestamp/old versions to
implement rollback on batchUpdates we have already applied.

Alternatively, this may all be implemented above hbase. The client
keeps track of updates, and tries to roll back using timestamps.
Problem here is that if the client dies midway through, we have half the
transaction committed and lose atomicity/consistency.

We will eventually want/need atomic transactions on hbase, so I'll
look into this further. Any input would be appreciated. Would be
interesting to know how/what google provides...

cheers,
-clint


On Sun, May 11, 2008 at 7:48 AM, Bryan Duxbury <br...@rapleaf.com> wrote:
> Currently, it's not on our list of things to do. There are a number of
> reasons why it would be better to use Zookeeper here than to try and build
> it into HBase.
>
> That said, I think you could get everything you need if you tried Zookeeper,
> using that to acquire locks on the row you need a transaction on. It's
> supposedly very high performance and supports your use case precisely.
>
> -Bryan
>
> On May 10, 2008, at 11:52 PM, Zhou Wei wrote:
>
>> Bryan Duxbury wrote:
>>>
>>> startUpdate is deprecated in TRUNK. Also, it doesn't do what you are
>>> thinking it does. Committing a BatchUpdate is atomic across the whole row,
>>> however. There is currently no way to make a get and a commit transactional,
>>> though there is an issue open for write-if-not-modified-since support. If
>>> this is something you need we can talk about how it might be supported.
>>
>> Thanks for answering my questions.
>>
>> So currently HBase is not suitable for transactional web applications.
>> A simple counting transaction cannot work under concurrent access:
>> transaction{
>> get(x);
>> x++;
>> write(x);
>> }
>>
>> In my opinion, "write-if-not-modified-since" support may not be the best
>> way to implement single-row transactions.
>> Because if the write cannot be performed, the application has to try again and
>> again, or just return an error and leave the user to retry or abort.
>> Probably locking, waiting, and scheduling at the region server might be
>> preferable in this case.
>> Is the single-row transaction feature currently on the roadmap of HBase?
>>
>> Zhou
>>>
>>> -Bryan
>>>
>>> On May 7, 2008, at 7:48 PM, Zhou Wei wrote:
>>>
>>>> Hi
>>>> Does HBase support single-row transaction as described in Bigtable
>>>> paper?
>>>>
>>>> "Bigtable supports single-row transactions, which can be
>>>> used to perform atomic read-modify-write sequences on
>>>> data stored under a single row key." --Bigtable paper
>>>>
>>>> If so, how can I define a transaction in HBase,
>>>> does it look like this:
>>>>
>>>> lid=startUpdate
>>>> get(lid)
>>>> ..
>>>> put(lid)
>>>> ...
>>>> commit(lid)
>>>>
>>>> Are these transactions isolated from each other?
>>>> If not, is there a way to achieve that?
>>>>
>>>> Thanks
>>>>
>>>> Zhou
>>>
>>>
>>>
>>
>
>

Re: Does HBase support single-row transaction?

Posted by Zhou Wei <zh...@mails.tsinghua.edu.cn>.
Bryan Duxbury wrote:
> Currently, it's not on our list of things to do. There are a number of 
> reasons why it would be better to use Zookeeper here than to try and 
> build it into HBase.
>
> That said, I think you could get everything you need if you tried 
> Zookeeper, using that to acquire locks on the row you need a 
> transaction on. It's supposedly very high performance and supports 
> your use case precisely.
>
> -Bryan

Thanks.
>
> On May 10, 2008, at 11:52 PM, Zhou Wei wrote:
>
>>> Bryan Duxbury wrote:
>>> startUpdate is deprecated in TRUNK. Also, it doesn't do what you are 
>>> thinking it does. Committing a BatchUpdate is atomic across the 
>>> whole row, however. There is currently no way to make a get and a 
>>> commit transactional, though there is an issue open for 
>>> write-if-not-modified-since support. If this is something you need 
>>> we can talk about how it might be supported.
>
>
>
>



Re: Does HBase support single-row transaction?

Posted by Bryan Duxbury <br...@rapleaf.com>.
Currently, it's not on our list of things to do. There are a number  
of reasons why it would be better to use Zookeeper here than to try  
and build it into HBase.

That said, I think you could get everything you need if you tried  
Zookeeper, using that to acquire locks on the row you need a  
transaction on. It's supposedly very high performance and supports  
your use case precisely.

-Bryan

On May 10, 2008, at 11:52 PM, Zhou Wei wrote:

> Bryan Duxbury wrote:
>> startUpdate is deprecated in TRUNK. Also, it doesn't do what you  
>> are thinking it does. Committing a BatchUpdate is atomic across  
>> the whole row, however. There is currently no way to make a get  
>> and a commit transactional, though there is an issue open for  
>> write-if-not-modified-since support. If this is something you need  
>> we can talk about how it might be supported.
> Thanks for answering my questions.
>
> So currently HBase is not suitable for transactional web applications.
> A simple counting transaction cannot work under concurrent access:
> transaction{
> get(x);
> x++;
> write(x);
> }
>
> In my opinion, "write-if-not-modified-since" support may not be the  
> best way to implement single-row transactions.
> Because if the write cannot be performed, the application has to try again  
> and again, or just return an error and leave the user to retry or  
> abort.
> Probably locking, waiting, and scheduling at the region server might be  
> preferable in this case.
> Is the single-row transaction feature currently on the roadmap of  
> HBase?
>
> Zhou
>>
>> -Bryan
>>
>> On May 7, 2008, at 7:48 PM, Zhou Wei wrote:
>>
>>> Hi
>>> Does HBase support single-row transaction as described in  
>>> Bigtable paper?
>>>
>>> "Bigtable supports single-row transactions, which can be
>>> used to perform atomic read-modify-write sequences on
>>> data stored under a single row key." --Bigtable paper
>>>
>>> If so, how can I define a transaction in HBase,
>>> does it look like this:
>>>
>>> lid=startUpdate
>>> get(lid)
>>> ..
>>> put(lid)
>>> ...
>>> commit(lid)
>>>
>>> Are these transactions isolated from each other?
>>> If not, is there a way to achieve that?
>>>
>>> Thanks
>>>
>>> Zhou
>>
>>
>>
>


Re: Does HBase support single-row transaction?

Posted by Zhou Wei <zh...@mails.tsinghua.edu.cn>.
Bryan Duxbury wrote:
> startUpdate is deprecated in TRUNK. Also, it doesn't do what you are 
> thinking it does. Committing a BatchUpdate is atomic across the whole 
> row, however. There is currently no way to make a get and a commit 
> transactional, though there is an issue open for 
> write-if-not-modified-since support. If this is something you need we 
> can talk about how it might be supported.
Thanks for answering my questions.

So currently HBase is not suitable for transactional web applications.
A simple counting transaction cannot work under concurrent access:
transaction{
get(x);
x++;
write(x);
}

In my opinion, "write-if-not-modified-since" support may not be the best 
way to implement single-row transactions.
Because if the write cannot be performed, the application has to try again and 
again, or just return an error and leave the user to retry or abort.
Probably locking, waiting, and scheduling at the region server might be 
preferable in this case.
Is the single-row transaction feature currently on the roadmap of HBase?
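
For what it's worth, the retry loop that a write-if-not-modified-since
primitive forces on the client would look roughly like this (every method
name below is invented for illustration; none of this exists in HBase today):

// Invented API: optimistic increment that re-reads and retries until the
// conditional write goes through.
boolean done = false;
while (!done) {
  VersionedValue current = table.getVersioned(row, column);  // value + timestamp
  long x = decodeLong(current.value());
  done = table.writeIfNotModifiedSince(row, column, current.timestamp(),
                                       encodeLong(x + 1));
}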

Zhou
>
> -Bryan
>
> On May 7, 2008, at 7:48 PM, Zhou Wei wrote:
>
>> Hi
>> Does HBase support single-row transaction as described in Bigtable 
>> paper?
>>
>> "Bigtable supports single-row transactions, which can be
>> used to perform atomic read-modify-write sequences on
>> data stored under a single row key." --Bigtable paper
>>
>> If so, how can I define a transaction in HBase,
>> does it look like this:
>>
>> lid=startUpdate
>> get(lid)
>> ..
>> put(lid)
>> ...
>> commit(lid)
>>
>> Are these transactions isolated from each other?
>> If not, is there a way to achieve that?
>>
>> Thanks
>>
>> Zhou
>
>
>


Re: Does HBase support single-row transaction?

Posted by Bryan Duxbury <br...@rapleaf.com>.
startUpdate is deprecated in TRUNK. Also, it doesn't do what you are  
thinking it does. Committing a BatchUpdate is atomic across the whole  
row, however. There is currently no way to make a get and a commit  
transactional, though there is an issue open for write-if-not- 
modified-since support. If this is something you need we can talk  
about how it might be supported.
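
For reference, the row-atomic TRUNK-style update looks roughly like this
(method names are approximate and worth checking against the current API):

// Both puts are committed together in one BatchUpdate against a single row,
// so the row is updated atomically.
BatchUpdate update = new BatchUpdate("myRow");
update.put("account:balance", "100".getBytes());
update.put("account:modified", Long.toString(System.currentTimeMillis()).getBytes());
table.commit(update);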

-Bryan

On May 7, 2008, at 7:48 PM, Zhou Wei wrote:

> Hi
> Does HBase support single-row transaction as described in Bigtable  
> paper?
>
> "Bigtable supports single-row transactions, which can be
> used to perform atomic read-modify-write sequences on
> data stored under a single row key."  --Bigtable paper
>
> If so, how can I define a transaction in HBase,
> does it look like this:
>
> lid=startUpdate
> get(lid)
> ..
> put(lid)
> ...
> commit(lid)
>
> Are these transactions isolated from each other?
> If not, is there a way to achieve that?
>
> Thanks
>
> Zhou


Does HBase support single-row transaction?

Posted by Zhou Wei <zh...@mails.tsinghua.edu.cn>.
Hi
Does HBase support single-row transaction as described in Bigtable paper?

"Bigtable supports single-row transactions, which can be
used to perform atomic read-modify-write sequences on
data stored under a single row key."  --Bigtable paper

If so, how can I define a transaction in HBase,
does it look like this:

lid=startUpdate
get(lid)
..
put(lid)
...
commit(lid)

Are these transactions isolated from each other?
If not, is there a way to achieve that?

Thanks

Zhou

Re: hbase on ec2 with s3 anyone?

Posted by stack <st...@duboce.net>.
HBase 0.1.2 is pegged against 0.16.3, not hadoop 0.17.0.  I don't think 
the two will work together.

Also, be sure to pick up the new 0.1.2 candidate (trying to put it up 
now).  Many improvements over the first candidate.

St.Ack


Jim R. Wilson wrote:
>>  you would need to build new images anyway since you need HBase installed
>> and started at boot time.
>>     
>
> Right-o.  I'll be setting up a local dev environment to build fresh
> fedora-8 AMI's with hadoop-0.17-pre and hbase-0.1.2-pre starting in
> the morning.
>
> Of course, I'll probably be creating an ec2 contrib dir in hbase with
> hbase-init script etc.  I'm tempted to try and make my image
> multi-versioned so that it has both the latest released hadoop/hbase
> as well as pre-release versions, then leave it up to user-data params
> to decide which will be started.
>
> The idea of course being that running the latest bleeding edge
> hbase/hadoop or the latest stable release could be done with the same image.
>
> -- Jim
>
> On Wed, May 7, 2008 at 5:22 PM, Chris K Wensel <ch...@wensel.net> wrote:
>   
>> sorry if I wasn't clear that the new scripts and old scripts were not
>> compatible. thus their being in 0.17.0, not 0.16.4.
>>
>>  you would need to build new images anyway since you need HBase installed
>> and started at boot time.
>>
>>
>>
>>
>>  On May 7, 2008, at 3:15 PM, Jim R. Wilson wrote:
>>
>>
>>     
>>> I've come to the conclusion that using the contrib/ec2 scripts from
>>> hadoop 0.17 is incompatible with the prebuilt hadoop-0.16.1 image
>>> currently available in the hadoop-ec2-images bucket (ami-461df82f
>>> hadoop-ec2-images/hadoop-0.16.1.manifest.xml to be precise).
>>>
>>> The problem is that the user-data passed in by 0.17 has a different
>>> format than what is expected by the hadoop-init script packaged with
>>> 0.16.1.  Specifically, 0.17's user-data is meant to be a comma
>>> delimited list of bash var settings of the form KEY=VAL, whereas
>>> 0.16.x seems to expect just a comma delimited list of values whose
>>> keys are known by their ordinal placement (that is, the first value is
>>> the number of instances, the second value is the name of the master
>>> node).
>>>
>>> So now I'm back to the idea that I'm going to have to build myself an
>>> ec2 AMI with hadoop 0.17 from "scratch" (using the create-instance
>>> scripts of course).  This isn't /too/ much more work than I'd have to
>>> do anyway.  I plan on running hbase on my cluster as well as python
>>> hadoop-streaming jobs which was going to require other libraries (like
>>> SQLAlchemy and Thrift).  These items were going to necessitate
>>> creating my own images anyway :/
>>>
>>> -- Jim
>>>
>>> On Wed, May 7, 2008 at 3:51 PM, Jim R. Wilson <wi...@gmail.com> wrote:
>>>>> keep the questions coming. will be glad to see HBase running on ec2, maybe
>>>>> we can put your changes back into the tree.
>>>>>
>>>>>           
>>>> Thanks :)
>>>>
>>>> Just prior to this excursion, I found some things needing tweaking to
>>>> work with my hosting provider.  Now that I'm moving to ec2, I'll be on
>>>> the lookout for similar issues.  I'll submit any patches I end up
>>>> making.
>>>>
>>>> -- Jim
>>>>
>>>>
>>>>
>>>> On Wed, May 7, 2008 at 3:39 PM, Chris K Wensel <ch...@wensel.net> wrote:
>>>>
>>>>         
>>>>>           
>>>>>> In the aforementioned EC2 wiki page, it has a configuration section
>>>>>> labeled "(Pre 0.17) Hadoop cluster variables (GROUP, MASTER_HOST,
>>>>>> NO_INSTANCES)".  I had assumed that I would actually need hadoop-0.17
>>>>>> on my ec2 instances in order to forgo those instructions.  It sounds
>>>>>> like you're telling me that all the "pre 0.17" or "0.17" specific
>>>>>> instructions in the wiki page refer only to the ec2 creation scripts,
>>>>>> not the actual running version in the cluster, is that correct?
>>>>>> not the actual running version in the cluster, is that correct?
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>> correct.
>>>>>
>>>>> the only reason these scripts are in the 0.17 branch is because they are
>>>>> not backward compatible with themselves.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>> Also, how safe is it to run a different version of the ec2 scripts
>>>>>> from the actual running hadoop instance?  I'm guessing it's pretty
>>>>>> safe since you suggested it :)
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>> there are no dependencies between the EC2 scripts and Hadoop core. you can
>>>>> use them with any version, as long as you build EC2 Images for the versions
>>>>> of Hadoop you are after with the 'new' scripts.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>> Thanks again for all the help - still wrapping my mind around this stuff.
>>>>>>
>>>>>>             
>>>>> keep the questions coming. will be glad to see HBase running on ec2, maybe
>>>>> we can put your changes back into the tree.
>>>>>
>>>>>
>>>>>
>>>>> Chris K Wensel
>>>>> chris@wensel.net
>>>>> http://chris.wensel.net/
>>>>> http://www.cascading.org/
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>         
>>  Chris K Wensel
>>  chris@wensel.net
>>  http://chris.wensel.net/
>>  http://www.cascading.org/
>>
>>
>>
>>
>>
>>     


Re: hbase on ec2 with s3 anyone?

Posted by "Jim R. Wilson" <wi...@gmail.com>.
>  you would need to build new images anyway since you need HBase installed
> and started at boot time.

Right-o.  I'll be setting up a local dev environment to build fresh
fedora-8 AMI's with hadoop-0.17-pre and hbase-0.1.2-pre starting in
the morning.

Of course, I'll probably be creating an ec2 contrib dir in hbase with
hbase-init script etc.  I'm tempted to try and make my image
multi-versioned so that it has both the latest released hadoop/hbase
as well as pre-release versions, then leave it up to user-data params
to decide which will be started.

The idea of course being that running the latest bleeding edge
hbase/hadoop or the latest stable release could be done with the same image.

-- Jim

On Wed, May 7, 2008 at 5:22 PM, Chris K Wensel <ch...@wensel.net> wrote:
> sorry if I wasn't clear that the new scripts and old scripts were not
> compatible. thus their being in 0.17.0, not 0.16.4.
>
>  you would need to build new images anyway since you need HBase installed
> and started at boot time.
>
>
>
>
>  On May 7, 2008, at 3:15 PM, Jim R. Wilson wrote:
>
>
> > I've come to the conclusion that using the contrib/ec2 scripts from
> > hadoop 0.17 is incompatible with the prebuilt hadoop-0.16.1 image
> > currently available in the hadoop-ec2-images bucket (ami-461df82f
> > hadoop-ec2-images/hadoop-0.16.1.manifest.xml to be precise).
> >
> > The problem is that the user-data passed in by 0.17 has a different
> > format than what is expected by the hadoop-init script packaged with
> > 0.16.1.  Specifically, 0.17's user-data is meant to be a comma
> > delimited list of bash var settings of the form KEY=VAL, whereas
> > 0.16.x seems to expect just a comma delimited list of values whose
> > keys are known by their ordinal placement (that is, the first value is
> > the number of instances, the second value is the name of the master
> > node).
> >
> > So now I'm back to the idea that I'm going to have to build myself an
> > ec2 AMI with hadoop 0.17 from "scratch" (using the create-instance
> > scripts of course).  This isn't /too/ much more work than I'd have to
> > do anyway.  I plan on running hbase on my cluster as well as python
> > hadoop-streaming jobs which was going to require other libraries (like
> > SQLAlchemy and Thrift).  These items were going to necessitate
> > creating my own images anyway :/
> >
> > -- Jim
> >
> > On Wed, May 7, 2008 at 3:51 PM, Jim R. Wilson <wi...@gmail.com> wrote:
> >
> > >
> > > > keep the questions coming. will be glad to see HBase running on ec2, maybe
> > > > we can put your changes back into the tree.
> > > >
> > >
> > > Thanks :)
> > >
> > > Just prior to this excursion, I found some things needing tweaking to
> > > work with my hosting provider.  Now that I'm moving to ec2, I'll be on
> > > the lookout for similar issues.  I'll submit any patches I end up
> > > making.
> > >
> > > -- Jim
> > >
> > >
> > >
> > > On Wed, May 7, 2008 at 3:39 PM, Chris K Wensel <ch...@wensel.net> wrote:
> > >
> > > >
> > > >
> > > > > In the aforementioned EC2 wiki page, it has a configuration section
> > > > > labeled "(Pre 0.17) Hadoop cluster variables (GROUP, MASTER_HOST,
> > > > > NO_INSTANCES)".  I had assumed that I would actually need hadoop-0.17
> > > > > on my ec2 instances in order to forgo those instructions.  It sounds
> > > > > like you're telling me that all the "pre 0.17" or "0.17" specific
> > > > > instructions in the wiki page refer only to the ec2 creation scripts,
> > > > > not the actual running version in the cluster, is that correct?
> > > > >
> > > > >
> > > > >
> > > >
> > > > correct.
> > > >
> > > > the only reason these scripts are in the 0.17 branch is because they are
> > > > not backward compatible with themselves.
> > > >
> > > >
> > > >
> > > >
> > > > > Also, how safe is it to run a different version of the ec2 scripts
> > > > > from the actual running hadoop instance?  I'm guessing it's pretty
> > > > > safe since you suggested it :)
> > > > >
> > > > >
> > > > >
> > > >
> > > > there are no dependencies between the EC2 scripts and Hadoop core. you can
> > > > use them with any version, as long as you build EC2 Images for the versions
> > > > of Hadoop you are after with the 'new' scripts.
> > > >
> > > >
> > > >
> > > >
> > > > > Thanks again for all the help - still wrapping my mind around this stuff.
> > > > >
> > > > >
> > > > >
> > > >
> > > > keep the questions coming. will be glad to see HBase running on ec2, maybe
> > > > we can put your changes back into the tree.
> > > >
> > > >
> > > >
> > > > Chris K Wensel
> > > > chris@wensel.net
> > > > http://chris.wensel.net/
> > > > http://www.cascading.org/
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> >
>
>  Chris K Wensel
>  chris@wensel.net
>  http://chris.wensel.net/
>  http://www.cascading.org/
>
>
>
>
>

Re: hbase on ec2 with s3 anyone?

Posted by Chris K Wensel <ch...@wensel.net>.
sorry if I wasn't clear that the new scripts and old scripts were not  
compatible. thus their being in 0.17.0, not 0.16.4.

you would need to build new images anyway since you need HBase  
installed and started at boot time.


On May 7, 2008, at 3:15 PM, Jim R. Wilson wrote:

> I've come to the conclusion that using the contrib/ec2 scripts from
> hadoop 0.17 is incompatible with the prebuilt hadoop-0.16.1 image
> currently available in the hadoop-ec2-images bucket (ami-461df82f
> hadoop-ec2-images/hadoop-0.16.1.manifest.xml to be precise).
>
> The problem is that the user-data passed in by 0.17 has a different
> format than what is expected by the hadoop-init script packaged with
> 0.16.1.  Specifically, 0.17's user-data is meant to be a comma
> delimited list of bash var settings of the form KEY=VAL, whereas
> 0.16.x seems to expect just a comma delimited list of values whose
> keys are known by their ordinal placement (that is, the first value is
> the number of instances, the second value is the name of the master
> node).
>
> So now I'm back to the idea that I'm going to have to build myself an
> ec2 AMI with hadoop 0.17 from "scratch" (using the create-instance
> scripts of course).  This isn't /too/ much more work than I'd have to
> do anyway.  I plan on running hbase on my cluster as well as python
> hadoop-streaming jobs which was going to require other libraries (like
> SQLAlchemy and Thrift).  These items were going to necessitate
> creating my own images anyway :/
>
> -- Jim
>
> On Wed, May 7, 2008 at 3:51 PM, Jim R. Wilson  
> <wi...@gmail.com> wrote:
>>> keep the questions coming. will be glad to see HBase running on  
>>> ec2, maybe
>>> we can put your changes back into the tree.
>>
>> Thanks :)
>>
>> Just prior to this excursion, I found some things needing tweaking to
>> work with my hosting provider.  Now that I'm moving to ec2, I'll be  
>> on
>> the lookout for similar issues.  I'll submit any patches I end up
>> making.
>>
>> -- Jim
>>
>>
>>
>> On Wed, May 7, 2008 at 3:39 PM, Chris K Wensel <ch...@wensel.net>  
>> wrote:
>>>
>>>> In the aforementioned EC2 wiki page, it has a configuration section
>>>> labeled "(Pre 0.17) Hadoop cluster variables (GROUP, MASTER_HOST,
>>>> NO_INSTANCES)".  I had assumed that I would actually need  
>>>> hadoop-0.17
>>>> on my ec2 instances in order to forgo those instructions.  It  
>>>> sounds
>>>> like you're telling me that all the "pre 0.17" or "0.17" specific
>>>> instructions in the wiki page refer only to the ec2 creation  
>>>> scripts,
>>>> not the actual running version in the cluster, is that correct?
>>>>
>>>>
>>>
>>> correct.
>>>
>>> the only reason these scripts are in the 0.17 branch is because  
>>> they are
>>> not backward compatible with themselves.
>>>
>>>
>>>
>>>> Also, how safe is it to run a different version of the ec2 scripts
>>>> from the actual running hadoop instance?  I'm guessing it's pretty
>>>> safe since you suggested it :)
>>>>
>>>>
>>>
>>> there are no dependencies between the EC2 scripts and Hadoop core.  
>>> you can
>>> use them with any version, as long as you build EC2 Images for the  
>>> versions
>>> of Hadoop you are after with the 'new' scripts.
>>>
>>>
>>>
>>>> Thanks again for all the help - still wrapping my mind around  
>>>> this stuff.
>>>>
>>>>
>>>
>>> keep the questions coming. will be glad to see HBase running on  
>>> ec2, maybe
>>> we can put your changes back into the tree.
>>>
>>>
>>>
>>> Chris K Wensel
>>> chris@wensel.net
>>> http://chris.wensel.net/
>>> http://www.cascading.org/
>>>
>>>
>>>
>>>
>>>
>>

Chris K Wensel
chris@wensel.net
http://chris.wensel.net/
http://www.cascading.org/





Re: hbase on ec2 with s3 anyone?

Posted by "Jim R. Wilson" <wi...@gmail.com>.
I've come to the conclusion that using the contrib/ec2 scripts from
hadoop 0.17 is incompatible with the prebuilt hadoop-0.16.1 image
currently available in the hadoop-ec2-images bucket (ami-461df82f
hadoop-ec2-images/hadoop-0.16.1.manifest.xml to be precise).

The problem is that the user-data passed in by 0.17 has a different
format than what is expected by the hadoop-init script packaged with
0.16.1.  Specifically, 0.17's user-data is meant to be a comma
delimited list of bash var settings of the form KEY=VAL, whereas
0.16.x seems to expect just a comma delimited list of values whose
keys are known by their ordinal placement (that is, the first value is
the number of instances, the second value is the name of the master
node).
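
To make the difference concrete (the values below are invented for
illustration only): a 0.17-style user-data string is a comma-delimited list
of KEY=VAL settings, roughly

MASTER_HOST=ec2-12-34-56-78.compute-1.amazonaws.com,NO_INSTANCES=5

while the 0.16.x hadoop-init expects the same information positionally, roughly

5,ec2-12-34-56-78.compute-1.amazonaws.com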

So now I'm back to the idea that I'm going to have to build myself an
ec2 AMI with hadoop 0.17 from "scratch" (using the create-instance
scripts of course).  This isn't /too/ much more work than I'd have to
do anyway.  I plan on running hbase on my cluster as well as python
hadoop-streaming jobs which was going to require other libraries (like
SQLAlchemy and Thrift).  These items were going to necessitate
creating my own images anyway :/

-- Jim

On Wed, May 7, 2008 at 3:51 PM, Jim R. Wilson <wi...@gmail.com> wrote:
> >  keep the questions coming. will be glad to see HBase running on ec2, maybe
>  > we can put your changes back into the tree.
>
>  Thanks :)
>
>  Just prior to this excursion, I found some things needing tweaking to
>  work with my hosting provider.  Now that I'm moving to ec2, I'll be on
>  the lookout for similar issues.  I'll submit any patches I end up
>  making.
>
>  -- Jim
>
>
>
>  On Wed, May 7, 2008 at 3:39 PM, Chris K Wensel <ch...@wensel.net> wrote:
>  >
>  > > In the aforementioned EC2 wiki page, it has a configuration section
>  > > labeled "(Pre 0.17) Hadoop cluster variables (GROUP, MASTER_HOST,
>  > > NO_INSTANCES)".  I had assumed that I would actually need hadoop-0.17
>  > > on my ec2 instances in order to forgo those instructions.  It sounds
>  > > like you're telling me that all the "pre 0.17" or "0.17" specific
>  > > instructions in the wiki page refer only to the ec2 creation scripts,
>  > > not the actual running version in the cluster, is that correct?
>  > >
>  > >
>  >
>  >  correct.
>  >
>  >  the only reason these scripts are in the 0.17 branch is because they are
>  > not backward compatible with themselves.
>  >
>  >
>  >
>  > > Also, how safe is it to run a different version of the ec2 scripts
>  > > from the actual running hadoop instance?  I'm guessing it's pretty
>  > > safe since you suggested it :)
>  > >
>  > >
>  >
>  >  there are no dependencies between the EC2 scripts and Hadoop core. you can
>  > use them with any version, as long as you build EC2 Images for the versions
>  > of Hadoop you are after with the 'new' scripts.
>  >
>  >
>  >
>  > > Thanks again for all the help - still wrapping my mind around this stuff.
>  > >
>  > >
>  >
>  >  keep the questions coming. will be glad to see HBase running on ec2, maybe
>  > we can put your changes back into the tree.
>  >
>  >
>  >
>  >  Chris K Wensel
>  >  chris@wensel.net
>  >  http://chris.wensel.net/
>  >  http://www.cascading.org/
>  >
>  >
>  >
>  >
>  >
>

Re: hbase on ec2 with s3 anyone?

Posted by "Jim R. Wilson" <wi...@gmail.com>.
>  keep the questions coming. will be glad to see HBase running on ec2, maybe
> we can put your changes back into the tree.

Thanks :)

Just prior to this excursion, I found some things needing tweaking to
work with my hosting provider.  Now that I'm moving to ec2, I'll be on
the lookout for similar issues.  I'll submit any patches I end up
making.

-- Jim

On Wed, May 7, 2008 at 3:39 PM, Chris K Wensel <ch...@wensel.net> wrote:
>
> > In the aforementioned EC2 wiki page, it has a configuration section
> > labeled "(Pre 0.17) Hadoop cluster variables (GROUP, MASTER_HOST,
> > NO_INSTANCES)".  I had assumed that I would actually need hadoop-0.17
> > on my ec2 instances in order to forgo those instructions.  It sounds
> > like you're telling me that all the "pre 0.17" or "0.17" specific
> > instructions in the wiki page refer only to the ec2 creation scripts,
> > not the actual running version in the cluster, is that correct?
> >
> >
>
>  correct.
>
>  the only reason these scripts are in the 0.17 branch is because they are
> not backward compatible with themselves.
>
>
>
> > Also, how safe is it to run a different version of the ec2 scripts
> > from the actual running hadoop instance?  I'm guessing it's pretty
> > safe since you suggested it :)
> >
> >
>
>  there are no dependencies between the EC2 scripts and Hadoop core. you can
> use them with any version, as long as you build EC2 Images for the versions
> of Hadoop you are after with the 'new' scripts.
>
>
>
> > Thanks again for all the help - still wrapping my mind around this stuff.
> >
> >
>
>  keep the questions coming. will be glad to see HBase running on ec2, maybe
> we can put your changes back into the tree.
>
>
>
>  Chris K Wensel
>  chris@wensel.net
>  http://chris.wensel.net/
>  http://www.cascading.org/
>
>
>
>
>

Re: hbase on ec2 with s3 anyone?

Posted by Chris K Wensel <ch...@wensel.net>.
> In the aforementioned EC2 wiki page, it has a configuration section
> labeled "(Pre 0.17) Hadoop cluster variables (GROUP, MASTER_HOST,
> NO_INSTANCES)".  I had assumed that I would actually need hadoop-0.17
> on my ec2 instances in order to forgo those instructions.  It sounds
> like you're telling me that all the "pre 0.17" or "0.17" specific
> instructions in the wiki page refer only to the ec2 creation scripts,
> not the actual running version in the cluster, is that correct?
>

correct.

the only reason these scripts are in the 0.17 branch is because they  
are not backward compatible with themselves.

> Also, how safe is it to run a different version of the ec2 scripts
> from the actual running hadoop instance?  I'm guessing it's pretty
> safe since you suggested it :)
>

there are no dependencies between the EC2 scripts and Hadoop core. you  
can use them with any version, as long as you build EC2 Images for the  
versions of Hadoop you are after with the 'new' scripts.

> Thanks again for all the help - still wrapping my mind around this  
> stuff.
>

keep the questions coming. will be glad to see HBase running on ec2,  
maybe we can put your changes back into the tree.

Chris K Wensel
chris@wensel.net
http://chris.wensel.net/
http://www.cascading.org/





Re: hbase on ec2 with s3 anyone?

Posted by "Jim R. Wilson" <wi...@gmail.com>.
I have 0.17 checked out, so I have the updated ec2 scripts, but I'm
confused about the setup.

In the aforementioned EC2 wiki page, it has a configuration section
labeled "(Pre 0.17) Hadoop cluster variables (GROUP, MASTER_HOST,
NO_INSTANCES)".  I had assumed that I would actually need hadoop-0.17
on my ec2 instances in order to forgo those instructions.  It sounds
like you're telling me that all the "pre 0.17" or "0.17" specific
instructions in the wiki page refer only to the ec2 creation scripts,
not the actual running version in the cluster, is that correct?

Also, how safe is it to run a different version of the ec2 scripts
from the actual running hadoop instance?  I'm guessing it's pretty
safe since you suggested it :)

Thanks again for all the help - still wrapping my mind around this stuff.

-- Jim

On Wed, May 7, 2008 at 2:49 PM, Chris K Wensel <ch...@wensel.net> wrote:
> you only need the contrib/ec2 scripts from 0.17. you don't need Hadoop
> 0.17.0.
>
>  just checkout the scripts and use them with whatever version of Hadoop you
> are most comfortable with (the version that works with HBase I expect).
>
>
>
>  On May 7, 2008, at 12:45 PM, Jim R. Wilson wrote:
>
>
> > Cool cool - thanks again Chris.
> >
> > I'm thinking I should use hadoop-0.17 instead of 0.16.3 at this time
> > because it appears 0.17 has better support for ec2 (less
> > configuration, no dyndns necessary etc).
> >
> > Is there a public directory somewhere which houses nightly branch
> > builds? or do I need to build 0.17 myself, then post it somewhere
> > (like s3) and have the script access that?
> >
> > -- Jim
> >
> > On Wed, May 7, 2008 at 2:27 PM, Chris K Wensel <ch...@wensel.net> wrote:
> >
> > > you do need the whole ec2 tree for the scripts to work...
> > >
> > >
> > >
> > > On May 7, 2008, at 12:25 PM, Jim R. Wilson wrote:
> > >
> > >
> > >
> > > > Nevermind, looks like I needed these:
> > > > ./src/contrib/ec2/bin/image/create-hadoop-image-remote
> > > > ./src/contrib/ec2/bin/create-hadoop-image
> > > >
> > > > -- Jim
> > > >
> > > > On Wed, May 7, 2008 at 2:23 PM, Jim R. Wilson <wi...@gmail.com> wrote:
> > > >
> > > >
> > > >
> > > > > Thanks Chris,
> > > > >
> > > > > Where do I get this supposed "image/create-hadoop-remote" script?  I
> > > > > couldn't `find` it anywhere within the hadoop svn tree, and the link
> > > > > in the hadoop wiki is broken :/
> > > > >
> > > > > -- Jim
> > > > >
> > > > >
> > > > >
> > > > > On Wed, May 7, 2008 at 2:04 PM, Chris K Wensel <ch...@wensel.net> wrote:
> > > > >
> > > > > > You don't need 0.17 to use the scripts mentioned in the EC2 wiki page.
> > > > > > Just grab contrib/ec2 from the 0.17.0 branch.
> > > > > >
> > > > > > as for images, you will need to update the image/create-hadoop-remote
> > > > > > bash script to download and install hbase.
> > > > > >
> > > > > > and update hadoop-init to start it with the proper properties.
> > > > > >
> > > > > > once you look at these scripts, it should be fairly obvious what you
> > > > > > need to do.
> > > > > >
> > > > > > then just run 'create-image' command to stuff this new image into one
> > > > > > of your buckets.
> > > > > >
> > > > > > enjoy
> > > > > > ckw
> > > > > >
> > > > > >
> > > > > >
> > > > > > On May 7, 2008, at 11:12 AM, Jim R. Wilson wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I'm about to embark on a mystical journey through hosted web-services
> > > > > > > with my trusted friend hbase.  Here are some questions for my fellow
> > > > > > > travelers:
> > > > > > >
> > > > > > > 1) Has anyone done this before? If so, what lifesaving tips can you
> > > > > > > offer?
> > > > > > > 2) Should I attempt to build an hdfs out of ec2 persistent storage, or
> > > > > > > just use S3?
> > > > > > > 3) How many images will I need? Just one, or master/slave?
> > > > > > > 4) What version of hadoop/hbase should I use?  (The hadoop/ec2
> > > > > > > instructions[1] seem to favor the unreleased 0.17, but there doesn't
> > > > > > > seem to be a public image with 0.17 at the ready)
> > > > > > >
> > > > > > > Thanks in advance for any advice, I'm gearing up for quite a trip :)
> > > > > > >
> > > > > > > [1] http://wiki.apache.org/hadoop/AmazonEC2
> > > > > > >
> > > > > > > -- Jim R. Wilson (jimbojw)
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > Chris K Wensel
> > > > > > chris@wensel.net
> > > > > > http://chris.wensel.net/
> > > > > > http://www.cascading.org/
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> > > Chris K Wensel
> > > chris@wensel.net
> > > http://chris.wensel.net/
> > > http://www.cascading.org/
> > >
> > >
> > >
> > >
> > >
> > >
> >
>
>  Chris K Wensel
>  chris@wensel.net
>  http://chris.wensel.net/
>  http://www.cascading.org/
>
>
>
>
>

Re: hbase on ec2 with s3 anyone?

Posted by Chris K Wensel <ch...@wensel.net>.
you only need the contrib/ec2 scripts from 0.17; you don't need Hadoop
0.17.0 itself.

just check out the scripts and use them with whatever version of Hadoop
you are most comfortable with (the version that works with HBase, I
expect).
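
for example (untested sketch; the variable name below is from memory,
so check bin/hadoop-ec2-env.sh in whatever you check out):

  # bin/hadoop-ec2-env.sh: point the scripts at the hadoop release you
  # actually plan to run, e.g. 0.16.3
  HADOOP_VERSION=0.16.3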


Chris K Wensel
chris@wensel.net
http://chris.wensel.net/
http://www.cascading.org/





Re: hbase on ec2 with s3 anyone?

Posted by "Jim R. Wilson" <wi...@gmail.com>.
Cool cool - thanks again Chris.

I'm thinking I should use hadoop-0.17 instead of 0.16.3 at this time
because it appears 0.17 has better support for ec2 (less
configuration, no dyndns necessary, etc.).

Is there a public directory somewhere which houses nightly branch
builds? Or do I need to build 0.17 myself, then post it somewhere
(like s3) and have the script access that?

-- Jim


Re: hbase on ec2 with s3 anyone?

Posted by Chris K Wensel <ch...@wensel.net>.
you do need the whole ec2 tree for the scripts to work...


Chris K Wensel
chris@wensel.net
http://chris.wensel.net/
http://www.cascading.org/





Re: hbase on ec2 with s3 anyone?

Posted by "Jim R. Wilson" <wi...@gmail.com>.
Nevermind, looks like I needed these:
./src/contrib/ec2/bin/image/create-hadoop-image-remote
./src/contrib/ec2/bin/create-hadoop-image

-- Jim


Re: hbase on ec2 with s3 anyone?

Posted by Chris K Wensel <ch...@wensel.net>.
here is the base of what you should check out from svn:
http://svn.apache.org/viewvc/hadoop/core/branches/branch-0.17/src/contrib/ec2/

image create scripts live here:
http://svn.apache.org/viewvc/hadoop/core/branches/branch-0.17/src/contrib/ec2/bin/image/
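
(to pull that down in one go, something along these lines should work;
the repos/asf path is just my translation of the viewvc links above)

  svn co http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.17/src/contrib/ec2/ hadoop-ec2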



Chris K Wensel
chris@wensel.net
http://chris.wensel.net/
http://www.cascading.org/





Re: hbase on ec2 with s3 anyone?

Posted by "Jim R. Wilson" <wi...@gmail.com>.
Thanks Chris,

Where do I get this supposed "image/create-hadoop-remote" script?  I
couldn't `find` it anywhere within the hadoop svn tree, and the link
in the hadoop wiki is broken :/

-- Jim


Re: hbase on ec2 with s3 anyone?

Posted by Chris K Wensel <ch...@wensel.net>.
You don't need 0.17 to use the scripts mentioned in the EC2 wiki page.  
Just grab contrib/ec2 from the 0.17.0 branch.

as for images, you will need to update the image/create-hadoop-remote  
bash script to download and install hbase.

and update hadoop-init to start it with the proper properties.

once you look at these scripts, it should be fairly obvious what you  
need to do.

then just run the 'create-image' command to stuff this new image into
one of your buckets.
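
for what it's worth, here is the kind of change I mean (untested
sketch; the hbase version, mirror url and install dir are placeholders,
adjust to whatever you actually use):

  # in image/create-hadoop-image-remote, next to the hadoop download:
  HBASE_VERSION=0.1.2
  cd /usr/local
  wget http://archive.apache.org/dist/hadoop/hbase/hbase-$HBASE_VERSION/hbase-$HBASE_VERSION.tar.gz
  tar xzf hbase-$HBASE_VERSION.tar.gz
  ln -s hbase-$HBASE_VERSION hbase

  # in hadoop-init, after hadoop itself is up on the master
  # (you still have to fill in hbase-site.xml with the master/hdfs
  # addresses, the same way hadoop-init fills in hadoop-site.xml):
  /usr/local/hbase/bin/start-hbase.sh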

enjoy
ckw


Chris K Wensel
chris@wensel.net
http://chris.wensel.net/
http://www.cascading.org/





Re: hbase on ec2 with s3 anyone?

Posted by Clint Morgan <cm...@troove.net>.
I've tried hbase on s3. See a previous post I made on this list.
Basically it's considerably slower than hdfs, especially so for random
reads. Also I think there could be consistency issues when running on
s3: the master creates a file, tells a region server to read it, and
the region server gets a file-not-found. This happened to me a couple
of times running s3 as the main map/reduce filesystem. We've basically
decided to run our own hdfs, and just use s3 for backup.
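
(the backup itself is just a distcp run; sketch only, the namenode
address, bucket and paths below are made up, and the AWS keys can live
either in the s3 uri or in hadoop-site.xml:)

  bin/hadoop distcp hdfs://namenode:9000/hbase s3://ID:SECRET@backup-bucket/hbase-backup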

3) I see no need for different images (we use one image for all nodes).

have fun,
-clint

