Posted to dev@hbase.apache.org by Jagane Sundar <ja...@sundar.org> on 2011/12/27 05:06:57 UTC

Writing the WAL to a different filesystem from the HFiles

Hello HBaseistas,

I have a question: Has anyone here tried to run HBase with the WAL going to one FileSystem, and the HFiles going to another?

My casual perusal of the source code indicates that this is not possible, so I am willing to hack a configuration option to do this and try it out.

Let me explain my motivation for trying this experiment: I want to run HBase on Amazon EC2, with the WAL going to a HDFS on Ephemeral Storage, and HFiles going to S3.
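
Concretely, the kind of hack I have in mind is a sketch like the one below; "hbase.wal.dir" is a made-up property name (no such option exists today), the URIs are placeholders, and the s3n:// scheme would additionally need AWS credentials configured. The point is only that the WAL path resolves to its own FileSystem instead of the one backing hbase.rootdir:

// Sketch only: resolve the WAL directory against its own FileSystem,
// independent of the FileSystem backing hbase.rootdir.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SplitStorageSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("hbase.rootdir", "s3n://my-bucket/hbase");           // HFiles on S3
    conf.set("hbase.wal.dir", "hdfs://namenode:8020/hbase-wal");  // WAL on ephemeral HDFS

    Path rootDir = new Path(conf.get("hbase.rootdir"));
    Path walDir = new Path(conf.get("hbase.wal.dir"));

    // Each path resolves to the FileSystem named by its own URI scheme,
    // which is all the regionserver would need to keep the two apart.
    FileSystem rootFs = rootDir.getFileSystem(conf);
    FileSystem walFs = walDir.getFileSystem(conf);

    System.out.println("HFiles on: " + rootFs.getUri());
    System.out.println("WAL on:    " + walFs.getUri());
  }
}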

Is this a reasonable thing to attempt? Or am I out of my mind?

Thanks in advance, and congratulations on building an outstanding piece of software.
Jagane

RE: Writing the WAL to a different filesystem from the HFiles

Posted by Jagane Sundar <ja...@sundar.org>.
> Not always being able to read back an object that has been written is deadly. Having the S3 client cache written
> data for a while can help but isn't a complete solution because the RS can fail and its regions will be reassigned
> to another RS... who then might not be able to read the data. A region might bounce around the cluster taking
> exceptions on open for a while. This availability problem could eventually stall all clients. To address this, you
> could implement a distributed write-behind cache for S3, but is it worth the effort and added complexity?

Argh. Eventual consistency bites. Perhaps HDFS on EBS is the only viable solution after all.

The trouble is cost - S3 is 14 cents a GB-month with full redundancy (whatever that means), whereas EBS is 10 cents a GB-month. EBS's own redundancy may not really be adequate, so you probably need 2 or 3 HDFS block replicas on top of it, which puts EBS storage at 20 or 30 cents a GB-month, depending on your pain threshold.

I am most interested in running HBase well in the cloud - on EC2 and on OpenStack-based IaaSes.

Thanks for sharing your insights, Andrew.


Jagane

Re: Writing the WAL to a different filesystem from the HFiles

Posted by Andrew Purtell <ap...@apache.org>.
> Improvements in Hadoop's S3 client, and in the implementation of S3 itself 
> could help to fix throughput problems and mask transient error problems.
Not always being able to read back an object that has been written is deadly. Having the S3 client cache written data for a while can help but isn't a complete solution because the RS can fail and its regions will be reassigned to another RS... who then might not be able to read the data. A region might bounce around the cluster taking exceptions on open for a while. This availability problem could eventually stall all clients. To address this, you could implement a distributed write-behind cache for S3, but is it worth the effort and added complexity?
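
To make that concrete, the trivial (non-distributed) version of such a cache is easy to sketch; something like the class below, with made-up names. Its weakness is obvious: the map lives and dies with the regionserver process.

// Sketch of a purely local write-behind cache in front of an eventually
// consistent object store. Class and method names are made up for illustration.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LocalWriteBehindCache {
  // Recently written objects, keyed by path. This lives only in this
  // regionserver process: when the RS dies and the region is reassigned,
  // the next RS sees only whatever S3 is currently willing to return.
  private final Map<String, byte[]> recentWrites = new ConcurrentHashMap<String, byte[]>();

  public void write(String path, byte[] data) {
    recentWrites.put(path, data);
    // ... asynchronously upload 'data' to S3 here ...
  }

  public byte[] read(String path) {
    byte[] cached = recentWrites.get(path);
    if (cached != null) {
      return cached;   // masks read-after-write lag, but only on this node
    }
    // ... otherwise fall through to S3, which may still 404 on a fresh object ...
    return null;
  }
}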


> Andy - are you (or other HBase experts) aware if HBase would have problems with
> an HFile store that exhibits variable latency? Specifically, what about scenarios
> where most HFile reads come back in milliseconds, but suddenly there is one that
> takes a few hundred milliseconds (or more)?

If we are talking about S3, then I have observed latencies in the thousands of milliseconds.

The scenario you describe won't cause a service availability problem on the HBase side; we can tolerate a wide range of read and write latencies. It will obviously hurt performance, though: whenever a read hit by a latency spike is for a block not in the block cache, the client will see the delay.


Best regards,


      - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)


----- Original Message -----
> From: Jagane Sundar <ja...@sundar.org>
> To: "dev@hbase.apache.org" <de...@hbase.apache.org>
> Cc: 
> Sent: Wednesday, December 28, 2011 12:02 PM
> Subject: RE: Writing the WAL to a different filesystem from the HFiles
> 
> Hello Andy,
> 
> 
>>>  No, definitely not full object reads, we use HDFS positioned reads, which allow us to request, within a gigabyte plus store file, much smaller byte ranges (e.g. 64 KB), and receive back only the requested data. We can "seek" around the file.
> 
> Ahh. This is good to know. HTTP range requests should work for this mode of 
> operation. I will take a look at Hadoop's S3 FileSystemStore implementation 
> and see if it uses HTTP range requests.
> 
>>>  Aside from several IMHO showstopper performance problems, the shortest answer is HBase often wants to promptly read back store files it has written, and S3 is too eventual often enough (transient 404s or 500s) to preclude reliable operation.
> 
> Hmm. OK. The potential performance problems are worrisome.
> 
> Improvements in Hadoop's S3 client, and in the implementation of S3 itself 
> could help to fix throughput problems and mask transient error problems. There 
> are rumors of a version of the Hadoop S3 client implementation that uses parallel 
> reads to greatly improve throughput.
> 
> Andy - are you (or other HBase experts) aware if HBase would have problems with 
> an HFile store that exhibits variable latency? Specifically, what about scenarios 
> where most HFile reads come back in milliseconds, but suddenly there is one that 
> takes a few hundred milliseconds (or more)?
> 
> Thanks,
> Jagane
> 

RE: Writing the WAL to a different filesystem from the HFiles

Posted by Jagane Sundar <ja...@sundar.org>.
Hello Andy,


>> No, definitely not full object reads, we use HDFS positioned reads, which allow us to request, within a gigabyte plus store file, much smaller byte ranges (e.g. 64 KB), and receive back only the requested data. We can "seek" around the file.

Ahh. This is good to know. HTTP range requests should work for this mode of operation. I will take a look at Hadoop's S3 FileSystemStore implementation and see if it uses HTTP range requests.
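
For reference, the kind of request I mean is just a GET with a Range header; a bare-bones example follows (the URL is a placeholder, and a real S3 request would also need authentication headers):

// Fetch a 64 KB byte range of an object over plain HTTP.
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class RangeGet {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://my-bucket.s3.amazonaws.com/hbase/some-storefile");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    // Ask for 64 KB starting at the 1 MB offset; 1114111 = 1048576 + 65536 - 1.
    conn.setRequestProperty("Range", "bytes=1048576-1114111");
    InputStream in = conn.getInputStream();
    byte[] buf = new byte[64 * 1024];
    int off = 0, n;
    while (off < buf.length && (n = in.read(buf, off, buf.length - off)) > 0) {
      off += n;
    }
    in.close();
    // A range-aware server answers 206 Partial Content with only these bytes.
    System.out.println("HTTP " + conn.getResponseCode() + ", read " + off + " bytes");
  }
}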

>> Aside from several IMHO showstopper performance problems, the shortest answer is HBase often wants to promptly read back store files it has
>> written, and S3 is too eventual often enough (transient 404s or 500s) to preclude reliable operation.

Hmm. OK. The potential performance problems are worrisome.

Improvements in Hadoop's S3 client, and in the implementation of S3 itself could help to fix throughput problems and mask transient error problems. There are rumors of a version of the Hadoop S3 client implementation that uses parallel reads to greatly improve throughput.
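
I have no details beyond the rumor, but the general idea I imagine is splitting one large read into ranges and fetching them concurrently, roughly as below (class name, chunking and thread-pool choices are my own guesses, not the actual implementation):

// Guesswork sketch of parallel ranged reads through the FileSystem API.
// Each range is fetched on its own stream (its own connection, for S3),
// using positioned reads so the pieces land directly in one buffer.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ParallelRangeRead {
  public static byte[] read(final FileSystem fs, final Path file, final long start,
      final int len, final int chunkSize, int threads) throws Exception {
    final byte[] result = new byte[len];
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    List<Future<Void>> futures = new ArrayList<Future<Void>>();
    for (int off = 0; off < len; off += chunkSize) {
      final int chunkOff = off;
      final int chunkLen = Math.min(chunkSize, len - off);
      futures.add(pool.submit(new Callable<Void>() {
        public Void call() throws Exception {
          FSDataInputStream in = fs.open(file);
          try {
            // Positioned read: fetch exactly this range of the file.
            in.readFully(start + chunkOff, result, chunkOff, chunkLen);
          } finally {
            in.close();
          }
          return null;
        }
      }));
    }
    for (Future<Void> f : futures) {
      f.get();   // propagate any I/O failure
    }
    pool.shutdown();
    return result;
  }
}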

Andy - are you (or other HBase experts) aware if HBase would have problems with an HFile store that exhibits variable latency? Specifically, what about scenarios where most HFile reads come back in milliseconds, but suddenly there is one that takes a few hundred milliseconds (or more)?

Thanks,
Jagane



Re: Writing the WAL to a different filesystem from the HFiles

Posted by Andrew Purtell <ap...@apache.org>.
> HFile store wants more of an object store with full object reads and writes.

No, definitely not full object reads, we use HDFS positioned reads, which allow us to request, within a gigabyte plus store file, much smaller byte ranges (e.g. 64 KB), and receive back only the requested data. We can "seek" around the file.
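
In API terms this is the positioned read on FSDataInputStream, for example (path and offset below are made up):

// Read one 64 KB block out of a large store file without touching the rest of it.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PositionedReadExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path storeFile = new Path("hdfs://namenode:8020/hbase/mytable/1234567890/cf/abcdef0123");
    FileSystem fs = storeFile.getFileSystem(conf);
    FSDataInputStream in = fs.open(storeFile);
    byte[] block = new byte[64 * 1024];
    // readFully(position, buffer, offset, length): only these bytes come back,
    // and the stream's own position is left untouched.
    in.readFully(512L * 1024 * 1024, block, 0, block.length);
    in.close();
  }
}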

> >  > Is this a reasonable thing to attempt? Or am I out of my mind?
> >  Check back in this mailing list archive for opinions on hbase atop s3.

Aside from several IMHO showstopper performance problems, the shortest answer is HBase often wants to promptly read back store files it has written, and S3 is too eventual often enough (transient 404s or 500s) to preclude reliable operation.

Best regards,

       - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)


----- Original Message -----
> From: Jagane Sundar <ja...@sundar.org>
> To: "dev@hbase.apache.org" <de...@hbase.apache.org>
> Cc: 
> Sent: Wednesday, December 28, 2011 10:42 AM
> Subject: RE: Writing the WAL to a different filesystem from the HFiles
> 
> Hello Mikael,
> 
> I am fairly new to HBase, so correct me if I am wrong - it seems to me that the 
> WAL and the HFile store impose distinctly different requirements upon the 
> underlying storage. The WAL really wants POSIX semantics (append, flush, etc.) 
> in a distributed filesystem whereas the HFile store wants more of an object 
> store with full object reads and writes.
> 
> Adding this configuration option, or better yet, making the WAL fully pluggable 
> as described in https://issues.apache.org/jira/browse/HBASE-4529, opens up a lot 
> more storage architectures to HBase.
> 
> Thanks,
> Jagane
> ________________________________________
> From: Mikael Sitruk [mikael.sitruk@gmail.com]
> Sent: Wednesday, December 28, 2011 7:29 AM
> To: dev@hbase.apache.org
> Subject: Re: Writing the WAL to a different filesystem from the HFiles
> 
> What is the advantage of such configuration?
> Mikael.s
> On Dec 28, 2011 1:10 AM, "Stack" <st...@duboce.net> wrote:
> 
>>  On Mon, Dec 26, 2011 at 8:06 PM, Jagane Sundar <ja...@sundar.org> 
> wrote:
>>  > Is this a reasonable thing to attempt? Or am I out of my mind?
>>  >
>> 
>>  Check back in this mailing list archive for opinions on hbase atop s3.
>>  St.Ack
>> 
> 

RE: Writing the WAL to a different filesystem from the HFiles

Posted by Jagane Sundar <ja...@sundar.org>.
Hello Mikael,

I am fairly new to HBase, so correct me if I am wrong - it seems to me that the WAL and the HFile store impose distinctly different requirements upon the underlying storage. The WAL really wants POSIX semantics (append, flush, etc.) in a distributed filesystem whereas the HFile store wants more of an object store with full object reads and writes.
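
What I mean by POSIX-ish semantics, concretely, is the WAL writer's pattern of appending small records and flushing them before acknowledging the client, roughly as in the sketch below (paths are made up; sync() is the Hadoop 1.x call, hflush() in later versions):

// Roughly the access pattern the WAL needs from its filesystem: append small
// records and force them out before acking, many times over the file's life.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WalWritePattern {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path wal = new Path("hdfs://namenode:8020/hbase-wal/rs1/wal.0000000001");
    FileSystem fs = wal.getFileSystem(conf);
    FSDataOutputStream out = fs.create(wal);
    for (int i = 0; i < 3; i++) {
      out.write(("edit-" + i + "\n").getBytes());
      out.sync();   // must reach the datanodes before we can ack the client
    }
    out.close();
    // The HFile store, by contrast, writes each file once, in full, and then
    // only ever reads ranges of it: much closer to an object store.
  }
}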

Adding this configuration option, or better yet, making the WAL fully pluggable as described in https://issues.apache.org/jira/browse/HBASE-4529, opens up a lot more storage architectures to HBase.

Thanks,
Jagane
________________________________________
From: Mikael Sitruk [mikael.sitruk@gmail.com]
Sent: Wednesday, December 28, 2011 7:29 AM
To: dev@hbase.apache.org
Subject: Re: Writing the WAL to a different filesystem from the HFiles

What is the advantage of such configuration?
Mikael.s
On Dec 28, 2011 1:10 AM, "Stack" <st...@duboce.net> wrote:

> On Mon, Dec 26, 2011 at 8:06 PM, Jagane Sundar <ja...@sundar.org> wrote:
> > Is this a reasonable thing to attempt? Or am I out of my mind?
> >
>
> Check back in this mailing list archive for opinions on hbase atop s3.
> St.Ack
>

Re: Writing the WAL to a different filesystem from the HFiles

Posted by Mikael Sitruk <mi...@gmail.com>.
What is the advantage of such configuration?
Mikael.s
On Dec 28, 2011 1:10 AM, "Stack" <st...@duboce.net> wrote:

> On Mon, Dec 26, 2011 at 8:06 PM, Jagane Sundar <ja...@sundar.org> wrote:
> > Is this a reasonable thing to attempt? Or am I out of my mind?
> >
>
> Check back in this mailing list archive for opinions on hbase atop s3.
> St.Ack
>

Re: Writing the WAL to a different filesystem from the HFiles

Posted by Stack <st...@duboce.net>.
On Mon, Dec 26, 2011 at 8:06 PM, Jagane Sundar <ja...@sundar.org> wrote:
> Is this a reasonable thing to attempt? Or am I out of my mind?
>

Check back in this mailing list archive for opinions on hbase atop s3.
St.Ack

Re: Writing the WAL to a different filesystem from the HFiles

Posted by Ted Yu <yu...@gmail.com>.
Jagane:
Akash recently assigned https://issues.apache.org/jira/browse/HBASE-4529 to
himself.
I think 4529 fits your description.

You can contact Akash online or offline to coordinate the work between you
two.

Cheers

On Mon, Dec 26, 2011 at 8:06 PM, Jagane Sundar <ja...@sundar.org> wrote:

> Hello HBaseistas,
>
> I have a question: Has anyone here tried to run HBase with the WAL going
> to one FileSystem, and the HFiles going to another?
>
> My casual perusal of the source code indicates that this is not possible,
> so I am willing to hack a configuration option to do this and try it out.
>
> Let me explain my motivation for trying this experiment: I want to run
> HBase on Amazon EC2, with the WAL going to a HDFS on Ephemeral Storage, and
> HFiles going to S3.
>
> Is this a reasonable thing to attempt? Or am I out of my mind?
>
> Thanks in advance, and congratulations on building an outstanding piece of
> software.
> Jagane

Re: Writing the WAL to a different filesystem from the HFiles

Posted by Li Pi <li...@cloudera.com>.
Totally doable. Not supported in the normal configuration - but easy enough to
add. Open a JIRA.
On Dec 26, 2011 8:38 PM, "Jagane Sundar" <ja...@sundar.org> wrote:

> Hello HBaseistas,
>
> I have a question: Has anyone here tried to run HBase with the WAL going
> to one FileSystem, and the HFiles going to another?
>
> My casual perusal of the source code indicates that this is not possible,
> so I am willing to hack a configuration option to do this and try it out.
>
> Let me explain my motivation for trying this experiment: I want to run
> HBase on Amazon EC2, with the WAL going to a HDFS on Ephemeral Storage, and
> HFiles going to S3.
>
> Is this a reasonable thing to attempt? Or am I out of my mind?
>
> Thanks in advance, and congratulations on building an outstanding piece of
> software.
> Jagane