Posted to dev@hbase.apache.org by Sean Busbey <bu...@apache.org> on 2018/05/14 21:14:55 UTC

WAL storage policies and interactions with Hadoop admin tools.

Hi folks!

I'm trying to reason through our "set a storage policy for WALs"
feature and having some difficulty. I want to get some feedback before
I fix our docs or submit a patch to change behavior.

Here's the history of the feature as I understand it:

1) Starting in HBase 1.1 you can change the setting
"hbase.wal.storage.policy" and if the underlying Hadoop installation
supports storage policies[1] then we'll call the needed APIs to set
policies as we create WALs.

The main use case is to tell HDFS that you want the HBase WAL on SSDs
in a mixed hardware deployment.

2) In HBase 1.1 - 1.4, the above setting defaulted to the value
"NONE". Our utility code for setting storage policies expressly checks
the configured value against the default, and when they match it opts
to log a message rather than call the actual Hadoop API[2]. This is
important since "NONE" isn't actually a valid storage policy, so
passing it to the Hadoop API would generate a bunch of log noise.

3) In HBase 2 and 1.5+, the setting defaults to "HOT" as of
HBASE-18118. Passing that value to the Hadoop API wouldn't produce
log noise, but the utility code does the same check against our
default. The Hadoop default storage policy is "HOT", so presumably we
save an RPC call by not setting it again. (The guard is sketched below.)
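
To make that concrete, here's a rough sketch of the guard as I read
it. The property name and the NONE/HOT defaults are from the code
linked in [2]; the method shape and names are mine, not the actual
FSUtils/CommonFSUtils signatures (the real code also goes through
reflection to cope with older Hadoops):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class WalPolicySketch {
  /**
   * Simplified stand-in for the utility logic described above.
   * defaultPolicy is "NONE" on 1.1 - 1.4 and "HOT" on 2.0 / 1.5+.
   */
  static void setWalStoragePolicy(FileSystem fs, Path walDir,
      Configuration conf, String defaultPolicy) throws IOException {
    String policy = conf.get("hbase.wal.storage.policy", defaultPolicy);
    if (defaultPolicy.equals(policy)) {
      // Early return. With a "NONE" default this keeps an invalid value
      // away from HDFS; with a "HOT" default it also means an explicit
      // request for HOT never reaches HDFS.
      return;
    }
    if (fs instanceof DistributedFileSystem) {
      // Real API, available on DistributedFileSystem since Hadoop 2.6.
      ((DistributedFileSystem) fs).setStoragePolicy(walDir, policy);
    }
  }
}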

----

If the above is correct, how do I specify that I want WALs to have a
storage policy of HOT in the event that HDFS already has some other
policy in place for a parent directory?

e.g. In HBase 1.1 - 1.4, I can set the storage policy (via Hadoop
admin tools) for "/hbase" to COLD and then change
"hbase.wal.storage.policy" to HOT, and the WALs come out HOT because
HOT differs from our "NONE" default. In HBase 2 and 1.5+, AFAICT my
WALs will still have the COLD policy, because HOT now matches our
default and we skip the API call.

Related, but different problem: I can use Hadoop admin tools to set
the storage policy for "/hbase" to "ALL_SSD", and if I leave the HBase
configs at their defaults I end up with WALs having "ALL_SSD" as their
policy in all versions. But in HBase 2 and 1.5+ the HBase configs
claim the policy is HOT.
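
Both scenarios are easy to reproduce. The hdfs commands below are
straight from the ArchivalStorage docs in [1]; the Java check is a
sketch, and FileSystem#getStoragePolicy is newer API that I believe
only showed up around Hadoop 2.8, so treat that part as illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckWalPolicy {
  public static void main(String[] args) throws Exception {
    // Admin-side setup for the two scenarios:
    //   hdfs storagepolicies -setStoragePolicy -path /hbase -policy COLD
    //   hdfs storagepolicies -setStoragePolicy -path /hbase -policy ALL_SSD
    // With HBase 2 / 1.5+ defaults the WAL code never calls
    // setStoragePolicy, so the WAL directory just inherits from /hbase.
    FileSystem fs = FileSystem.get(new Configuration());
    Path walDir = new Path("/hbase/WALs"); // WAL dir under the default root
    // Older clusters can check the same thing via:
    //   hdfs storagepolicies -getStoragePolicy -path /hbase/WALs
    System.out.println("effective policy on " + walDir + ": "
        + fs.getStoragePolicy(walDir).getName());
  }
}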

Should we always set the policy if the API is available? To avoid
having to double-configure in cases like the second one, do we
still need a way to say "please do not expressly set a storage
policy"? (As an alternative we could just call out "be sure to update
your WAL config" in the docs.)
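
If we go the "always set" route, here's one way it could combine with
an opt-out, sketched with the same imports as above (reusing "NONE" as
the opt-out sentinel is my suggestion here, not existing behavior):

/**
 * Sketch: treat any configured value as a real request and push it to
 * HDFS; only an explicit opt-out skips the call, so a configured HOT
 * really lands even when a parent directory says COLD or ALL_SSD.
 */
static void setWalStoragePolicy(FileSystem fs, Path walDir, Configuration conf)
    throws IOException {
  // "NONE" was never a valid HDFS policy, so reusing it as the
  // "please don't set anything" sentinel keeps old configs working.
  String policy = conf.get("hbase.wal.storage.policy", "NONE");
  if ("NONE".equalsIgnoreCase(policy)) {
    return; // opt-out: inherit whatever the parent directory has
  }
  if (fs instanceof DistributedFileSystem) {
    ((DistributedFileSystem) fs).setStoragePolicy(walDir, policy);
  }
}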



[1]: "Storage Policy" gets called several things in Hadoop, like
Archival Storage, Heterogenous Storage, HSM, and "Hierarchical
Storage". In all cases I'm talking about the feature documented here:

http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html
http://hadoop.apache.org/docs/r3.0.2/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html

I think it's available in Hadoop 2.6.0+ and 3.0.0+.

[2]:

In rel/1.2.0 you can see the default check by tracing from FSHLog:

https://s.apache.org/BqAk

The constants referred to in that code are in HConstants:

https://s.apache.org/OJyR

And in FSUtils we exit the function early when the default matches
what we pull out of configs:

https://s.apache.org/A4GA

In rel/2.0.0 the code works essentially the same but has moved around.
The starting point is now AbstractFSWAL:

https://s.apache.org/pp6T

The constants now use HOT instead of NONE as a default:

https://s.apache.org/7K2J

and in CommonFSUtils we do the same early return:

https://s.apache.org/fYKr

Re: WAL storage policies and interactions with Hadoop admin tools.

Posted by Wei-Chiu Chuang <we...@cloudera.com>.
Thanks Sean for the insight & detailed analysis!

I think it makes sense to revert HBASE-18118. Maintaining backward
compatibility otherwise is not trivial, and keeping "NONE" as the
default does no harm. Additional documentation would help avoid
confusion (since "NONE" is not a supported HDFS HSM option).

-- 
A very happy Clouderan

Re: WAL storage policies and interactions with Hadoop admin tools.

Posted by Yu Li <ca...@gmail.com>.
Thanks for pointing this out, Sean. IMHO, after re-checking the code,
HBASE-18118 needs an addendum (at least). The proposal was to set the
storage policy of the WAL directory to HOT by default, but the current
implementation could not achieve this: it follows the old "NONE" logic
and skips calling the API if the policy matches the default, but for
"HOT" we need an explicit call to HDFS.
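
Roughly speaking, the addendum would need the "HOT" default to still
reach HDFS, something like this sketch (simplified, not the actual
CommonFSUtils code):

// Only the legacy "NONE" sentinel skips the HDFS call; an explicit or
// defaulted "HOT" must still be pushed, otherwise a COLD/ALL_SSD policy
// inherited from the parent directory silently wins.
static void setWalStoragePolicy(DistributedFileSystem fs, Path walDir,
    Configuration conf) throws IOException {
  String policy = conf.get("hbase.wal.storage.policy", "HOT");
  if ("NONE".equalsIgnoreCase(policy)) {
    return; // explicit opt-out: follow the parent directory
  }
  fs.setStoragePolicy(walDir, policy);
}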

Furthermore, I think the old logic of leaving the default at "NONE" is
even better: if the admin sets a policy like ALL_SSD on the
hbase.rootdir directory, the WAL will simply follow, and if not the
policy is HOT by default.
So maybe reverting HBASE-18118 is the better choice, although I can see
my own +1 on HBASE-18118 there?... @Andrew, what's your opinion here?

And btw, I have opened HBASE-20479 for documenting the whole HSM
solution in HBase, including HFile/WAL/bulk load etc. (but I still
haven't had enough time to complete it). JFYI.


Best Regards,
Yu
