Posted to user@hadoop.apache.org by Krishna Kumaar Natarajan <na...@umn.edu> on 2013/10/02 06:52:59 UTC

HDFS / Federated HDFS - Doubts

Hi All,

While trying to understand federated HDFS in detail, I had a few doubts and
am listing them below for your help.

   1. In the case of *HDFS (without HDFS federation)*, is the metadata (the
   data about the blocks belonging to the files in HDFS) maintained in the
   main memory of the namenode, or is it stored on the namenode's permanent
   storage and brought into main memory on demand? [Krishna] Based on my
   understanding, I assume the entire metadata is in main memory, which is
   an issue by itself. Please correct me if my understanding is wrong.
   2. In the case of *federated HDFS*, is the metadata (the data about the
   blocks belonging to files in a particular namespace) maintained in the
   main memory of the namenode, or is it stored on the namenode's permanent
   storage and brought into main memory on demand?
   3. Is the metadata stored on separate cluster nodes (block management
   layer separation), as discussed in Appendix B of this document?
   https://issues.apache.org/jira/secure/attachment/12453067/high-level-design.pdf
   4. I would like to know whether the following proposals (slide 17 of
   http://www.slideshare.net/hortonworks/hdfs-futures-namenode-federation-for-improved-efficiency-and-scalability
   ) are already implemented in federated HDFS:
      - Separation of namespace and block management layers (same as qn. 3)
      - Partial namespace in memory for further scalability
      - Move partial namespace from one namenode to another

Thanks,
Krishna

Re: HDFS / Federated HDFS - Doubts

Posted by Chris Mawata <ch...@gmail.com>.
One more thing, Krishna: when you use JournalNodes, as opposed to the
native file system, for the metadata storage, you do get replication.
Chris
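
The replication mentioned above can be pictured as a majority-quorum write:
an edit counts as durable once more than half of the journal nodes have
acknowledged it. The sketch below is only an illustration of that idea in
Python. The function names and the in-memory lists standing in for journal
nodes are hypothetical, not Hadoop APIs, and it leaves out the recovery step
a real system needs when a write reaches only a minority.

```python
from typing import List


def majority(n: int) -> int:
    """Smallest number of acknowledgements that forms a majority of n nodes."""
    return n // 2 + 1


def write_edit(edit: str, journals: List[List[str]], alive: List[bool]) -> bool:
    """Append `edit` to every reachable journal; succeed only on a majority.

    Hypothetical helper: each inner list stands in for one journal node's log.
    """
    acks = 0
    for log, up in zip(journals, alive):
        if up:
            log.append(edit)
            acks += 1
    return acks >= majority(len(journals))


journals = [[], [], []]

# One node is down: two of three acknowledge, so the edit is accepted.
ok = write_edit("mkdir /data", journals, alive=[True, True, False])

# Two nodes are down: only one acknowledgement, so the edit is not accepted.
failed = write_edit("mkdir /tmp", journals, alive=[True, False, False])

print(ok, failed)
```

With three journal nodes the metadata log survives the loss of any one of
them, which is the replication a single local directory does not give you.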


On 10/2/2013 12:52 AM, Krishna Kumaar Natarajan wrote:
> Hi All,
>
> While trying to understand federated HDFS in detail I had few doubts 
> and listing them down for your help.
>
>  1. In case of *_HDFS(without HDFS federation)_*, the metadata or the
>     data about the blocks belonging to the files in HDFS is maintained
>     in the main memory of the name node or it is stored on permanent
>     storage of the namenode and is brought in the main memory on
>     demand basis ?[Krishna] Based on my understanding, I assume the
>     entire metadata is in main memory which is an issue by itself.
>     Please correct me if my understanding is wrong.
>  2. In case of*_federated HDFS_*, the metadata or the data about the
>     blocks belonging to files in a particular namespace is maintained
>     in the main memory of the namenode or it is stored on the
>     permanent storage of the namenode and is brought in the main
>     memory on demand basis ?
>  3. Are the metadata information stored in separate cluster
>     nodes(block management layer separation) as discussed in Appendix
>     B of this document
>     ?https://issues.apache.org/jira/secure/attachment/12453067/high-level-design.pdf
>  4. I would like to know if the following proposals are already
>     implemented in federated HDFS.
>     (http://www.slideshare.net/hortonworks/hdfs-futures-namenode-federation-for-improved-efficiency-and-scalability slide-17)
>       * Separation of namespace and block management layers (same as qn.3)
>       * Partial namespace in memory for further scalability
>       * Move partial namespace from one namenode to another
>
> Thanks,
> Krishna


Re: HDFS / Federated HDFS - Doubts

Posted by Suresh Srinivas <su...@hortonworks.com>.
On Wed, Oct 16, 2013 at 9:22 AM, Steve Edison <se...@gmail.com> wrote:

> I have couple of questions about HDFS federation:
>
> Can I state different block store directories for each namespace on a
> datanode ?
>

No. The main idea of federation was not to physically partition the storage
across namespaces, but to use all the available storage across the
namespaces, to ensure better utilization.


> Can I have some datanodes dedicated to a particular namespace only ?
>

As I said earlier, all the datanodes are shared across namespaces. If you
want to dedicate datanodes to a particular namespace, you might as well
create two separate clusters, each with its own set of datanodes and its
own namespace.
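
The sharing described in the two answers above can be modelled in a few
lines. As I understand the federation design, each namespace owns a block
pool and every datanode stores blocks for all pools out of the same
capacity. The Python sketch below is a toy model of that idea only; the
class, the pool IDs such as "BP-ns1", and the block-count capacity are
hypothetical and are not HDFS code.

```python
from collections import defaultdict


class DataNode:
    """Toy datanode whose capacity is shared by all namespaces."""

    def __init__(self, name, capacity_blocks):
        self.name = name
        self.capacity_blocks = capacity_blocks
        # block pool ID -> set of block IDs; one pool per namespace
        self.pools = defaultdict(set)

    def used(self):
        """Total blocks stored across every pool on this node."""
        return sum(len(blocks) for blocks in self.pools.values())

    def store(self, pool_id, block_id):
        """Store a block for any namespace while shared capacity remains."""
        if self.used() >= self.capacity_blocks:
            return False
        self.pools[pool_id].add(block_id)
        return True


dn = DataNode("dn1", capacity_blocks=4)
results = [
    dn.store("BP-ns1", 1),
    dn.store("BP-ns1", 2),
    dn.store("BP-ns1", 3),
    dn.store("BP-ns2", 1),  # same block ID in another pool: no clash
    dn.store("BP-ns2", 2),  # rejected: the shared capacity is exhausted
]
print(results)
print({pool: sorted(blocks) for pool, blocks in dn.pools.items()})
```

Nothing in the model reserves space for one namespace, so a busy namespace
can use room that an idle one leaves free. That is the utilization argument
made above.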


>
> This seems quite interesting. Way to go !
>
>
> On Tue, Oct 1, 2013 at 9:52 PM, Krishna Kumaar Natarajan <natar033@umn.edu
> > wrote:
>
>> Hi All,
>>
>> While trying to understand federated HDFS in detail I had few doubts and
>> listing them down for your help.
>>
>>    1. In case of *HDFS(without HDFS federation)*, the metadata or the
>>    data about the blocks belonging to the files in HDFS is maintained in the
>>    main memory of the name node or it is stored on permanent storage of the
>>    namenode and is brought in the main memory on demand basis ? [Krishna]
>>    Based on my understanding, I assume the entire metadata is in main memory
>>    which is an issue by itself. Please correct me if my understanding is wrong.
>>    2. In case of* federated HDFS*, the metadata or the data about the
>>    blocks belonging to files in a particular namespace is maintained in the
>>    main memory of the namenode or it is stored on the permanent storage of the
>>    namenode and is brought in the main memory on demand basis ?
>>    3. Are the metadata information stored in separate cluster
>>    nodes(block management layer separation) as discussed in Appendix B of this
>>    document ?
>>    https://issues.apache.org/jira/secure/attachment/12453067/high-level-design.pdf
>>    4. I would like to know if the following proposals are already
>>    implemented in federated HDFS. (
>>    http://www.slideshare.net/hortonworks/hdfs-futures-namenode-federation-for-improved-efficiency-and-scalability
>>     slide-17)
>>    - Separation of namespace and block management layers (same as qn.3)
>>       - Partial namespace in memory for further scalability
>>       - Move partial namespace from one namenode to another
>>
>> Thanks,
>> Krishna
>>
>
>


-- 
http://hortonworks.com/download/

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: HDFS / Federated HDFS - Doubts

Posted by Steve Edison <se...@gmail.com>.
I have a couple of questions about HDFS federation:

Can I specify different block store directories for each namespace on a
datanode?
Can I have some datanodes dedicated to a particular namespace only?

This seems quite interesting. Way to go!


On Tue, Oct 1, 2013 at 9:52 PM, Krishna Kumaar Natarajan
<na...@umn.edu> wrote:

> Hi All,
>
> While trying to understand federated HDFS in detail I had few doubts and
> listing them down for your help.
>
>    1. In case of *HDFS(without HDFS federation)*, the metadata or the
>    data about the blocks belonging to the files in HDFS is maintained in the
>    main memory of the name node or it is stored on permanent storage of the
>    namenode and is brought in the main memory on demand basis ? [Krishna]
>    Based on my understanding, I assume the entire metadata is in main memory
>    which is an issue by itself. Please correct me if my understanding is wrong.
>    2. In case of* federated HDFS*, the metadata or the data about the
>    blocks belonging to files in a particular namespace is maintained in the
>    main memory of the namenode or it is stored on the permanent storage of the
>    namenode and is brought in the main memory on demand basis ?
>    3. Are the metadata information stored in separate cluster nodes(block
>    management layer separation) as discussed in Appendix B of this document ?
>    https://issues.apache.org/jira/secure/attachment/12453067/high-level-design.pdf
>    4. I would like to know if the following proposals are already
>    implemented in federated HDFS. (
>    http://www.slideshare.net/hortonworks/hdfs-futures-namenode-federation-for-improved-efficiency-and-scalability
>     slide-17)
>    - Separation of namespace and block management layers (same as qn.3)
>       - Partial namespace in memory for further scalability
>       - Move partial namespace from one namenode to another
>
> Thanks,
> Krishna
>

Re: HDFS / Federated HDFS - Doubts

Posted by Krishna Kumaar Natarajan <na...@umn.edu>.
Thanks, Chris.
I hope someone can answer question 4, or give me a pointer so I can get a
clear idea about it.

Regards,
Krishna


On Wed, Oct 2, 2013 at 1:41 PM, Chris Mawata <ch...@gmail.com> wrote:

>  Don't know about question 4 but for the first three -- the metadata is
> in the memory of the namenode at runtime but is also persisted to disk
> (otherwise it would be lost if you shut down and re-start the namenode).
> The copy persisted to disk is on the native file system (not HDFS) and no,
> it is not automatically replicated. You have to protect your cluster by
> backing it up. You now get two issues:
> 1. The number of files you can store is limited by the amount of memory on
> the namenode
> 2. Since all access to files starts with getting the metadata, the network
> I/O of the namenode is a possible limit
>
> Those two issues are addressed by splitting the duty across 2 namenodes,
> hence the advantage of federation.
>
> [Hopefully someone who knows will tell you about question 4]
>
> Cheers
> Chris
>
>
> On 10/2/2013 12:52 AM, Krishna Kumaar Natarajan wrote:
>
>  Hi All,
>
>  While trying to understand federated HDFS in detail I had few doubts and
> listing them down for your help.
>
>    1. In case of *HDFS(without HDFS federation)*, the metadata or the
>    data about the blocks belonging to the files in HDFS is maintained in the
>    main memory of the name node or it is stored on permanent storage of the
>    namenode and is brought in the main memory on demand basis ? [Krishna]
>    Based on my understanding, I assume the entire metadata is in main memory
>    which is an issue by itself. Please correct me if my understanding is wrong.
>    2. In case of* federated HDFS*, the metadata or the data about the
>    blocks belonging to files in a particular namespace is maintained in the
>    main memory of the namenode or it is stored on the permanent storage of the
>    namenode and is brought in the main memory on demand basis ?
>    3. Are the metadata information stored in separate cluster nodes(block
>    management layer separation) as discussed in Appendix B of this document ?
>    https://issues.apache.org/jira/secure/attachment/12453067/high-level-design.pdf
>    4. I would like to know if the following proposals are already
>    implemented in federated HDFS. (
>    http://www.slideshare.net/hortonworks/hdfs-futures-namenode-federation-for-improved-efficiency-and-scalability
>     slide-17)
>     - Separation of namespace and block management layers (same as qn.3)
>       - Partial namespace in memory for further scalability
>       - Move partial namespace from one namenode to another
>
> Thanks,
> Krishna
>
>
>

Re: HDFS / Federated HDFS - Doubts

Posted by Chris Mawata <ch...@gmail.com>.
Don't know about question 4, but for the first three: the metadata is
in the memory of the namenode at runtime but is also persisted to disk
(otherwise it would be lost if you shut down and restart the namenode).
The copy persisted to disk is on the native file system (not HDFS) and
is not automatically replicated. You have to protect your cluster by
backing it up. You now get two issues:
1. The number of files you can store is limited by the amount of memory
on the namenode.
2. Since all access to files starts with getting the metadata, the
network I/O of the namenode is a possible limit.

Those two issues are addressed by splitting the duty across 2 namenodes,
hence the advantage of federation.
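
To make the memory limit concrete, here is a rough back-of-the-envelope sketch. It assumes the commonly quoted rule of thumb of about 150 bytes of namenode heap per namespace object (file, directory, or block); that figure and the function names are illustrative assumptions, not numbers from this thread, and real usage varies by version and workload.

```python
# Rough estimate only. The ~150 bytes per namespace object figure is a
# commonly quoted rule of thumb and an assumption here, not a measured value.
BYTES_PER_OBJECT = 150


def namenode_heap_bytes(num_files, blocks_per_file=1,
                        bytes_per_object=BYTES_PER_OBJECT):
    """Estimate the heap needed to hold the metadata for num_files files."""
    # Each file contributes one inode object plus one object per block.
    objects = num_files * (1 + blocks_per_file)
    return objects * bytes_per_object


def per_namenode_heap_bytes(num_files, num_namenodes, blocks_per_file=1):
    """Estimate heap per namenode if files split evenly across namespaces."""
    files_each = -(-num_files // num_namenodes)  # ceiling division
    return namenode_heap_bytes(files_each, blocks_per_file)


if __name__ == "__main__":
    files = 100_000_000
    single = namenode_heap_bytes(files)
    federated = per_namenode_heap_bytes(files, num_namenodes=2)
    print(f"single namenode: {single / 1e9:.1f} GB")
    print(f"each of 2 federated namenodes: {federated / 1e9:.1f} GB")
```

The even split is the best case; in practice namespaces are divided by directory tree, so one namenode may still carry most of the load.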

[Hopefully someone who knows will tell you about question 4]

Cheers
Chris

On 10/2/2013 12:52 AM, Krishna Kumaar Natarajan wrote:
> Hi All,
>
> While trying to understand federated HDFS in detail I had few doubts 
> and listing them down for your help.
>
>  1. In case of *_HDFS(without HDFS federation)_*, the metadata or the
>     data about the blocks belonging to the files in HDFS is maintained
>     in the main memory of the name node or it is stored on permanent
>     storage of the namenode and is brought in the main memory on
>     demand basis ?[Krishna] Based on my understanding, I assume the
>     entire metadata is in main memory which is an issue by itself.
>     Please correct me if my understanding is wrong.
>  2. In case of*_federated HDFS_*, the metadata or the data about the
>     blocks belonging to files in a particular namespace is maintained
>     in the main memory of the namenode or it is stored on the
>     permanent storage of the namenode and is brought in the main
>     memory on demand basis ?
>  3. Are the metadata information stored in separate cluster
>     nodes(block management layer separation) as discussed in Appendix
>     B of this document
>     ?https://issues.apache.org/jira/secure/attachment/12453067/high-level-design.pdf
>  4. I would like to know if the following proposals are already
>     implemented in federated HDFS.
>     (http://www.slideshare.net/hortonworks/hdfs-futures-namenode-federation-for-improved-efficiency-and-scalability slide-17)
>       * Separation of namespace and block management layers (same as qn.3)
>       * Partial namespace in memory for further scalability
>       * Move partial namespace from one namenode to another
>
> Thanks,
> Krishna


Re: HDFS / Federated HDFS - Doubts

Posted by Steve Edison <se...@gmail.com>.
I have a couple of questions about HDFS federation:

Can I specify different block store directories for each namespace on a
datanode ?
Can I have some datanodes dedicated to a particular namespace only ?

This seems quite interesting. Way to go !


On Tue, Oct 1, 2013 at 9:52 PM, Krishna Kumaar Natarajan
<na...@umn.edu> wrote:

> Hi All,
>
> While trying to understand federated HDFS in detail I had few doubts and
> listing them down for your help.
>
>    1. In case of *HDFS(without HDFS federation)*, the metadata or the
>    data about the blocks belonging to the files in HDFS is maintained in the
>    main memory of the name node or it is stored on permanent storage of the
>    namenode and is brought in the main memory on demand basis ? [Krishna]
>    Based on my understanding, I assume the entire metadata is in main memory
>    which is an issue by itself. Please correct me if my understanding is wrong.
>    2. In case of* federated HDFS*, the metadata or the data about the
>    blocks belonging to files in a particular namespace is maintained in the
>    main memory of the namenode or it is stored on the permanent storage of the
>    namenode and is brought in the main memory on demand basis ?
>    3. Are the metadata information stored in separate cluster nodes(block
>    management layer separation) as discussed in Appendix B of this document ?
>    https://issues.apache.org/jira/secure/attachment/12453067/high-level-design.pdf
>    4. I would like to know if the following proposals are already
>    implemented in federated HDFS. (
>    http://www.slideshare.net/hortonworks/hdfs-futures-namenode-federation-for-improved-efficiency-and-scalability
>     slide-17)
>    - Separation of namespace and block management layers (same as qn.3)
>       - Partial namespace in memory for further scalability
>       - Move partial namespace from one namenode to another
>
> Thanks,
> Krishna
>

Re: HDFS / Federated HDFS - Doubts

Posted by Chris Mawata <ch...@gmail.com>.
One more thing, Krishna: when using JournalNodes as opposed to the
native file system for the metadata storage, you do get replication.
Chris
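
As a rough illustration of why JournalNodes give you replication: with quorum-based journaling, an edit counts as written once a majority of the journal nodes have acknowledged it. The sketch below is a toy model of that majority rule only; the function names are hypothetical and none of this is the actual Hadoop API.

```python
# Toy model of majority-quorum journaling. All names are hypothetical and
# this is not the Hadoop API; it only illustrates the majority rule.

def quorum_size(num_journal_nodes):
    """Smallest number of acknowledgements that forms a majority."""
    return num_journal_nodes // 2 + 1


def tolerated_failures(num_journal_nodes):
    """How many journal nodes can be down while writes still succeed."""
    return (num_journal_nodes - 1) // 2


def edit_is_durable(acks, num_journal_nodes):
    """acks holds one boolean per journal node: True if it acknowledged."""
    acks = list(acks)
    if len(acks) != num_journal_nodes:
        raise ValueError("expected one acknowledgement flag per journal node")
    return sum(acks) >= quorum_size(num_journal_nodes)


if __name__ == "__main__":
    # With three journal nodes, two acknowledgements are enough.
    print(edit_is_durable([True, True, False], 3))
    # One acknowledgement out of three is not.
    print(edit_is_durable([True, False, False], 3))
```

Under this rule, three journal nodes survive the loss of one and five survive the loss of two, which is why an odd count is the usual choice.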


On 10/2/2013 12:52 AM, Krishna Kumaar Natarajan wrote:
> Hi All,
>
> While trying to understand federated HDFS in detail I had few doubts 
> and listing them down for your help.
>
>  1. In case of *_HDFS(without HDFS federation)_*, the metadata or the
>     data about the blocks belonging to the files in HDFS is maintained
>     in the main memory of the name node or it is stored on permanent
>     storage of the namenode and is brought in the main memory on
>     demand basis ?[Krishna] Based on my understanding, I assume the
>     entire metadata is in main memory which is an issue by itself.
>     Please correct me if my understanding is wrong.
>  2. In case of*_federated HDFS_*, the metadata or the data about the
>     blocks belonging to files in a particular namespace is maintained
>     in the main memory of the namenode or it is stored on the
>     permanent storage of the namenode and is brought in the main
>     memory on demand basis ?
>  3. Are the metadata information stored in separate cluster
>     nodes(block management layer separation) as discussed in Appendix
>     B of this document
>     ?https://issues.apache.org/jira/secure/attachment/12453067/high-level-design.pdf
>  4. I would like to know if the following proposals are already
>     implemented in federated HDFS.
>     (http://www.slideshare.net/hortonworks/hdfs-futures-namenode-federation-for-improved-efficiency-and-scalability slide-17)
>       * Separation of namespace and block management layers (same as qn.3)
>       * Partial namespace in memory for further scalability
>       * Move partial namespace from one namenode to another
>
> Thanks,
> Krishna

