Posted to user@hadoop.apache.org by Mohammad Tariq <do...@gmail.com> on 2012/12/12 16:02:17 UTC

Sane max storage size for DN

Hello list,

          I don't know if this question makes any sense, but I would like
to ask: does it make sense to store 500 TB (or more) of data on a single DN? If
yes, then what should the spec of the other parameters be, *viz*. NN & DN RAM,
N/W, etc.? If no, what could be the alternative?

Many thanks.

Regards,
    Mohammad Tariq

Re: Sane max storage size for DN

Posted by Mohammad Tariq <do...@gmail.com>.
Thank you so much, Hemanth.

Regards,
    Mohammad Tariq



On Thu, Dec 13, 2012 at 8:21 PM, Hemanth Yamijala <yhemanth@thoughtworks.com
> wrote:

> This is a dated blog post, so it would help if someone with current HDFS
> knowledge can validate it:
> http://developer.yahoo.com/blogs/hadoop/posts/2010/05/scalability_of_the_hadoop_dist/
> .
>
> There is a bit about the RAM required for the Namenode and how to compute
> it:
>
> You can look at the 'Namespace limitations' section.
>
> Thanks
> hemanth
>
>
> On Thu, Dec 13, 2012 at 10:57 AM, Mohammad Tariq <do...@gmail.com>wrote:
>
>> Hello Chris,
>>
>>      Thank you so much for the valuable insights. I was actually using
>> the same principle. I did the blunder and did the maths for entire (9*3)PB.
>>
>> Seems I am higher than you, that too without drinking ;)
>>
>> Many thanks.
>>
>>
>> Regards,
>>     Mohammad Tariq
>>
>>
>>
>> On Thu, Dec 13, 2012 at 10:38 AM, Chris Embree <ce...@gmail.com> wrote:
>>
>>> Hi Mohammed,
>>>
>>> The amount of RAM on the NN is related to the number of blocks... so
>>> let's do some math. :)  1G of RAM to 1M blocks seems to be the general rule.
>>>
>>> I'll probably mess this up so someone check my math:
>>>
>>> 9 PT ~ 9,216 TB ~ 9,437,184 GB of data.  Let's put that in 128MB blocks:
>>>  according to kcalc that's 75,497,472 of 128 MB Blocks.
>>> Unless I missed this by an order of magnitude (entirely possible... I've
>>> been drinking since 6), that sound like 76G of RAM (above OS requirements).
>>>  128G should kick it's ass; 256G seems like a waste of $$.
>>>
>>> Hmm... That makes the NN sound extremely efficient.  Someone validate me
>>> or kick me to the curb.
>>>
>>> YMMV ;)
>>>
>>>
>>> On Wed, Dec 12, 2012 at 10:52 PM, Mohammad Tariq <do...@gmail.com>wrote:
>>>
>>>> Hello Michael,
>>>>
>>>>       It's an array. The actual size of the data could be somewhere
>>>> around 9PB(exclusive of replication) and we want to keep the no of DNs as
>>>> less as possible. Computations are not too frequent, as I have specified
>>>> earlier. If I have 500TB in 1 DN, the no of DNs would be around 49. And, if
>>>> the block size is 128MB, the no of blocks would be 201326592. So, I was
>>>> thinking of having 256GB RAM for the NN. Does this make sense to you?
>>>>
>>>> Many thanks.
>>>>
>>>> Regards,
>>>>     Mohammad Tariq
>>>>
>>>>
>>>>
>>>> On Thu, Dec 13, 2012 at 12:28 AM, Michael Segel <
>>>> michael_segel@hotmail.com> wrote:
>>>>
>>>>> 500 TB?
>>>>>
>>>>> How many nodes in the cluster? Is this attached storage or is it in an
>>>>> array?
>>>>>
>>>>> I mean if you have 4 nodes for a total of 2PB, what happens when you
>>>>> lose 1 node?
>>>>>
>>>>>
>>>>> On Dec 12, 2012, at 9:02 AM, Mohammad Tariq <do...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hello list,
>>>>>
>>>>>           I don't know if this question makes any sense, but I would
>>>>> like to ask, does it make sense to store 500TB (or more) data in a single
>>>>> DN?If yes, then what should be the spec of other parameters *viz*. NN
>>>>> & DN RAM, N/W etc?If no, what could be the alternative?
>>>>>
>>>>> Many thanks.
>>>>>
>>>>> Regards,
>>>>>     Mohammad Tariq
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Sane max storage size for DN

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
This is a dated blog post, so it would help if someone with current HDFS
knowledge could validate it:
http://developer.yahoo.com/blogs/hadoop/posts/2010/05/scalability_of_the_hadoop_dist/

There is a bit in it about the RAM required for the NameNode and how to compute
it: see the 'Namespace limitations' section.

Thanks
hemanth


On Thu, Dec 13, 2012 at 10:57 AM, Mohammad Tariq <do...@gmail.com> wrote:

> Hello Chris,
>
>      Thank you so much for the valuable insights. I was actually using the
> same principle. I did the blunder and did the maths for entire (9*3)PB.
>
> Seems I am higher than you, that too without drinking ;)
>
> Many thanks.
>
>
> Regards,
>     Mohammad Tariq
>
>
>
> On Thu, Dec 13, 2012 at 10:38 AM, Chris Embree <ce...@gmail.com> wrote:
>
>> Hi Mohammed,
>>
>> The amount of RAM on the NN is related to the number of blocks... so
>> let's do some math. :)  1G of RAM to 1M blocks seems to be the general rule.
>>
>> I'll probably mess this up so someone check my math:
>>
>> 9 PT ~ 9,216 TB ~ 9,437,184 GB of data.  Let's put that in 128MB blocks:
>>  according to kcalc that's 75,497,472 of 128 MB Blocks.
>> Unless I missed this by an order of magnitude (entirely possible... I've
>> been drinking since 6), that sound like 76G of RAM (above OS requirements).
>>  128G should kick it's ass; 256G seems like a waste of $$.
>>
>> Hmm... That makes the NN sound extremely efficient.  Someone validate me
>> or kick me to the curb.
>>
>> YMMV ;)
>>
>>
>> On Wed, Dec 12, 2012 at 10:52 PM, Mohammad Tariq <do...@gmail.com>wrote:
>>
>>> Hello Michael,
>>>
>>>       It's an array. The actual size of the data could be somewhere
>>> around 9PB(exclusive of replication) and we want to keep the no of DNs as
>>> less as possible. Computations are not too frequent, as I have specified
>>> earlier. If I have 500TB in 1 DN, the no of DNs would be around 49. And, if
>>> the block size is 128MB, the no of blocks would be 201326592. So, I was
>>> thinking of having 256GB RAM for the NN. Does this make sense to you?
>>>
>>> Many thanks.
>>>
>>> Regards,
>>>     Mohammad Tariq
>>>
>>>
>>>
>>> On Thu, Dec 13, 2012 at 12:28 AM, Michael Segel <
>>> michael_segel@hotmail.com> wrote:
>>>
>>>> 500 TB?
>>>>
>>>> How many nodes in the cluster? Is this attached storage or is it in an
>>>> array?
>>>>
>>>> I mean if you have 4 nodes for a total of 2PB, what happens when you
>>>> lose 1 node?
>>>>
>>>>
>>>> On Dec 12, 2012, at 9:02 AM, Mohammad Tariq <do...@gmail.com> wrote:
>>>>
>>>> Hello list,
>>>>
>>>>           I don't know if this question makes any sense, but I would
>>>> like to ask, does it make sense to store 500TB (or more) data in a single
>>>> DN?If yes, then what should be the spec of other parameters *viz*. NN
>>>> & DN RAM, N/W etc?If no, what could be the alternative?
>>>>
>>>> Many thanks.
>>>>
>>>> Regards,
>>>>     Mohammad Tariq
>>>>
>>>>
>>>>
>>>>
>>>
>>
>

Re: Sane max storage size for DN

Posted by Mohammad Tariq <do...@gmail.com>.
Hello Chris,

     Thank you so much for the valuable insights. I was actually using the
same principle, but I made a blunder and did the maths for the entire (9*3) PB.

Seems my estimate came out higher than yours, and that too without drinking ;)

Many thanks.


Regards,
    Mohammad Tariq



On Thu, Dec 13, 2012 at 10:38 AM, Chris Embree <ce...@gmail.com> wrote:

> Hi Mohammed,
>
> The amount of RAM on the NN is related to the number of blocks... so let's
> do some math. :)  1G of RAM to 1M blocks seems to be the general rule.
>
> I'll probably mess this up so someone check my math:
>
> 9 PT ~ 9,216 TB ~ 9,437,184 GB of data.  Let's put that in 128MB blocks:
>  according to kcalc that's 75,497,472 of 128 MB Blocks.
> Unless I missed this by an order of magnitude (entirely possible... I've
> been drinking since 6), that sound like 76G of RAM (above OS requirements).
>  128G should kick it's ass; 256G seems like a waste of $$.
>
> Hmm... That makes the NN sound extremely efficient.  Someone validate me
> or kick me to the curb.
>
> YMMV ;)
>
>
> On Wed, Dec 12, 2012 at 10:52 PM, Mohammad Tariq <do...@gmail.com>wrote:
>
>> Hello Michael,
>>
>>       It's an array. The actual size of the data could be somewhere
>> around 9PB(exclusive of replication) and we want to keep the no of DNs as
>> less as possible. Computations are not too frequent, as I have specified
>> earlier. If I have 500TB in 1 DN, the no of DNs would be around 49. And, if
>> the block size is 128MB, the no of blocks would be 201326592. So, I was
>> thinking of having 256GB RAM for the NN. Does this make sense to you?
>>
>> Many thanks.
>>
>> Regards,
>>     Mohammad Tariq
>>
>>
>>
>> On Thu, Dec 13, 2012 at 12:28 AM, Michael Segel <
>> michael_segel@hotmail.com> wrote:
>>
>>> 500 TB?
>>>
>>> How many nodes in the cluster? Is this attached storage or is it in an
>>> array?
>>>
>>> I mean if you have 4 nodes for a total of 2PB, what happens when you
>>> lose 1 node?
>>>
>>>
>>> On Dec 12, 2012, at 9:02 AM, Mohammad Tariq <do...@gmail.com> wrote:
>>>
>>> Hello list,
>>>
>>>           I don't know if this question makes any sense, but I would
>>> like to ask, does it make sense to store 500TB (or more) data in a single
>>> DN?If yes, then what should be the spec of other parameters *viz*. NN &
>>> DN RAM, N/W etc?If no, what could be the alternative?
>>>
>>> Many thanks.
>>>
>>> Regards,
>>>     Mohammad Tariq
>>>
>>>
>>>
>>>
>>
>

Re: Sane max storage size for DN

Posted by Chris Embree <ce...@gmail.com>.
Hi Mohammed,

The amount of RAM on the NN is related to the number of blocks... so let's
do some math. :)  1 GB of RAM per 1M blocks seems to be the general rule.

I'll probably mess this up, so someone please check my math:

9 PB ~ 9,216 TB ~ 9,437,184 GB of data.  Let's put that in 128 MB blocks:
 according to kcalc that's 75,497,472 blocks of 128 MB each.
Unless I missed this by an order of magnitude (entirely possible... I've
been drinking since 6), that sounds like 76 GB of RAM (above OS requirements).
 128 GB should kick its ass; 256 GB seems like a waste of $$.

Hmm... That makes the NN sound extremely efficient.  Someone validate me or
kick me to the curb.

YMMV ;)

On Wed, Dec 12, 2012 at 10:52 PM, Mohammad Tariq <do...@gmail.com> wrote:

> Hello Michael,
>
>       It's an array. The actual size of the data could be somewhere around
> 9PB(exclusive of replication) and we want to keep the no of DNs as less as
> possible. Computations are not too frequent, as I have specified earlier.
> If I have 500TB in 1 DN, the no of DNs would be around 49. And, if the
> block size is 128MB, the no of blocks would be 201326592. So, I was
> thinking of having 256GB RAM for the NN. Does this make sense to you?
>
> Many thanks.
>
> Regards,
>     Mohammad Tariq
>
>
>
> On Thu, Dec 13, 2012 at 12:28 AM, Michael Segel <michael_segel@hotmail.com
> > wrote:
>
>> 500 TB?
>>
>> How many nodes in the cluster? Is this attached storage or is it in an
>> array?
>>
>> I mean if you have 4 nodes for a total of 2PB, what happens when you lose
>> 1 node?
>>
>>
>> On Dec 12, 2012, at 9:02 AM, Mohammad Tariq <do...@gmail.com> wrote:
>>
>> Hello list,
>>
>>           I don't know if this question makes any sense, but I would like
>> to ask, does it make sense to store 500TB (or more) data in a single DN?If
>> yes, then what should be the spec of other parameters *viz*. NN & DN
>> RAM, N/W etc?If no, what could be the alternative?
>>
>> Many thanks.
>>
>> Regards,
>>     Mohammad Tariq
>>
>>
>>
>>
>
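
A rough back-of-the-envelope sketch of the estimate above, assuming the rule of
thumb quoted in the thread (about 1 GB of NameNode heap per million blocks) and
a 128 MB block size. The numbers are illustrative only; actual NameNode memory
use also depends on the number of files and directories, as the 'Namespace
limitations' post linked earlier explains.

    # Illustrative sketch only: NameNode heap from the ~1 GB per 1M blocks rule of thumb.
    def nn_heap_estimate(data_tb, block_mb=128, gb_per_million_blocks=1.0):
        data_mb = data_tb * 1024 * 1024              # TB -> MB, binary units as in the thread
        blocks = data_mb / block_mb                  # full-size blocks needed for that much data
        return blocks, blocks / 1e6 * gb_per_million_blocks

    blocks, heap_gb = nn_heap_estimate(9 * 1024)     # 9 PB of logical (pre-replication) data
    print(f"{blocks:,.0f} blocks -> ~{heap_gb:.1f} GB of NameNode heap")
    # 75,497,472 blocks -> ~75.5 GB, in line with the ~76 GB figure above.

    # Running the same sum on the replicated total (9 PB x 3) roughly triples the
    # block count, which is the "(9*3) PB" slip acknowledged earlier in the thread.
    blocks_r, heap_r = nn_heap_estimate(9 * 1024 * 3)
    print(f"{blocks_r:,.0f} blocks -> ~{heap_r:.1f} GB")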

Re: Sane max storage size for DN

Posted by Mohammad Tariq <do...@gmail.com>.
Hello Michael,

      It's an array. The actual size of the data could be somewhere around
9 PB (exclusive of replication), and we want to keep the number of DNs as low as
possible. Computations are not too frequent, as I have specified earlier.
If I have 500 TB in 1 DN, the number of DNs would be around 49. And, if the
block size is 128 MB, the number of blocks would be 201,326,592. So, I was
thinking of having 256 GB RAM for the NN. Does this make sense to you?

Many thanks.

Regards,
    Mohammad Tariq



On Thu, Dec 13, 2012 at 12:28 AM, Michael Segel
<mi...@hotmail.com>wrote:

> 500 TB?
>
> How many nodes in the cluster? Is this attached storage or is it in an
> array?
>
> I mean if you have 4 nodes for a total of 2PB, what happens when you lose
> 1 node?
>
>
> On Dec 12, 2012, at 9:02 AM, Mohammad Tariq <do...@gmail.com> wrote:
>
> Hello list,
>
>           I don't know if this question makes any sense, but I would like
> to ask, does it make sense to store 500TB (or more) data in a single DN?If
> yes, then what should be the spec of other parameters *viz*. NN & DN RAM,
> N/W etc?If no, what could be the alternative?
>
> Many thanks.
>
> Regards,
>     Mohammad Tariq
>
>
>
>
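
And a similarly naive sketch for the node count discussed above, assuming 3x
replication and 500 TB of usable space per DN, and ignoring the headroom HDFS
needs for non-DFS use, temporary data and growth; the exact figure depends on
how much raw capacity per node is actually usable, so treat it as a floor.

    import math

    logical_tb = 9 * 1024      # 9 PB of data before replication
    replication = 3
    per_dn_tb = 500            # assumed usable capacity per DataNode

    print(math.ceil(logical_tb * replication / per_dn_tb))   # 56 with these assumptions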

Re: Sane max storage size for DN

Posted by Michael Segel <mi...@hotmail.com>.
500 TB? 

How many nodes in the cluster? Is this attached storage or is it in an array? 

I mean if you have 4 nodes for a total of 2PB, what happens when you lose 1 node? 


On Dec 12, 2012, at 9:02 AM, Mohammad Tariq <do...@gmail.com> wrote:

> Hello list,
> 
>           I don't know if this question makes any sense, but I would like to ask, does it make sense to store 500TB (or more) data in a single DN?If yes, then what should be the spec of other parameters viz. NN & DN RAM, N/W etc?If no, what could be the alternative?
> 
> Many thanks.
> 
> Regards,
>     Mohammad Tariq
> 
> 


Re: Sane max storage size for DN

Posted by Mohammad Tariq <do...@gmail.com>.
Thank you so much for the valuable response, Ted.

No, there would be dedicated storage for NN as well.

Any tips on RAM & N/W?

*Computations are not really frequent.

Thanks again.

Regards,
    Mohammad Tariq



On Wed, Dec 12, 2012 at 9:14 PM, Ted Dunning <td...@maprtech.com> wrote:

>
> Yes it does make sense, depending on how much compute each byte of data
> will require on average.  With ordinary Hadoop, it is reasonable to have
> half a dozen 2TB drives.  With specialized versions of Hadoop considerably
> more can be supported.
>
> From what you say, it sounds like you are suggesting that your name node
> get a part of a single drive with the rest being shared with other virtual
> instances or with an OS partition.  That would be a really bad idea for
> performance.  Many Hadoop programs are I/O bound so having more than one
> spindle is a good thing.
>
>
>
> On Wed, Dec 12, 2012 at 7:02 AM, Mohammad Tariq <do...@gmail.com>wrote:
>
>> Hello list,
>>
>>           I don't know if this question makes any sense, but I would like
>> to ask, does it make sense to store 500TB (or more) data in a single DN?If
>> yes, then what should be the spec of other parameters *viz*. NN & DN
>> RAM, N/W etc?If no, what could be the alternative?
>>
>> Many thanks.
>>
>> Regards,
>>     Mohammad Tariq
>>
>>
>>
>

Re: Sane max storage size for DN

Posted by Ted Dunning <td...@maprtech.com>.
Yes it does make sense, depending on how much compute each byte of data
will require on average.  With ordinary Hadoop, it is reasonable to have
half a dozen 2TB drives.  With specialized versions of Hadoop considerably
more can be supported.

From what you say, it sounds like you are suggesting that your name node
get a part of a single drive with the rest being shared with other virtual
instances or with an OS partition.  That would be a really bad idea for
performance.  Many Hadoop programs are I/O bound so having more than one
spindle is a good thing.



On Wed, Dec 12, 2012 at 7:02 AM, Mohammad Tariq <do...@gmail.com> wrote:

> Hello list,
>
>           I don't know if this question makes any sense, but I would like
> to ask, does it make sense to store 500TB (or more) data in a single DN?If
> yes, then what should be the spec of other parameters *viz*. NN & DN RAM,
> N/W etc?If no, what could be the alternative?
>
> Many thanks.
>
> Regards,
>     Mohammad Tariq
>
>
>
