Posted to hdfs-user@hadoop.apache.org by Mark Kerzner <ma...@shmsoft.com> on 2012/10/12 05:59:47 UTC

Using a hard drive instead of

Hi,

Imagine I have a very fast hard drive that I want to use for the NameNode.
That is, I want the NameNode to store its blocks information on this hard
drive instead of in memory.

Why would I do it? Scalability (no federation needed), many files are not a
problem, and warm fail-over is automatic. What would I need to change in
the NameNode to tell it to use the hard drive?

Thank you,
Mark

Re: Using a hard drive instead of

Posted by Michael Segel <mi...@hotmail.com>.
Meh. 

If you are worried about the memory constraints of a Linux system, I'd say go with MapR and their CLDB. 

I just did a quick look at Supermicro servers and found that on a 2U server, 768GB was the max.
So how many blocks can you store in that much memory? I only have 10 fingers and toes so I can't count that high. ;-)

Assuming you use 64MB blocks, what's the max size? 
Switching to 128MB blocks, what's the max size then?

From Tom White's blog: "Every file, directory and block in HDFS is represented as an object in the namenode’s memory, each of which occupies 150 bytes, as a rule of thumb. So 10 million files, each using a block, would use about 3 gigabytes of memory. Scaling up much beyond this level is a problem with current hardware. Certainly a billion files is not feasible."

So 10 million blocks per 3GB. With 600GB of memory that's 200x as much: 1x10^7 * 200 = 2*10^9 blocks in 600GB of memory. 

That's 2 billion blocks. 
At 64MB each, that's 128 billion MB which, off the top of my head, is 128 PB? 
(Ok, I'll admit that makes my head spin so someone may want to check my math....)
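
For anyone who wants to check it, here's a quick back-of-the-envelope sketch in Java; all the numbers are the rule-of-thumb assumptions from this thread (the 150-byte objects rolled into the 3GB-per-10M-blocks figure), nothing measured:

    // Back-of-the-envelope NameNode capacity check. Figures are rough
    // rule-of-thumb assumptions from the thread, not measurements.
    public class NameNodeMath {
        public static void main(String[] args) {
            double heapGB = 600;                 // metadata heap on the control node
            double blocksPer3GB = 1e7;           // ~10M blocks per 3GB (Tom White's figure)
            double blocks = (heapGB / 3) * blocksPer3GB;  // = 2e9 blocks
            double blockSizeMB = 64;             // MR 1.x-era default block size
            double totalPB = blocks * blockSizeMB / 1e9;  // 1 PB ~= 1e9 MB (decimal)
            System.out.printf("~%.1e blocks, ~%.0f PB addressable%n", blocks, totalPB);
        }
    }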

The point is that it's possible to build out your control nodes with more than enough memory to handle the largest cluster you can build with Map/Reduce 1.x (pre-YARN).

I am skeptical of Federation. 

Just Saying... 

-Mike




On Oct 17, 2012, at 5:37 PM, Colin Patrick McCabe <ra...@gmail.com> wrote:

> The direct answer to your question is to use this theoretical super-fast hard drive as Linux swap space.
> 
> The better answer is to use federation or another solution if your needs exceed those servable by a single NameNode.
> 
> Cheers.
> Colin
> 
> On Oct 11, 2012 9:00 PM, "Mark Kerzner" <ma...@shmsoft.com> wrote:
> Hi,
> 
> Imagine I have a very fast hard drive that I want to use for the NameNode. That is, I want the NameNode to store its blocks information on this hard drive instead of in memory. 
> 
> Why would I do it? Scalability (no federation needed), many files are not a problem, and warm fail-over is automatic. What would I need to change in the NameNode to tell it to use the hard drive?
> 
> Thank you,
> Mark


Re: Using a hard drive instead of

Posted by Colin McCabe <cm...@alumni.cmu.edu>.
Hi Mark,

HDFS contains a write-ahead log which will protect you from power
failure.  It's called the edit log.  If you want warm failover, you
can use HDFS HA, which is available in recent versions of HDFS.

Hope this helps.
Colin
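
For reference, here's a minimal sketch of what the HA wiring can look like in hdfs-site.xml (quorum-journal style); the nameservice and host names are made up for illustration:

    <property>
      <name>dfs.nameservices</name>
      <value>mycluster</value>
    </property>
    <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>nn1,nn2</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn1</name>
      <value>nn1.example.com:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn2</name>
      <value>nn2.example.com:8020</value>
    </property>
    <property>
      <!-- the shared write-ahead (edit) log lives on a JournalNode quorum -->
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
    </property>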


On Wed, Oct 17, 2012 at 3:44 PM, Mark Kerzner <ma...@shmsoft.com> wrote:
> Colin,
>
> swap space would give me very high memory, up to 1 TB, say, but it won't
> protect me from power failure. Very large clusters is only one application
> of this idea, warm failover is another. The drive is not theoretical, just
> look at the price tag :)
>
> Mark
>
>
> On Wed, Oct 17, 2012 at 5:37 PM, Colin Patrick McCabe <ra...@gmail.com>
> wrote:
>>
>> The direct answer to your question us to use this theoretical super-fast
>> hard drive as Linux swap space.
>>
>> The better answer is to use federation or another solution if your needs
>> exceed those servable by a single NameNode.
>>
>> Cheers.
>> Colin
>>
>> On Oct 11, 2012 9:00 PM, "Mark Kerzner" <ma...@shmsoft.com> wrote:
>>>
>>> Hi,
>>>
>>> Imagine I have a very fast hard drive that I want to use for the
>>> NameNode. That is, I want the NameNode to store its blocks information on
>>> this hard drive instead of in memory.
>>>
>>> Why would I do it? Scalability (no federation needed), many files are not
>>> a problem, and warm fail-over is automatic. What would I need to change in
>>> the NameNode to tell it to use the hard drive?
>>>
>>> Thank you,
>>> Mark
>
>

Re: Using a hard drive instead of

Posted by Mark Kerzner <ma...@shmsoft.com>.
Colin,

swap space would give me very high memory, up to 1 TB, say, but it won't
protect me from power failure. Very large clusters are only one application
of this idea; warm failover is another. The drive is not theoretical, just
look at the price tag :)

Mark

On Wed, Oct 17, 2012 at 5:37 PM, Colin Patrick McCabe <ra...@gmail.com> wrote:

> The direct answer to your question is to use this theoretical super-fast
> hard drive as Linux swap space.
>
> The better answer is to use federation or another solution if your needs
> exceed those servable by a single NameNode.
>
> Cheers.
> Colin
> On Oct 11, 2012 9:00 PM, "Mark Kerzner" <ma...@shmsoft.com> wrote:
>
>> Hi,
>>
>> Imagine I have a very fast hard drive that I want to use for the
>> NameNode. That is, I want the NameNode to store its blocks information on
>> this hard drive instead of in memory.
>>
>> Why would I do it? Scalability (no federation needed), many files are not
>> a problem, and warm fail-over is automatic. What would I need to change in
>> the NameNode to tell it to use the hard drive?
>>
>> Thank you,
>> Mark
>>
>

Re: Using a hard drive instead of

Posted by Colin Patrick McCabe <ra...@gmail.com>.
The direct answer to your question is to use this theoretical super-fast
hard drive as Linux swap space.

The better answer is to use federation or another solution if your needs
exceed those servable by a single NameNode.

Cheers.
Colin
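
For completeness, turning a device into swap is the usual routine; the device path below is hypothetical:

    # assuming the fast drive shows up as /dev/fastdisk (hypothetical path)
    mkswap /dev/fastdisk
    swapon /dev/fastdisk
    swapon -s    # confirm the swap area is active
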
On Oct 11, 2012 9:00 PM, "Mark Kerzner" <ma...@shmsoft.com> wrote:

> Hi,
>
> Imagine I have a very fast hard drive that I want to use for the NameNode.
> That is, I want the NameNode to store its blocks information on this hard
> drive instead of in memory.
>
> Why would I do it? Scalability (no federation needed), many files are not
> a problem, and warm fail-over is automatic. What would I need to change in
> the NameNode to tell it to use the hard drive?
>
> Thank you,
> Mark
>

Re: Using a hard drive instead of

Posted by Mark Kerzner <ma...@shmsoft.com>.
Thank you, everybody, for your input - it was very useful. I need to do my
homework now, and I will be back with the update. The device really exists.
It is not cheap, but it may make sense as the NN of a serious cluster.

Sincerely,
Mark

On Fri, Oct 12, 2012 at 10:46 PM, Ravi Prakash <ra...@ymail.com> wrote:

> Maybe at a slight tangent, but for each write operation on HDFS (e.g.
> create a file, delete a file, create a directory), the NN waits until the
> edit has been *flushed* to disk. So I can imagine such a hypothetical(?)
> disk would tremendously speed up the NN even as it is. Mark, can you please
> please please send me 5 of these disks? :-P
> To answer your question, you probably want to change BlockManager and
> FSNamesystem, both basically being the crux of HDFS NN. Its going to be a
> pretty significant undertaking.
> @memory-mapped files would lose data in case of failure (unless ofcourse
> you use special hardware, thinking of which, really its not soooo special,
> so maybe worth trying). Has anyone tried this before?
>
>   ------------------------------
> *From:* Lance Norskog <go...@gmail.com>
> *To:* user@hadoop.apache.org
> *Sent:* Friday, October 12, 2012 12:01 AM
>
> *Subject:* Re: Using a hard drive instead of
>
> This is why memory-mapped files were invented.
>
> On Thu, Oct 11, 2012 at 9:34 PM, Gaurav Sharma
> <ga...@gmail.com> wrote:
> > If you don't mind sharing, what hard drive do you have with these
> > properties:
> > -"performance of RAM"
> > -"can accommodate very many threads"
> >
> >
> > On Oct 11, 2012, at 21:27, Mark Kerzner <ma...@shmsoft.com>
> wrote:
> >
> > Harsh,
> >
> > I agree with you about many small files, and I was giving this only in way
> > of example. However, the hard drive I am talking about can be 1-2 TB in
> > size, and that's pretty good, you can't easily get that much memory. In
> > addition, it would be more resistant to power failures than RAM. And yes, it
> > has the performance of RAM, and can accommodate very many threads.
> >
> > Mark
> >
> > On Thu, Oct 11, 2012 at 11:16 PM, Harsh J <ha...@cloudera.com> wrote:
> >>
> >> Hi Mark,
> >>
> >> Note that the NameNode does random memory access to serve back any
> >> information or mutate request you send to it, and that there can be
> >> several number of concurrent clients. So do you mean a 'very fast hard
> >> drive' thats faster than the RAM for random access itself? The
> >> NameNode does persist its block information onto disk for various
> >> purposes, but to actually make the NameNode use disk storage
> >> completely (and not specific parts of it disk-cached instead) wouldn't
> >> make too much sense to me. That'd feel like trying to communicate with
> >> a process thats swapping, performance-wise.
> >>
> >> The too many files issue is bloated up to sound like its a NameNode
> >> issue but it isn't in reality. HDFS allows you to process lots of
> >> files really fast, aside of helping store them for long periods, and a
> >> lot of tiny files only gets you down in such operations with overheads
> >> of opening and closing files in the way of reading them all at a time.
> >> With a single or a few large files, all you do is block (data) reads,
> >> and very few NameNode communications - ending up going much faster.
> >> This is the same for local filesystems as well, but not many think of
> >> that.
> >>
> >> On Fri, Oct 12, 2012 at 9:29 AM, Mark Kerzner <mark.kerzner@shmsoft.com> wrote:
> >> > Hi,
> >> >
> >> > Imagine I have a very fast hard drive that I want to use for the
> >> > NameNode.
> >> > That is, I want the NameNode to store its blocks information on this
> >> > hard
> >> > drive instead of in memory.
> >> >
> >> > Why would I do it? Scalability (no federation needed), many files are
> >> > not a
> >> > problem, and warm fail-over is automatic. What would I need to change in
> >> > the
> >> > NameNode to tell it to use the hard drive?
> >> >
> >> > Thank you,
> >> > Mark
> >>
> >>
> >>
> >> --
> >> Harsh J
> >
> >
>
>
>
> --
> Lance Norskog
> goksron@gmail.com
>
>
>

Re: Using a hard drive instead of

Posted by Ravi Prakash <ra...@ymail.com>.
Maybe at a slight tangent, but for each write operation on HDFS (e.g. create a file, delete a file, create a directory), the NN waits until the edit has been *flushed* to disk. So I can imagine such a hypothetical(?) disk would tremendously speed up the NN even as it is. Mark, can you please please please send me 5 of these disks? :-P

To answer your question, you probably want to change BlockManager and FSNamesystem, both basically being the crux of the HDFS NN. It's going to be a pretty significant undertaking.
@memory-mapped files: they would lose data in case of failure (unless of course you use special hardware, which, thinking about it, is really not so special, so maybe worth trying). Has anyone tried this before?
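
To illustrate the flush-before-ack rule above, here's a toy sketch in Java; it is not the NameNode's actual FSEditLog code, just the durability pattern that makes disk latency the bottleneck:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import static java.nio.file.StandardOpenOption.*;

    // Toy write-ahead log: an edit is only acknowledged once its bytes
    // have been forced to the device, so each write operation pays at
    // least one disk flush.
    public class ToyEditLog {
        private final FileChannel log;

        public ToyEditLog(String path) throws IOException {
            log = FileChannel.open(Paths.get(path), CREATE, APPEND);
        }

        public void logEdit(byte[] edit) throws IOException {
            log.write(ByteBuffer.wrap(edit)); // append the record...
            log.force(false);                 // ...and flush before acking
        }
    }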



________________________________
 From: Lance Norskog <go...@gmail.com>
To: user@hadoop.apache.org 
Sent: Friday, October 12, 2012 12:01 AM
Subject: Re: Using a hard drive instead of
 
This is why memory-mapped files were invented.

On Thu, Oct 11, 2012 at 9:34 PM, Gaurav Sharma
<ga...@gmail.com> wrote:
> If you don't mind sharing, what hard drive do you have with these
> properties:
> -"performance of RAM"
> -"can accommodate very many threads"
>
>
> On Oct 11, 2012, at 21:27, Mark Kerzner <ma...@shmsoft.com> wrote:
>
> Harsh,
>
> I agree with you about many small files, and I was giving this only in way
> of example. However, the hard drive I am talking about can be 1-2 TB in
> size, and that's pretty good, you can't easily get that much memory. In
> addition, it would be more resistant to power failures than RAM. And yes, it
> has the performance of RAM, and can accommodate very many threads.
>
> Mark
>
> On Thu, Oct 11, 2012 at 11:16 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>> Hi Mark,
>>
>> Note that the NameNode does random memory access to serve back any
>> information or mutate request you send to it, and that there can be
>> several number of concurrent clients. So do you mean a 'very fast hard
>> drive' thats faster than the RAM for random access itself? The
>> NameNode does persist its block information onto disk for various
>> purposes, but to actually make the NameNode use disk storage
>> completely (and not specific parts of it disk-cached instead) wouldn't
>> make too much sense to me. That'd feel like trying to communicate with
>> a process thats swapping, performance-wise.
>>
>> The too many files issue is bloated up to sound like its a NameNode
>> issue but it isn't in reality. HDFS allows you to process lots of
>> files really fast, aside of helping store them for long periods, and a
>> lot of tiny files only gets you down in such operations with overheads
>> of opening and closing files in the way of reading them all at a time.
>> With a single or a few large files, all you do is block (data) reads,
>> and very few NameNode communications - ending up going much faster.
>> This is the same for local filesystems as well, but not many think of
>> that.
>>
>> On Fri, Oct 12, 2012 at 9:29 AM, Mark Kerzner <ma...@shmsoft.com>
>> wrote:
>> > Hi,
>> >
>> > Imagine I have a very fast hard drive that I want to use for the
>> > NameNode.
>> > That is, I want the NameNode to store its blocks information on this
>> > hard
>> > drive instead of in memory.
>> >
>> > Why would I do it? Scalability (no federation needed), many files are
>> > not a
>> > problem, and warm fail-over is automatic. What would I need to change in
>> > the
>> > NameNode to tell it to use the hard drive?
>> >
>> > Thank you,
>> > Mark
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Lance Norskog
goksron@gmail.com

Re: Using a hard drive instead of

Posted by Lance Norskog <go...@gmail.com>.
This is why memory-mapped files were invented.
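
A minimal sketch of that idea in Java; the metadata file name is made up, and the point is just that random access goes through the page cache at RAM speed while force() handles persistence:

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    // Map a (hypothetical) metadata file into memory: random access runs
    // at RAM speed out of the page cache, and force() pushes dirty pages
    // to the device for durability.
    public class MappedMetadata {
        public static void main(String[] args) throws IOException {
            try (RandomAccessFile raf = new RandomAccessFile("namespace.bin", "rw");
                 FileChannel ch = raf.getChannel()) {
                MappedByteBuffer buf =
                    ch.map(FileChannel.MapMode.READ_WRITE, 0, 1L << 30); // 1GB window
                buf.putLong(0, 42L); // RAM-speed random-access write
                buf.force();         // flush dirty pages to disk
            }
        }
    }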

On Thu, Oct 11, 2012 at 9:34 PM, Gaurav Sharma
<ga...@gmail.com> wrote:
> If you don't mind sharing, what hard drive do you have with these
> properties:
> -"performance of RAM"
> -"can accommodate very many threads"
>
>
> On Oct 11, 2012, at 21:27, Mark Kerzner <ma...@shmsoft.com> wrote:
>
> Harsh,
>
> I agree with you about many small files, and I was giving this only in way
> of example. However, the hard drive I am talking about can be 1-2 TB in
> size, and that's pretty good, you can't easily get that much memory. In
> addition, it would be more resistant to power failures than RAM. And yes, it
> has the performance of RAM, and can accommodate very many threads.
>
> Mark
>
> On Thu, Oct 11, 2012 at 11:16 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>> Hi Mark,
>>
>> Note that the NameNode does random memory access to serve back any
>> information or mutate request you send to it, and that there can be
>> several number of concurrent clients. So do you mean a 'very fast hard
>> drive' thats faster than the RAM for random access itself? The
>> NameNode does persist its block information onto disk for various
>> purposes, but to actually make the NameNode use disk storage
>> completely (and not specific parts of it disk-cached instead) wouldn't
>> make too much sense to me. That'd feel like trying to communicate with
>> a process thats swapping, performance-wise.
>>
>> The too many files issue is bloated up to sound like its a NameNode
>> issue but it isn't in reality. HDFS allows you to process lots of
>> files really fast, aside of helping store them for long periods, and a
>> lot of tiny files only gets you down in such operations with overheads
>> of opening and closing files in the way of reading them all at a time.
>> With a single or a few large files, all you do is block (data) reads,
>> and very few NameNode communications - ending up going much faster.
>> This is the same for local filesystems as well, but not many think of
>> that.
>>
>> On Fri, Oct 12, 2012 at 9:29 AM, Mark Kerzner <ma...@shmsoft.com>
>> wrote:
>> > Hi,
>> >
>> > Imagine I have a very fast hard drive that I want to use for the
>> > NameNode.
>> > That is, I want the NameNode to store its blocks information on this
>> > hard
>> > drive instead of in memory.
>> >
>> > Why would I do it? Scalability (no federation needed), many files are
>> > not a
>> > problem, and warm fail-over is automatic. What would I need to change in
>> > the
>> > NameNode to tell it to use the hard drive?
>> >
>> > Thank you,
>> > Mark
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Lance Norskog
goksron@gmail.com

>> NameNode does persist its block information onto disk for various
>> purposes, but to actually make the NameNode use disk storage
>> completely (and not specific parts of it disk-cached instead) wouldn't
>> make too much sense to me. That'd feel like trying to communicate with
>> a process thats swapping, performance-wise.
>>
>> The too many files issue is bloated up to sound like its a NameNode
>> issue but it isn't in reality. HDFS allows you to process lots of
>> files really fast, aside of helping store them for long periods, and a
>> lot of tiny files only gets you down in such operations with overheads
>> of opening and closing files in the way of reading them all at a time.
>> With a single or a few large files, all you do is block (data) reads,
>> and very few NameNode communications - ending up going much faster.
>> This is the same for local filesystems as well, but not many think of
>> that.
>>
>> On Fri, Oct 12, 2012 at 9:29 AM, Mark Kerzner <ma...@shmsoft.com>
>> wrote:
>> > Hi,
>> >
>> > Imagine I have a very fast hard drive that I want to use for the
>> > NameNode.
>> > That is, I want the NameNode to store its blocks information on this
>> > hard
>> > drive instead of in memory.
>> >
>> > Why would I do it? Scalability (no federation needed), many files are
>> > not a
>> > problem, and warm fail-over is automatic. What would I need to change in
>> > the
>> > NameNode to tell it to use the hard drive?
>> >
>> > Thank you,
>> > Mark
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Lance Norskog
goksron@gmail.com

Re: Using a hard drive instead of

Posted by Lance Norskog <go...@gmail.com>.
This is why memory-mapped files were invented.

On Thu, Oct 11, 2012 at 9:34 PM, Gaurav Sharma
<ga...@gmail.com> wrote:
> If you don't mind sharing, what hard drive do you have with these
> properties:
> -"performance of RAM"
> -"can accommodate very many threads"
>
>
> On Oct 11, 2012, at 21:27, Mark Kerzner <ma...@shmsoft.com> wrote:
>
> Harsh,
>
> I agree with you about many small files, and I was giving this only in way
> of example. However, the hard drive I am talking about can be 1-2 TB in
> size, and that's pretty good, you can't easily get that much memory. In
> addition, it would be more resistant to power failures than RAM. And yes, it
> has the performance of RAM, and can accommodate very many threads.
>
> Mark
>
> On Thu, Oct 11, 2012 at 11:16 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>> Hi Mark,
>>
>> Note that the NameNode does random memory access to serve back any
>> information or mutate request you send to it, and that there can be
>> several number of concurrent clients. So do you mean a 'very fast hard
>> drive' thats faster than the RAM for random access itself? The
>> NameNode does persist its block information onto disk for various
>> purposes, but to actually make the NameNode use disk storage
>> completely (and not specific parts of it disk-cached instead) wouldn't
>> make too much sense to me. That'd feel like trying to communicate with
>> a process thats swapping, performance-wise.
>>
>> The too many files issue is bloated up to sound like its a NameNode
>> issue but it isn't in reality. HDFS allows you to process lots of
>> files really fast, aside of helping store them for long periods, and a
>> lot of tiny files only gets you down in such operations with overheads
>> of opening and closing files in the way of reading them all at a time.
>> With a single or a few large files, all you do is block (data) reads,
>> and very few NameNode communications - ending up going much faster.
>> This is the same for local filesystems as well, but not many think of
>> that.
>>
>> On Fri, Oct 12, 2012 at 9:29 AM, Mark Kerzner <ma...@shmsoft.com>
>> wrote:
>> > Hi,
>> >
>> > Imagine I have a very fast hard drive that I want to use for the
>> > NameNode.
>> > That is, I want the NameNode to store its blocks information on this
>> > hard
>> > drive instead of in memory.
>> >
>> > Why would I do it? Scalability (no federation needed), many files are
>> > not a
>> > problem, and warm fail-over is automatic. What would I need to change in
>> > the
>> > NameNode to tell it to use the hard drive?
>> >
>> > Thank you,
>> > Mark
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Lance Norskog
goksron@gmail.com

Re: Using a hard drive instead of

Posted by Gaurav Sharma <ga...@gmail.com>.
If you don't mind sharing, what hard drive do you have with these properties:
-"performance of RAM"
-"can accommodate very many threads"


On Oct 11, 2012, at 21:27, Mark Kerzner <ma...@shmsoft.com> wrote:

> Harsh,
> 
> I agree with you about many small files, and I was giving this only by way of example. However, the hard drive I am talking about can be 1-2 TB in size, and that's pretty good; you can't easily get that much memory. In addition, it would be more resistant to power failures than RAM. And yes, it has the performance of RAM, and can accommodate very many threads.
> 
> Mark
> 
> On Thu, Oct 11, 2012 at 11:16 PM, Harsh J <ha...@cloudera.com> wrote:
>> Hi Mark,
>> 
>> Note that the NameNode does random memory access to serve back any
>> information or mutate request you send to it, and that there can be
>> many concurrent clients. So do you mean a 'very fast hard drive'
>> that's faster than RAM for random access itself? The NameNode does
>> persist its block information to disk for various purposes, but
>> making the NameNode use disk storage completely (rather than keeping
>> specific parts of it disk-cached) wouldn't make much sense to me.
>> Performance-wise, that'd feel like trying to communicate with a
>> process that's swapping.
>> 
>> The too-many-files issue is bloated up to sound like it's a NameNode
>> issue, but in reality it isn't. HDFS lets you process lots of files
>> really fast, aside from helping store them for long periods, and a
>> lot of tiny files only slows such operations down, because the
>> overhead of opening and closing each file stands in the way of
>> reading them all. With a single file or a few large files, all you do
>> is block (data) reads, with very few NameNode communications - ending
>> up much faster. The same holds for local filesystems, but not many
>> think of that.
>> 
>> On Fri, Oct 12, 2012 at 9:29 AM, Mark Kerzner <ma...@shmsoft.com> wrote:
>> > Hi,
>> >
>> > Imagine I have a very fast hard drive that I want to use for the NameNode.
>> > That is, I want the NameNode to store its blocks information on this hard
>> > drive instead of in memory.
>> >
>> > Why would I do it? Scalability (no federation needed), many files are not a
>> > problem, and warm fail-over is automatic. What would I need to change in the
>> > NameNode to tell it to use the hard drive?
>> >
>> > Thank you,
>> > Mark
>> 
>> 
>> 
>> --
>> Harsh J
> 

Re: Using a hard drive instead of

Posted by Mark Kerzner <ma...@shmsoft.com>.
Harsh,

I agree with you about many small files, and I was giving this only by
way of example. However, the hard drive I am talking about can be 1-2 TB
in size, and that's pretty good; you can't easily get that much memory.
In addition, it would be more resistant to power failures than RAM. And
yes, it has the performance of RAM, and can accommodate very many
threads.

Mark

On Thu, Oct 11, 2012 at 11:16 PM, Harsh J <ha...@cloudera.com> wrote:

> Hi Mark,
>
> Note that the NameNode does random memory access to serve back any
> information or mutate request you send to it, and that there can be
> many concurrent clients. So do you mean a 'very fast hard drive'
> that's faster than RAM for random access itself? The NameNode does
> persist its block information to disk for various purposes, but making
> the NameNode use disk storage completely (rather than keeping specific
> parts of it disk-cached) wouldn't make much sense to me.
> Performance-wise, that'd feel like trying to communicate with a
> process that's swapping.
>
> The too-many-files issue is bloated up to sound like it's a NameNode
> issue, but in reality it isn't. HDFS lets you process lots of files
> really fast, aside from helping store them for long periods, and a lot
> of tiny files only slows such operations down, because the overhead of
> opening and closing each file stands in the way of reading them all.
> With a single file or a few large files, all you do is block (data)
> reads, with very few NameNode communications - ending up much faster.
> The same holds for local filesystems, but not many think of that.
>
> On Fri, Oct 12, 2012 at 9:29 AM, Mark Kerzner <ma...@shmsoft.com>
> wrote:
> > Hi,
> >
> > Imagine I have a very fast hard drive that I want to use for the
> NameNode.
> > That is, I want the NameNode to store its blocks information on this hard
> > drive instead of in memory.
> >
> > Why would I do it? Scalability (no federation needed), many files are
> not a
> > problem, and warm fail-over is automatic. What would I need to change in
> the
> > NameNode to tell it to use the hard drive?
> >
> > Thank you,
> > Mark
>
>
>
> --
> Harsh J
>

Re: Using a hard drive instead of

Posted by Harsh J <ha...@cloudera.com>.
Hi Mark,

Note that the NameNode does random memory access to serve back any
information or mutate request you send to it, and that there can be
many concurrent clients. So do you mean a 'very fast hard drive' that's
faster than RAM for random access itself? The NameNode does persist its
block information to disk for various purposes, but making the NameNode
use disk storage completely (rather than keeping specific parts of it
disk-cached) wouldn't make much sense to me. Performance-wise, that'd
feel like trying to communicate with a process that's swapping.

The too-many-files issue is bloated up to sound like it's a NameNode
issue, but in reality it isn't. HDFS lets you process lots of files
really fast, aside from helping store them for long periods, and a lot
of tiny files only slows such operations down, because the overhead of
opening and closing each file stands in the way of reading them all.
With a single file or a few large files, all you do is block (data)
reads, with very few NameNode communications - ending up much faster.
The same holds for local filesystems, but not many think of that.

On Fri, Oct 12, 2012 at 9:29 AM, Mark Kerzner <ma...@shmsoft.com> wrote:
> Hi,
>
> Imagine I have a very fast hard drive that I want to use for the NameNode.
> That is, I want the NameNode to store its blocks information on this hard
> drive instead of in memory.
>
> Why would I do it? Scalability (no federation needed), many files are not a
> problem, and warm fail-over is automatic. What would I need to change in the
> NameNode to tell it to use the hard drive?
>
> Thank you,
> Mark



-- 
Harsh J
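
To make the open/close overhead described above concrete, here is a
minimal sketch against the Hadoop FileSystem Java API (the paths
/data/small and /data/big.dat are made up for illustration). Each
fs.open() costs a NameNode round trip, so the small-files loop pays it
once per file while the large file pays it once in total:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SmallFilesOverhead {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            byte[] buf = new byte[64 * 1024];

            // Many small files: one open/close (a NameNode metadata
            // round trip) per file, on top of the actual data reads.
            for (FileStatus st : fs.listStatus(new Path("/data/small"))) {
                try (FSDataInputStream in = fs.open(st.getPath())) {
                    while (in.read(buf) != -1) { }   // drain the stream
                }
            }

            // One large file: a single open, then mostly streaming
            // block reads served directly by DataNodes.
            try (FSDataInputStream in = fs.open(new Path("/data/big.dat"))) {
                while (in.read(buf) != -1) { }       // drain the stream
            }
            fs.close();
        }
    }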

Re: Using a hard drive instead of

Posted by Colin Patrick McCabe <ra...@gmail.com>.
The direct answer to your question is to use this theoretical super-fast
hard drive as Linux swap space.

The better answer is to use federation or another solution if your needs
exceed those servable by a single NameNode.

Cheers.
Colin
On Oct 11, 2012 9:00 PM, "Mark Kerzner" <ma...@shmsoft.com> wrote:

> Hi,
>
> Imagine I have a very fast hard drive that I want to use for the NameNode.
> That is, I want the NameNode to store its blocks information on this hard
> drive instead of in memory.
>
> Why would I do it? Scalability (no federation needed), many files are not
> a problem, and warm fail-over is automatic. What would I need to change in
> the NameNode to tell it to use the hard drive?
>
> Thank you,
> Mark
>
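
For reference, wiring a device in as swap is standard Linux
administration rather than anything Hadoop-specific; a sketch, assuming
the fast drive shows up as /dev/fastdisk (a made-up device name):

    mkswap /dev/fastdisk    # write a swap signature to the device
    swapon /dev/fastdisk    # start using it as swap space
    swapon -s               # list active swap areas to verify

Note the caveat raised earlier in the thread: once the NameNode heap
spills into swap, every cold lookup becomes a page fault, trading the
in-memory random-access latency the NameNode is designed around for
disk latency.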
