Posted to user@hadoop.apache.org by Ted <r6...@gmail.com> on 2013/03/23 05:33:12 UTC

how to control (or understand) the memory usage in hdfs

Hi, I'm new to hadoop/hdfs and I'm just running some tests on my local
machine in a single-node setup. I'm encountering out-of-memory errors
in the JVM running my data node.

I'm pretty sure I can just increase the heap size to fix the errors,
but my question is about how memory is actually used.

As an example, with other things like an OS's disk cache or, say, a
database, if you give it 1gb of ram it will "work" with what it has
available; if the data is more than 1gb, it just swaps in and out of
memory/disk more often, i.e. the cached data is smaller. If you give it
8gb of ram it still functions the same, just with better performance.

With my hdfs setup, this does not appear to be true. If I allocate it
1gb of heap, it doesn't just perform worse / swap data to disk more -
it outright fails with out of memory and shuts the data node down.

So my question is... how do I really tune the memory / decide how much
memory I need to prevent shutdowns? Is 1gb just too small even in a
single-machine test environment with almost no data at all, or is it
supposed to work like OS disk caches, where it always works but just
performs better or worse, and I just have something configured wrong?
Basically my objective isn't performance; it's that the server must
not shut itself down. It can slow down, but it must not shut off.

-- 
Ted.

Re: how to control (or understand) the memory usage in hdfs

Posted by Ted <r6...@gmail.com>.
oh, really?

ulimit -n is 2048; I'd assumed that would be sufficient for just
testing on my machine. I was going to use 4096 in production.
My hdfs-site.xml has "dfs.datanode.max.xcievers" set to 4096.
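
For reference, this is roughly how I'm checking the limits (assuming
Linux/bash, and that the data node runs under my login user) - since
"unable to create new native thread" means the OS refused to start
another thread, the process/thread cap matters as much as the
open-file cap:

    ulimit -n                          # max open file descriptors
    ulimit -u                          # max user processes/threads on Linux
    cat /proc/sys/kernel/threads-max   # system-wide thread limit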

As for my logs... there are a lot of "INFO" entries; I haven't gotten
around to configuring the level down yet - I'm not quite sure why it's
so verbose at INFO level. My log file is 4.4gb (is this a sign I've
configured or done something wrong?)
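
If it helps, I think one way to quiet this down is to raise the
DataNode logger's threshold - e.g. in conf/log4j.properties something
like:

    log4j.logger.org.apache.hadoop.hdfs.server.datanode.DataNode=WARN

or, temporarily and without a restart, via "hadoop daemonlog" against
the DataNode's HTTP port (the class name and port below are just what a
default single-node setup uses):

    hadoop daemonlog -setlevel localhost:50075 \
        org.apache.hadoop.hdfs.server.datanode.DataNode WARN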

I ran grep -v "INFO" on the log to get the actual error entries
(assuming each stack trace really belongs to the error line just above
it - otherwise those stack lines may be misleading):
------------
2013-03-23 15:11:43,653 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(127.0.0.1:50010,
storageID=DS-1419421989-192.168.1.5-50010-1363780956652,
infoPort=50075, ipcPort=50020):DataXceiveServer: Exiting due
to:java.lang.OutOfMemoryError: unable to create new native thread
	at java.lang.Thread.start0(Native Method)
	at java.lang.Thread.start(Thread.java:691)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:133)
	at java.lang.Thread.run(Thread.java:722)

2013-03-23 15:11:44,177 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(127.0.0.1:50010,
storageID=DS-1419421989-192.168.1.5-50010-1363780956652,
infoPort=50075, ipcPort=50020):DataXceiver
java.io.InterruptedIOException: Interruped while waiting for IO on
channel java.nio.channels.SocketChannel[closed]. 0 millis timeout
left.
	at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:349)
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
	at java.io.DataInputStream.read(DataInputStream.java:149)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:292)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:339)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:403)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:581)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:406)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
	at java.lang.Thread.run(Thread.java:722)
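
Since the error above is thrown from Thread.start(), I'm going to watch
the data node's thread count rather than its heap next time. A rough
sketch of what I plan to run (assuming Linux with a JDK that provides
jps/jstack; the pid lookup and the thread-name match are just my
guesses):

    DN_PID=$(jps | awk '/DataNode/ {print $1}')
    ps -o nlwp= -p "$DN_PID"               # total threads in the process
    jstack "$DN_PID" | grep -c DataXceiver # roughly, the xceiver threads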

On 3/23/13, Harsh J <ha...@cloudera.com> wrote:
> I'm guessing your OutOfMemory then is due to "Unable to create native
> thread" message? Do you mind sharing your error logs with us? Cause if
> its that, then its a ulimit/system limits issue and not a real memory
> issue.
>
> On Sat, Mar 23, 2013 at 2:30 PM, Ted <r6...@gmail.com> wrote:
>> I just checked and after running my tests, I generate only 670mb of
>> data, on 89 blocks.
>>
>> What's more, when I ran the test this time, I had increased my memory
>> to 2048mb so it completed fine - but I decided to run jconsole through
>> the test so I could see what's happenning. The data node never
>> exceeded 200mb of memory usage. It mostly stayed under 100mb.
>>
>> I'm not sure why it would complain about out of memory and shut itself
>> down when it was only 1024. It was fairly consistently doing that the
>> last few days including this morning right before I switched it to
>> 2048.
>>
>> I'm going to run the test again with 1024mb and jconsole running, none
>> of this makes any sense to me.
>>
>> On 3/23/13, Harsh J <ha...@cloudera.com> wrote:
>>> I run a 128 MB heap size DN for my simple purposes on my Mac and it
>>> runs well for what load I apply on it.
>>>
>>> A DN's primary, growing memory consumption comes from the # of blocks
>>> it carries. All of these blocks' file paths are mapped and kept in the
>>> RAM during its lifetime. If your DN has acquired a lot of blocks by
>>> now, like say close to a million or more, then 1 GB may not suffice
>>> anymore to hold them in and you'd need to scale up (add more RAM or
>>> increase heap size if you have more RAM)/scale out (add another node
>>> and run the balancer).
>>>
>>> On Sat, Mar 23, 2013 at 10:03 AM, Ted <r6...@gmail.com> wrote:
>>>> Hi I'm new to hadoop/hdfs and I'm just running some tests on my local
>>>> machines in a single node setup. I'm encountering out of memory errors
>>>> on the jvm running my data node.
>>>>
>>>> I'm pretty sure I can just increase the heap size to fix the errors,
>>>> but my question is about how memory is actually used.
>>>>
>>>> As an example, with other things like an OS's disk-cache or say
>>>> databases, if you have or let it use as an example 1gb of ram, it will
>>>> "work" with what it has available, if the data is more than 1gb of ram
>>>> it just means it'll swap in and out of memory/disk more often, i.e.
>>>> the cached data is smaller. If you give it 8gb of ram it still
>>>> functions the same, just performance increases.
>>>>
>>>> With my hdfs setup, this does not appear to be true, if I allocate it
>>>> 1gb of heap, it doesn't just perform worst / swap data to disk more.
>>>> It out right fails with out of memory and shuts the data node down.
>>>>
>>>> So my question is... how do I really tune the memory / decide how much
>>>> memory I need to prevent shutdowns? Is 1gb just too small even on a
>>>> single machine test environment with almost no data at all, or is it
>>>> suppose to work like OS-disk caches were it always works but just
>>>> performs better or worst and I just have something configured wrong?.
>>>> Basically my objective isn't performance, it's that the server must
>>>> not shut itself down, it can slow down but not shut off.
>>>>
>>>> --
>>>> Ted.
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>> --
>> Ted.
>
>
>
> --
> Harsh J
>


-- 
Ted.

Re: how to control (or understand) the memory usage in hdfs

Posted by Harsh J <ha...@cloudera.com>.
I'm guessing your OutOfMemory then is due to an "unable to create new
native thread" message? Do you mind sharing your error logs with us?
Because if it's that, then it's a ulimit/system limits issue and not a
real memory issue.
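
If it does turn out to be the thread-creation failure, the usual fix is
to raise the per-user limits for whichever account runs the daemons -
for example something like this in /etc/security/limits.conf on Linux
(the username and numbers are only illustrative, not recommendations):

    hadoop  soft  nofile  16384
    hadoop  hard  nofile  16384
    hadoop  soft  nproc   8192
    hadoop  hard  nproc   8192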

On Sat, Mar 23, 2013 at 2:30 PM, Ted <r6...@gmail.com> wrote:
> I just checked and after running my tests, I generate only 670mb of
> data, on 89 blocks.
>
> What's more, when I ran the test this time, I had increased my memory
> to 2048mb so it completed fine - but I decided to run jconsole through
> the test so I could see what's happenning. The data node never
> exceeded 200mb of memory usage. It mostly stayed under 100mb.
>
> I'm not sure why it would complain about out of memory and shut itself
> down when it was only 1024. It was fairly consistently doing that the
> last few days including this morning right before I switched it to
> 2048.
>
> I'm going to run the test again with 1024mb and jconsole running, none
> of this makes any sense to me.
>
> On 3/23/13, Harsh J <ha...@cloudera.com> wrote:
>> I run a 128 MB heap size DN for my simple purposes on my Mac and it
>> runs well for what load I apply on it.
>>
>> A DN's primary, growing memory consumption comes from the # of blocks
>> it carries. All of these blocks' file paths are mapped and kept in the
>> RAM during its lifetime. If your DN has acquired a lot of blocks by
>> now, like say close to a million or more, then 1 GB may not suffice
>> anymore to hold them in and you'd need to scale up (add more RAM or
>> increase heap size if you have more RAM)/scale out (add another node
>> and run the balancer).
>>
>> On Sat, Mar 23, 2013 at 10:03 AM, Ted <r6...@gmail.com> wrote:
>>> Hi I'm new to hadoop/hdfs and I'm just running some tests on my local
>>> machines in a single node setup. I'm encountering out of memory errors
>>> on the jvm running my data node.
>>>
>>> I'm pretty sure I can just increase the heap size to fix the errors,
>>> but my question is about how memory is actually used.
>>>
>>> As an example, with other things like an OS's disk-cache or say
>>> databases, if you have or let it use as an example 1gb of ram, it will
>>> "work" with what it has available, if the data is more than 1gb of ram
>>> it just means it'll swap in and out of memory/disk more often, i.e.
>>> the cached data is smaller. If you give it 8gb of ram it still
>>> functions the same, just performance increases.
>>>
>>> With my hdfs setup, this does not appear to be true, if I allocate it
>>> 1gb of heap, it doesn't just perform worst / swap data to disk more.
>>> It out right fails with out of memory and shuts the data node down.
>>>
>>> So my question is... how do I really tune the memory / decide how much
>>> memory I need to prevent shutdowns? Is 1gb just too small even on a
>>> single machine test environment with almost no data at all, or is it
>>> suppose to work like OS-disk caches were it always works but just
>>> performs better or worst and I just have something configured wrong?.
>>> Basically my objective isn't performance, it's that the server must
>>> not shut itself down, it can slow down but not shut off.
>>>
>>> --
>>> Ted.
>>
>>
>>
>> --
>> Harsh J
>>
>
>
> --
> Ted.



-- 
Harsh J

Re: how to control (or understand) the memory usage in hdfs

Posted by Ted <r6...@gmail.com>.
I just checked, and after running my tests I generated only 670mb of
data, in 89 blocks.

What's more, when I ran the test this time I had increased my memory
to 2048mb, so it completed fine - but I decided to run jconsole through
the test so I could see what's happening. The data node never
exceeded 200mb of memory usage. It mostly stayed under 100mb.

I'm not sure why it would complain about out of memory and shut itself
down when the heap was only 1024mb. It was fairly consistently doing
that for the last few days, including this morning right before I
switched it to 2048.

I'm going to run the test again with 1024mb and jconsole running; none
of this makes any sense to me.
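
In case it matters, this is how I'm setting and checking the heap - I'm
on a stock Apache 1.x-style layout, so the knobs below live in
conf/hadoop-env.sh (the numbers are just what I happen to be testing
with):

    export HADOOP_HEAPSIZE=1024          # default heap, in MB, for all daemons
    # or override only the data node:
    export HADOOP_DATANODE_OPTS="-Xmx1024m $HADOOP_DATANODE_OPTS"

    # then watch the live heap of the running data node:
    jmap -heap $(jps | awk '/DataNode/ {print $1}')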

On 3/23/13, Harsh J <ha...@cloudera.com> wrote:
> I run a 128 MB heap size DN for my simple purposes on my Mac and it
> runs well for what load I apply on it.
>
> A DN's primary, growing memory consumption comes from the # of blocks
> it carries. All of these blocks' file paths are mapped and kept in the
> RAM during its lifetime. If your DN has acquired a lot of blocks by
> now, like say close to a million or more, then 1 GB may not suffice
> anymore to hold them in and you'd need to scale up (add more RAM or
> increase heap size if you have more RAM)/scale out (add another node
> and run the balancer).
>
> On Sat, Mar 23, 2013 at 10:03 AM, Ted <r6...@gmail.com> wrote:
>> Hi I'm new to hadoop/hdfs and I'm just running some tests on my local
>> machines in a single node setup. I'm encountering out of memory errors
>> on the jvm running my data node.
>>
>> I'm pretty sure I can just increase the heap size to fix the errors,
>> but my question is about how memory is actually used.
>>
>> As an example, with other things like an OS's disk-cache or say
>> databases, if you have or let it use as an example 1gb of ram, it will
>> "work" with what it has available, if the data is more than 1gb of ram
>> it just means it'll swap in and out of memory/disk more often, i.e.
>> the cached data is smaller. If you give it 8gb of ram it still
>> functions the same, just performance increases.
>>
>> With my hdfs setup, this does not appear to be true, if I allocate it
>> 1gb of heap, it doesn't just perform worst / swap data to disk more.
>> It out right fails with out of memory and shuts the data node down.
>>
>> So my question is... how do I really tune the memory / decide how much
>> memory I need to prevent shutdowns? Is 1gb just too small even on a
>> single machine test environment with almost no data at all, or is it
>> suppose to work like OS-disk caches were it always works but just
>> performs better or worst and I just have something configured wrong?.
>> Basically my objective isn't performance, it's that the server must
>> not shut itself down, it can slow down but not shut off.
>>
>> --
>> Ted.
>
>
>
> --
> Harsh J
>


-- 
Ted.

Re: how to control (or understand) the memory usage in hdfs

Posted by Harsh J <ha...@cloudera.com>.
I run a 128 MB heap size DN for my simple purposes on my Mac and it
runs well for the load I apply to it.

A DN's primary, growing memory consumption comes from the number of
blocks it carries. All of these blocks' file paths are mapped and kept
in RAM for the DN's lifetime. If your DN has acquired a lot of blocks
by now - say close to a million or more - then 1 GB may no longer
suffice to hold them, and you'd need to scale up (add more RAM, or
increase the heap size if you already have spare RAM) or scale out
(add another node and run the balancer).
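
As a very rough sanity check on a single-node setup, you can also count
the block files directly and do the arithmetic yourself: if each block
replica costs somewhere on the order of a few hundred bytes of heap (a
loose assumption - it varies by version), a million blocks already adds
up to a few hundred MB. The data directory below is just the usual
default layout; yours may differ:

    find /tmp/hadoop-$USER/dfs/data/current -name 'blk_*' \
        ! -name '*.meta' | wc -l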

On Sat, Mar 23, 2013 at 10:03 AM, Ted <r6...@gmail.com> wrote:
> Hi I'm new to hadoop/hdfs and I'm just running some tests on my local
> machines in a single node setup. I'm encountering out of memory errors
> on the jvm running my data node.
>
> I'm pretty sure I can just increase the heap size to fix the errors,
> but my question is about how memory is actually used.
>
> As an example, with other things like an OS's disk-cache or say
> databases, if you have or let it use as an example 1gb of ram, it will
> "work" with what it has available, if the data is more than 1gb of ram
> it just means it'll swap in and out of memory/disk more often, i.e.
> the cached data is smaller. If you give it 8gb of ram it still
> functions the same, just performance increases.
>
> With my hdfs setup, this does not appear to be true, if I allocate it
> 1gb of heap, it doesn't just perform worst / swap data to disk more.
> It out right fails with out of memory and shuts the data node down.
>
> So my question is... how do I really tune the memory / decide how much
> memory I need to prevent shutdowns? Is 1gb just too small even on a
> single machine test environment with almost no data at all, or is it
> suppose to work like OS-disk caches were it always works but just
> performs better or worst and I just have something configured wrong?.
> Basically my objective isn't performance, it's that the server must
> not shut itself down, it can slow down but not shut off.
>
> --
> Ted.



-- 
Harsh J
