Posted to mapreduce-user@hadoop.apache.org by baran cakici <ba...@gmail.com> on 2011/05/02 13:37:34 UTC

Re: Configuration for small Cluster

any comments???

2011/4/28 baran cakici <ba...@gmail.com>

> Hi Everyone,
>
> I have a cluster with one master (JobTracker and NameNode, Intel Core2Duo,
> 2 GB RAM) and four slaves (DataNode and TaskTracker, Celeron, 2 GB RAM). My
> input data is between 2 GB and 10 GB, and I read it in MapReduce line by
> line. Now I am trying to speed up my system (benchmark), but I'm not sure
> whether my configuration is correct. Could you please take a look and tell
> me if it is OK?
>
> -mapred-site.xml
>
> <property>
> <name>mapred.job.tracker</name>
> <value>apple:9001</value>
> </property>
>
> <property>
> <name>mapred.child.java.opts</name>
> <value>-Xmx512m -server</value>
> </property>
>
> <property>
> <name>mapred.job.tracker.handler.count</name>
> <value>2</value>
> </property>
>
> <property>
> <name>mapred.local.dir</name>
> <value>/cygwin/usr/local/hadoop-datastore/hadoop-Baran/mapred/local</value>
> </property>
>
> <property>
> <name>mapred.map.tasks</name>
> <value>1</value>
> </property>
>
> <property>
> <name>mapred.reduce.tasks</name>
> <value>4</value>
> </property>
>
> <property>
> <name>mapred.submit.replication</name>
> <value>2</value>
> </property>
>
> <property>
> <name>mapred.system.dir</name>
> <value>/cygwin/usr/local/hadoop-datastore/hadoop-Baran/mapred/system</value>
> </property>
>
> <property>
> <name>mapred.tasktracker.indexcache.mb</name>
> <value>10</value>
> </property>
>
> <property>
> <name>mapred.tasktracker.map.tasks.maximum</name>
> <value>1</value>
> </property>
>
> <property>
> <name>mapred.tasktracker.reduce.tasks.maximum</name>
> <value>4</value>
> </property>
>
> <property>
> <name>mapred.temp.dir</name>
> <value>/cygwin/usr/local/hadoop-datastore/hadoop-Baran/mapred/temp</value>
> </property>
>
> <property>
> <name>webinterface.private.actions</name>
> <value>true</value>
> </property>
>
> <property>
> <name>mapred.reduce.slowstart.completed.maps</name>
> <value>0.01</value>
> </property>
>
> -hdfs-site.xml
>
> <property>
> <name>dfs.block.size</name>
> <value>268435456</value>
> </property>
>
> PS: I increased dfs.block.size because I got 50% better performance with
> this change.
>
> I am waiting for your comments...
>
> Regards,
>
> Baran
>
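
For reference, a sketch of how the dfs.block.size value in the quoted config works out. It assumes the default behaviour for splittable text input, where each input split (and so each map task) covers one HDFS block; the comments are annotations, not part of the posted file:

```xml
<!-- hdfs-site.xml: same value as posted, annotated.
     268435456 bytes = 256 * 1024 * 1024 = 256 MB per block.
     Assuming one map task per block, a 2 GB input gives about 8 map tasks
     and a 10 GB input about 40.
     The block size applies to files written after the change; files already
     in HDFS keep the block size they were written with. -->
<property>
  <name>dfs.block.size</name>
  <value>268435456</value>
</property>
```

Under that assumption mapred.map.tasks is only a hint to the framework, so the value 1 in the posted mapred-site.xml would not by itself limit a job to a single map task.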

Re: Configuration for small Cluster

Posted by baran cakici <ba...@gmail.com>.

Hi,

Actually, it is not about the security level of Linux. They use a security
application that is only compatible with Windows; that was the reason. I am
not against Linux either :)

Regards,

Baran

2011/5/3 hadoopman <ha...@gmail.com>

> I would dispute the assertion that Linux isn't secure.  I'm an MCSE and
> AIX Unix certified.  I can set up Windows servers that are very secure
> (and insecure).  The same thing goes for Unix and Linux servers.
> Depends whose hands are on the keyboard imo :D
>
> If it were me, I would replace the Celerons and increase RAM.  Windows
> likes more hardware when improving performance.  Also, I wouldn't expect
> anything that borders on performance from anything that uses emulation
> or creates an application layer (like Cygwin or Wine).  Sure, there are
> tweaks that can be made; however, it will still be shy of what can be
> pulled out of a system (again imo).
>
> Good Luck with it.

Re: Configuration for small Cluster

Posted by hadoopman <ha...@gmail.com>.

I would dispute the assertion that Linux isn't secure.  I'm an MCSE and
AIX Unix certified.  I can set up Windows servers that are very secure
(and insecure).  The same thing goes for Unix and Linux servers.
Depends whose hands are on the keyboard imo :D

If it were me, I would replace the Celerons and increase RAM.  Windows
likes more hardware when improving performance.  Also, I wouldn't expect
anything that borders on performance from anything that uses emulation
or creates an application layer (like Cygwin or Wine).  Sure, there are
tweaks that can be made; however, it will still be shy of what can be
pulled out of a system (again imo).

Good Luck with it.

On 05/03/2011 04:12 AM, baran cakici wrote:
> Hi,
> I am building this system at work. For security reasons I can't use Linux
> at the company. They prefer to use Windows.
> thanks,
> Baran
>

Re: Configuration for small Cluster

Posted by baran cakici <ba...@gmail.com>.

Hi,

I am building this system at work. For security reasons I can't use Linux at
the company. They prefer to use Windows.

thanks,

Baran

2011/5/3 hadoopman <ha...@gmail.com>

> I'm curious if there is a compelling reason for running it under Cygwin
> instead of Linux.
>
> I'm also concerned with the Celeron and 2 GB RAM systems.  Sounds rather
> low end for performance.
>
> Just a couple things that stand out to me.
>
> thanks!

Re: Configuration for small Cluster

Posted by hadoopman <ha...@gmail.com>.

I'm curious if there is a compelling reason for running it under Cygwin
instead of Linux.

I'm also concerned with the Celeron and 2 GB RAM systems.  Sounds
rather low end for performance.

Just a couple things that stand out to me.

thanks!



Re: Configuration for small Cluster

Posted by James Seigel <ja...@tynt.com>.

I am talking about Unix swapping of memory out to disk when the OS
runs low on RAM.  If it is doing this you will get abysmal
performance.

With the small amount of RAM in your boxes, and your number of mappers
and reducers totaling 5 with child opts of 512 MB, you have a scenario
where you can OOM the box.

To be fair I didn't go too deep into your settings.
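
To put numbers on that, assuming each slave runs the posted settings (1 map slot, 4 reduce slots, -Xmx512m per child): 5 task JVMs x 512 MB = 2560 MB of heap, which is already more than the 2 GB of RAM in each box, before counting the DataNode and TaskTracker daemons and the OS itself. A minimal sketch of a more conservative mapred-site.xml fragment follows; the slot count is an illustrative assumption, not a value tested on this cluster:

```xml
<!-- Hypothetical values for a 2 GB slave.
     1 map slot + 1 reduce slot = 2 task JVMs x 512 MB = 1024 MB of heap,
     which leaves roughly half of the RAM for the daemons and the OS. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m -server</value>
</property>
```

With four slaves this still gives four reduce slots across the cluster, so a job with mapred.reduce.tasks set to 4 can run all of its reducers at once.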

James

Sent from my mobile. Please excuse the typos.

On 2011-05-02, at 6:58 AM, baran cakici <ba...@gmail.com> wrote:

> Hi James,
>
> Thank you for your response... What do you mean by "swapping"? Each node
> works as a TaskTracker and DataNode. I can see that, if that is what you
> are asking...
> Regards,
>
> Baran

Re: Configuration for small Cluster

Posted by baran cakici <ba...@gmail.com>.

Hi James,

Thank you for your response... What do you mean by "swapping"? Each node
works as a TaskTracker and DataNode. I can see that, if that is what you
are asking...

Regards,

Baran

2011/5/2 James Seigel <ja...@tynt.com>

> Do you see swapping on your data nodes with this config?
>
> James

Re: Configuration for small Cluster

Posted by James Seigel <ja...@tynt.com>.

Do you see swapping on your data nodes with this config?

James

Sent from my mobile. Please excuse the typos.
