Posted to common-user@hadoop.apache.org by baran cakici <ba...@gmail.com> on 2011/04/28 17:21:58 UTC

Configuration for small Cluster

Hi everyone,

I have a cluster with one master (JobTracker and NameNode; Intel Core2Duo,
2 GB RAM) and four slaves (DataNode and TaskTracker; Celeron, 2 GB RAM). My
input data is between 2 GB and 10 GB, and I read it line by line in
MapReduce. I am now trying to speed up my system (benchmark), but I'm not
sure whether my configuration is correct. Could you please take a look and
tell me if it is OK?

-mapred-site.xml

<property>
<name>mapred.job.tracker</name>
<value>apple:9001</value>
</property>

<property>
<name>mapred.child.java.opts</name>
<value>-Xmx512m -server</value>
</property>

<property>
<name>mapred.job.tracker.handler.count</name>
<value>2</value>
</property>

<property>
<name>mapred.local.dir</name>
<value>/cygwin/usr/local/hadoop-datastore/hadoop-Baran/mapred/local</value>
</property>

<property>
<name>mapred.map.tasks</name>
<value>1</value>
</property>

<property>
<name>mapred.reduce.tasks</name>
<value>4</value>
</property>

<property>
<name>mapred.submit.replication</name>
<value>2</value>
</property>

<property>
<name>mapred.system.dir</name>
<value>/cygwin/usr/local/hadoop-datastore/hadoop-Baran/mapred/system</value>
</property>

<property>
<name>mapred.tasktracker.indexcache.mb</name>
<value>10</value>
</property>

<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>1</value>
</property>

<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>4</value>
</property>

<property>
<name>mapred.temp.dir</name>
<value>/cygwin/usr/local/hadoop-datastore/hadoop-Baran/mapred/temp</value>
</property>

<property>
<name>webinterface.private.actions</name>
<value>true</value>
</property>

<property>
<name>mapred.reduce.slowstart.completed.maps</name>
<value>0.01</value>
</property>

-hdfs-site.xml

<property>
<name>dfs.block.size</name>
<value>268435456</value>
</property>
PS: I increased dfs.block.size to 268435456 bytes (256 MB) because I got 50%
better performance with this change.
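
A rough sketch of why the block size matters for this workload, assuming the
input is splittable text and that one map task is created per HDFS block (the
thread does not say which InputFormat is in use). The helper name
estimate_map_tasks is made up for illustration, and the 64 MB figure is the
stock default of Hadoop releases from that period, not necessarily the value
used before the change.

```python
MB = 1024 * 1024
GB = 1024 * MB

def estimate_map_tasks(input_bytes: int, block_bytes: int) -> int:
    """Approximate map task count, assuming one input split per HDFS block."""
    # Integer ceiling division: a partly filled last block still needs a task.
    return (input_bytes + block_bytes - 1) // block_bytes

# Compare the assumed 64 MB default with the 256 MB value from hdfs-site.xml.
for block in (64 * MB, 268435456):
    for size in (2 * GB, 10 * GB):
        tasks = estimate_map_tasks(size, block)
        print(f"block={block // MB} MB, input={size // GB} GB -> ~{tasks} map tasks")
```

Fewer, larger map tasks mean less per-task startup overhead, which may be part
of the reported gain, but that is a guess rather than something measured here.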

I am waiting for your comments...

Regards,

Baran

Re: Configuration for small Cluster

Posted by baran cakici <ba...@gmail.com>.

Hi,

Actually, it is not about the security level of Linux. They use a security
application that is only compatible with Windows; that was the reason. I have
nothing against Linux either :)

Regards,

Baran

2011/5/3 hadoopman <ha...@gmail.com>

> I would dispute the assertion that Linux isn't secure.  I'm an MCSE and
> AIX Unix certified.  I can set up Windows servers that are very secure
> (and insecure).  The same thing goes for Unix and Linux servers.
> It depends on whose hands are on the keyboard, imo :D
>
> If it were me, I would replace the Celerons and increase the RAM.  Windows
> likes more hardware when improving performance.  Also, I wouldn't expect
> much performance from anything that uses emulation or adds an application
> layer (like Cygwin or Wine).  Sure, there are tweaks that can be made, but
> it will still fall short of what can be pulled out of a system (again,
> imo).
>
> Good Luck with it.
>
>
>
>
> On 05/03/2011 04:12 AM, baran cakici wrote:
>
>> Hi,
>> I am setting up this system at work. For security reasons I can't use
>> Linux at the company; they prefer to use Windows.
>> thanks,
>> Baran
>>
>>
>

Re: Configuration for small Cluster

Posted by hadoopman <ha...@gmail.com>.

I would dispute the assertion that Linux isn't secure.  I'm an MCSE and
AIX Unix certified.  I can set up Windows servers that are very secure
(and insecure).  The same thing goes for Unix and Linux servers.
It depends on whose hands are on the keyboard, imo :D

If it were me, I would replace the Celerons and increase the RAM.  Windows
likes more hardware when improving performance.  Also, I wouldn't expect
much performance from anything that uses emulation or adds an application
layer (like Cygwin or Wine).  Sure, there are tweaks that can be made, but
it will still fall short of what can be pulled out of a system (again,
imo).

Good Luck with it.

On 05/03/2011 04:12 AM, baran cakici wrote:
> Hi,
> I am setting up this system at work. For security reasons I can't use
> Linux at the company; they prefer to use Windows.
> thanks,
> Baran
>

Re: Configuration for small Cluster

Posted by baran cakici <ba...@gmail.com>.

Hi,

I am setting up this system at work. For security reasons I can't use Linux
at the company; they prefer to use Windows.

thanks,

Baran

2011/5/3 hadoopman <ha...@gmail.com>

> I'm curious if there is a compelling reason for running it under cygwin
> instead of linux.
>
> I'm also concerned with the celeron and 2 gig ram systems.  Sounds rather
> low end for performance.
>
> Just a couple things that stand out to me.
>
> thanks!
>
>
>
> On 05/02/2011 05:37 AM, baran cakici wrote:
>
> any comments???

Re: Configuration for small Cluster

Posted by hadoopman <ha...@gmail.com>.

I'm curious if there is a compelling reason for running it under cygwin 
instead of linux.

I'm also concerned with the celeron and 2 gig ram systems.  Sounds 
rather low end for performance.

Just a couple things that stand out to me.

thanks!


On 05/02/2011 05:37 AM, baran cakici wrote:
> any comments???

Re: Configuration for small Cluster

Posted by James Seigel <ja...@tynt.com>.

I am talking about Unix swapping of memory out to disk when the OS
runs low on RAM.  If it is doing this, you will get abysmal
performance.

With the small amount of RAM in your boxes, and your mappers and
reducers totaling 5 with child opts of 512 MB, you have a scenario
where you can OOM the box.
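
The arithmetic behind that warning can be sketched as follows, using the slot
counts and heap size from the posted mapred-site.xml. The function name is made
up for illustration. Note that -Xmx only caps the Java heap, so each task JVM
can use somewhat more than this, and the OS, DataNode and TaskTracker need
memory as well; the figure below is therefore a lower bound on the worst case.

```python
def worst_case_task_heap_mb(map_slots: int, reduce_slots: int, child_xmx_mb: int) -> int:
    """Combined heap ceiling on one node if every task slot is busy at once."""
    return (map_slots + reduce_slots) * child_xmx_mb

RAM_MB = 2048  # 2 GB per slave, from the original post

# mapred.tasktracker.map.tasks.maximum = 1, reduce.tasks.maximum = 4, -Xmx512m
heap_mb = worst_case_task_heap_mb(map_slots=1, reduce_slots=4, child_xmx_mb=512)
status = "over-committed" if heap_mb > RAM_MB else "fits"
print(f"task heap ceiling: {heap_mb} MB vs {RAM_MB} MB physical RAM ({status})")
```

With the posted values the ceiling is 5 x 512 MB = 2560 MB, which is more than
the 2048 MB installed in each slave.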

To be fair I didn't go too deep into your settings.

James

Sent from my mobile. Please excuse the typos.

On 2011-05-02, at 6:58 AM, baran cakici <ba...@gmail.com> wrote:

> Hi James,
>
> Thank you for your response... What do you mean with "swapping"? Each Node
> works as a Tasktracker and Datanode. I can see it, if your question is
> this...
> Regards,
>
> Baran
> 2011/5/2 James Seigel <ja...@tynt.com>
>
>> Do you see swapping on your data nodes with this config?
>>
>> James
>>
>> Sent from my mobile. Please excuse the typos.
>>
>> On 2011-05-02, at 5:38 AM, baran cakici <ba...@gmail.com> wrote:
>>
>>> any comments???

Re: Configuration for small Cluster

Posted by baran cakici <ba...@gmail.com>.

Hi James,

Thank you for your response. What do you mean by "swapping"? Each node
works as a TaskTracker and DataNode. I can see that, if that is what you
are asking.
Regards,

Baran
2011/5/2 James Seigel <ja...@tynt.com>

> Do you see swapping on your data nodes with this config?
>
> James
>
> Sent from my mobile. Please excuse the typos.
>
> On 2011-05-02, at 5:38 AM, baran cakici <ba...@gmail.com> wrote:
>
> > any comments???

Re: Configuration for small Cluster

Posted by James Seigel <ja...@tynt.com>.

Do you see swapping on your data nodes with this config?

James

Sent from my mobile. Please excuse the typos.

On 2011-05-02, at 5:38 AM, baran cakici <ba...@gmail.com> wrote:

> any comments???

Re: Configuration for small Cluster

Posted by baran cakici <ba...@gmail.com>.

any comments???


Re: Configuration for small Cluster

Posted by Richard Nadeau <st...@gmail.com>.

(Phones are fun)

With your setting of -Xmx512m for "mapred.child.java.opts" you don't have
enough RAM for 4 reduce tasks. If you have single core Celerons, you also
don't have enough CPU cores to run all four.

You might also try kicking -Xmx512m down to -Xmx256m and see if things run
OK.
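
As a rough way to compare the two heap sizes, the sketch below estimates how
many task JVMs fit into the memory left after the idle footprint. The 585 MB
baseline is the "without job" figure reported further down in this thread;
whether it already includes the Hadoop daemons is not stated, so treat it as an
assumption. The function name is made up for illustration, and CPU cores are
not modelled at all.

```python
def max_task_slots(ram_mb: int, baseline_mb: int, child_xmx_mb: int) -> int:
    """Task JVMs whose maximum heap fits in the RAM left above the idle baseline."""
    free_mb = ram_mb - baseline_mb
    return max(free_mb // child_xmx_mb, 0)

RAM_MB = 2048      # 2 GB per slave
BASELINE_MB = 585  # idle memory use reported in the thread (assumed representative)

for xmx_mb in (512, 256):
    slots = max_task_slots(RAM_MB, BASELINE_MB, xmx_mb)
    print(f"-Xmx{xmx_mb}m -> room for about {slots} concurrent task JVMs")
```

Heap is only part of a JVM's footprint, so the real limit is likely lower than
what this prints.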

Rick

On May 2, 2011 9:18 AM, "Richard Nadeau" <st...@gmail.com> wrote:
> I would change "mapred.tasktracker.reduce.tasks.maximum" to one. With your
> setting
>
> On May 2, 2011 8:48 AM, "baran cakici" <ba...@gmail.com> wrote:
>> without job;
>>
>> CPU Usage = 0%
>> Memory = 585 MB (2GB Ram)
>>
>> Baran
>> 2011/5/2 baran cakici <ba...@gmail.com>
>>
>>> CPU Usage = 95-100%
>>> Memory = 650-850 MB (2GB Ram)
>>>
>>> Baran
>>>
>>>
>>> 2011/5/2 James Seigel <ja...@tynt.com>
>>>
>>>> If you have Windows and Cygwin you probably don't have a lot of memory
>>>> left at 2 GB.
>>>>
>>>> Pull up the system monitor on the data nodes and check for free memory
>>>> when you have your jobs running. I bet it is quite low.
>>>>
>>>> I am not a Windows guy so I can't take you much further.
>>>>
>>>> James
>>>>
>>>> Sent from my mobile. Please excuse the typos.
>>>>
>>>> On 2011-05-02, at 8:32 AM, baran cakici <ba...@gmail.com> wrote:
>>>>
>>>> > Yes, I am running under Cygwin on my DataNodes too. The OS of the
>>>> > DataNodes is Windows as well.
>>>> >
>>>> > What exactly can I do to get better performance? I changed
>>>> > mapred.child.java.opts back to the default value. How can I solve
>>>> > this "swapping" problem?
>>>> >
>>>> > PS: I don't have a chance to get slaves (Celeron 2 GHz) with a Linux OS.
>>>> >
>>>> > thanks, both of you
>>>> >
>>>> > Regards,
>>>> >
>>>> > Baran
>>>> > 2011/5/2 Richard Nadeau <st...@gmail.com>
>>>> >
>>>> >> Are you running under Cygwin on your data nodes as well? That is
>>>> >> certain to cause performance problems. As James suggested, swapping
>>>> >> to disk is going to be a killer, and running on Windows with Celeron
>>>> >> processors only compounds the problem. The Celeron processor is also
>>>> >> sub-optimal for CPU-intensive tasks.
>>>> >>
>>>> >> Rick

Re: Configuration for small Cluster

Posted by baran cakici <ba...@gmail.com>.

Hi Matthew,

In some test runs my map phase finished and then I was left waiting for the
reduce copy phase. That is why I changed this option: if the reducers start
early, the reduce copy finishes early too. I think
mapred.reduce.slowstart.completed.maps applies to the whole reduce process
(including sort and shuffle), but I'm not sure about that. If it does not,
then you are right.
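
To make the effect of this setting concrete, here is a sketch that assumes the
threshold is simply the configured fraction multiplied by the job's map count,
rounded up. The rounding rule and the function name are assumptions made for
illustration, and the map counts are examples, not measurements from this
cluster. The value 0.05 is the stock default for this property; 0.80 is an
arbitrary "much higher" value of the kind Matthew asks about.

```python
import math

def maps_before_reduces_start(total_maps: int, slowstart: float) -> int:
    """Completed map tasks needed before reduce tasks can be scheduled."""
    return math.ceil(slowstart * total_maps)

for total_maps in (8, 40):
    for slowstart in (0.01, 0.05, 0.80):
        needed = maps_before_reduces_start(total_maps, slowstart)
        print(f"{total_maps} maps, slowstart={slowstart}: "
              f"reduces may start after {needed} completed map(s)")
```

Reducers that start this early can begin copying map output sooner, but they
also hold their slots and heap while the remaining maps are still running.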

My computers are connected through a Gigaset switch (Ethernet).

Regards,

Baran

2011/5/2 GOEKE, MATTHEW [AG/1000] <ma...@monsanto.com>

> Have you tested the performance of adjusting
> mapred.reduce.slowstart.completed.maps property? I'm curious as to what
> effect you have seen by dropping it from the default to .01 because my
> original assumption would have been to try something much higher so that you
> don't have threads spawning so soon for sort and shuffle. Also what kind of
> network interfaces does each of these machines have and how is the "rack"
> setup?
>
> Matt
>
> -----Original Message-----
> From: baran cakici [mailto:barancakici@gmail.com]
> Sent: Monday, May 02, 2011 10:30 AM
> To: common-user@hadoop.apache.org
> Subject: Re: Configuration for small Cluster
>
> I got it. I want to run one reduce task on each TaskTracker, so 4 reduce
> tasks overall across the cluster.
>
> 2011/5/2 baran cakici <ba...@gmail.com>
>
> > Actually it was one. I changed that and got better reduce performance,
> > because my reduce algorithm is a little bit complex.
> >
> > thanks anyway
> >
> > Regards,
> >
> > Baran
> >> >>> >> On Apr 28, 2011 9:22 AM, "baran cakici" <ba...@gmail.com>
> >> wrote:
> >> >>> >>> Hi Everyone,
> >> >>> >>>
> >> >>> >>> I have a Cluster with one Master(JobTracker and NameNode - Intel
> >> >>> Core2Duo
> >> >>> >> 2
> >> >>> >>> GB Ram) and four Slaves(Datanode and Tasktracker - Celeron 2 GB
> >> Ram).
> >> >>> My
> >> >>> >>> Inputdata are between 2GB-10GB and I read Inputdata in MapReduce
> >> line
> >> >>> by
> >> >>> >>> line. Now, I try to accelerate my System(Benchmark), but I'm not
> >> sure,
> >> >>> if
> >> >>> >> my
> >> >>> >>> Configuration is correctly. Can you please just look, if it is
> ok?
> >> >>> >>>
> >> >>> >>> -mapred-site.xml
> >> >>> >>>
> >> >>> >>> <property>
> >> >>> >>> <name>mapred.job.tracker</name>
> >> >>> >>> <value>apple:9001</value>
> >> >>> >>> </property>
> >> >>> >>>
> >> >>> >>> <property>
> >> >>> >>> <name>mapred.child.java.opts</name>
> >> >>> >>> <value>-Xmx512m -server</value>
> >> >>> >>> </property>
> >> >>> >>>
> >> >>> >>> <property>
> >> >>> >>> <name>mapred.job.tracker.handler.count</name>
> >> >>> >>> <value>2</value>
> >> >>> >>> </property>
> >> >>> >>>
> >> >>> >>> <property>
> >> >>> >>> <name>mapred.local.dir</name>
> >> >>> >>>
> >> >>> >>
> >> >>>
> >>
> >>
> <value>/cygwin/usr/local/hadoop-datastore/hadoop-Baran/mapred/local</value>
> >> >>> >>> </property>
> >> >>> >>>
> >> >>> >>> <property>
> >> >>> >>> <name>mapred.map.tasks</name>
> >> >>> >>> <value>1</value>
> >> >>> >>> </property>
> >> >>> >>>
> >> >>> >>> <property>
> >> >>> >>> <name>mapred.reduce.tasks</name>
> >> >>> >>> <value>4</value>
> >> >>> >>> </property>
> >> >>> >>>
> >> >>> >>> <property>
> >> >>> >>> <name>mapred.submit.replication</name>
> >> >>> >>> <value>2</value>
> >> >>> >>> </property>
> >> >>> >>>
> >> >>> >>> <property>
> >> >>> >>> <name>mapred.system.dir</name>
> >> >>> >>>
> >> >>> >>
> >> >>> >>
> >> >>>
> >>
> >>
> <value>/cygwin/usr/local/hadoop-datastore/hadoop-Baran/mapred/system</value>
> >> >>> >>> </property>
> >> >>> >>>
> >> >>> >>> <property>
> >> >>> >>> <name>mapred.tasktracker.indexcache.mb</name>
> >> >>> >>> <value>10</value>
> >> >>> >>> </property>
> >> >>> >>>
> >> >>> >>> <property>
> >> >>> >>> <name>mapred.tasktracker.map.tasks.maximum</name>
> >> >>> >>> <value>1</value>
> >> >>> >>> </property>
> >> >>> >>>
> >> >>> >>> <property>
> >> >>> >>> <name>mapred.tasktracker.reduce.tasks.maximum</name>
> >> >>> >>> <value>4</value>
> >> >>> >>> </property>
> >> >>> >>>
> >> >>> >>> <property>
> >> >>> >>> <name>mapred.temp.dir</name>
> >> >>> >>>
> >> >>> >>
> >> >>>
> >>
> <value>/cygwin/usr/local/hadoop-datastore/hadoop-Baran/mapred/temp</value>
> >> >>> >>> </property>
> >> >>> >>>
> >> >>> >>> <property>
> >> >>> >>> <name>webinterface.private.actions</name>
> >> >>> >>> <value>true</value>
> >> >>> >>> </property>
> >> >>> >>>
> >> >>> >>> <property>
> >> >>> >>> <name>mapred.reduce.slowstart.completed.maps</name>
> >> >>> >>> <value>0.01</value>
> >> >>> >>> </property>
> >> >>> >>>
> >> >>> >>> -hdfs-site.xml
> >> >>> >>>
> >> >>> >>> <property>
> >> >>> >>> <name>dfs.block.size</name>
> >> >>> >>> <value>268435456</value>
> >> >>> >>> </property>
> >> >>> >>> PS: I extended dfs.block.size, because I won 50% better
> >> performance
> >> >>> with
> >> >>> >>> this change.
> >> >>> >>>
> >> >>> >>> I am waiting for your comments...
> >> >>> >>>
> >> >>> >>> Regards,
> >> >>> >>>
> >> >>> >>> Baran
> >> >>> >>
> >> >>>
> >> >>
> >> >>
> >>
> >
> >
> This e-mail message may contain privileged and/or confidential information,
> and is intended to be received only by persons entitled
> to receive such information. If you have received this e-mail in error,
> please notify the sender immediately. Please delete it and
> all attachments from any servers, hard drives or any other media. Other use
> of this e-mail by you is strictly prohibited.
>
> All e-mails and attachments sent and received are subject to monitoring,
> reading and archival by Monsanto, including its
> subsidiaries. The recipient of this e-mail is solely responsible for
> checking for the presence of "Viruses" or other "Malware".
> Monsanto, along with its subsidiaries, accepts no liability for any damage
> caused by any such code transmitted by or accompanying
> this e-mail or any attachment.
>
>
> The information contained in this email may be subject to the export
> control laws and regulations of the United States, potentially
> including but not limited to the Export Administration Regulations (EAR)
> and sanctions regulations issued by the U.S. Department of
> Treasury, Office of Foreign Asset Controls (OFAC).  As a recipient of this
> information you are obligated to comply with all
> applicable U.S. export laws and regulations.
>

RE: Configuration for small Cluster

Posted by "GOEKE, MATTHEW [AG/1000]" <ma...@monsanto.com>.

Have you measured the effect of adjusting the mapred.reduce.slowstart.completed.maps property? I'm curious what you saw after dropping it from the default to 0.01, because my first instinct would have been to try something much higher, so that reduce tasks don't spawn so early for the shuffle and sort. Also, what kind of network interfaces do these machines have, and how is the "rack" set up?
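
As an illustration of "something much higher", the fragment below is a sketch only. The value 0.80 is an assumed example, not a measured recommendation for this cluster, and would need to be compared against the posted 0.01 in a test run.

```xml
<!-- mapred-site.xml: hold reducers back until most maps have finished.
     0.80 is an assumed example value; the posted config uses 0.01. -->
<property>
<name>mapred.reduce.slowstart.completed.maps</name>
<value>0.80</value>
</property>
```

A reducer that starts early mostly waits in the copy phase while it occupies a slot and memory, which is why a later start may be worth comparing on nodes with only 2 GB of RAM.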

Matt

-----Original Message-----
From: baran cakici [mailto:barancakici@gmail.com] 
Sent: Monday, May 02, 2011 10:30 AM
To: common-user@hadoop.apache.org
Subject: Re: Configuration for small Cluster

I got it. I want to run one reduce task on each TaskTracker, so four
reduce tasks across the whole cluster.

This e-mail message may contain privileged and/or confidential information, and is intended to be received only by persons entitled
to receive such information. If you have received this e-mail in error, please notify the sender immediately. Please delete it and
all attachments from any servers, hard drives or any other media. Other use of this e-mail by you is strictly prohibited.

All e-mails and attachments sent and received are subject to monitoring, reading and archival by Monsanto, including its
subsidiaries. The recipient of this e-mail is solely responsible for checking for the presence of "Viruses" or other "Malware".
Monsanto, along with its subsidiaries, accepts no liability for any damage caused by any such code transmitted by or accompanying
this e-mail or any attachment.


The information contained in this email may be subject to the export control laws and regulations of the United States, potentially
including but not limited to the Export Administration Regulations (EAR) and sanctions regulations issued by the U.S. Department of
Treasury, Office of Foreign Asset Controls (OFAC).  As a recipient of this information you are obligated to comply with all
applicable U.S. export laws and regulations.

Re: Configuration for small Cluster

Posted by baran cakici <ba...@gmail.com>.

I got it. I want to run one reduce task on each TaskTracker, so four
reduce tasks across the whole cluster.
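
A sketch of how that intent maps onto the two properties, assuming the stock Hadoop meaning of each: mapred.reduce.tasks is the job-wide number of reduce tasks, while mapred.tasktracker.reduce.tasks.maximum is the number of reduce slots on each TaskTracker.

```xml
<!-- mapred-site.xml: four reduce tasks for the whole job -->
<property>
<name>mapred.reduce.tasks</name>
<value>4</value>
</property>

<!-- at most one reduce task running on each TaskTracker at a time -->
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>1</value>
</property>
```

With four TaskTrackers this yields four reduce slots in total, so the four reduce tasks can run at the same time, one per node.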

2011/5/2 baran cakici <ba...@gmail.com>

> Actually it was one. I changed it and got better performance in the reduce
> phase, because my reduce algorithm is a little bit complex.
>
> thanks anyway
>
> Regards,
>
> Baran

Re: Configuration for small Cluster

Posted by baran cakici <ba...@gmail.com>.

Actually it was one. I changed it and got better performance in the reduce
phase, because my reduce algorithm is a little bit complex.

thanks anyway

Regards,

Baran

2011/5/2 Richard Nadeau <st...@gmail.com>

> I would change "mapred.tasktracker.reduce.tasks.maximum" to one. With your
> setting

Re: Configuration for small Cluster

Posted by Richard Nadeau <st...@gmail.com>.

I would change "mapred.tasktracker.reduce.tasks.maximum" to one. With your
setting

On May 2, 2011 8:48 AM, "baran cakici" <ba...@gmail.com> wrote:
> Without a job running:
>
> CPU usage = 0%
> Memory    = 585 MB (of 2 GB RAM)
>
> Baran

Re: Configuration for small Cluster

Posted by baran cakici <ba...@gmail.com>.

Without a job running:

CPU usage = 0%
Memory    = 585 MB (of 2 GB RAM)

Baran
2011/5/2 baran cakici <ba...@gmail.com>

> CPU Usage = 95-100%
> Memory      = 650-850 MB (2GB Ram)
>
> Baran

Re: Configuration for small Cluster

Posted by baran cakici <ba...@gmail.com>.

CPU Usage = 95-100%
Memory      = 650-850 MB (2GB Ram)

Baran


2011/5/2 James Seigel <ja...@tynt.com>

> If you have windows and cygwin you probably don't have a lot of memory
> left at 2 gig.
>
> Pull up system monitor on the data nodes and check for free memory
> when you have your jobs running. I bet it is quite low.
>
> I am not a windows guy so I can't take you much farther.
>
> James
>
> Sent from my mobile. Please excuse the typos.

Re: Configuration for small Cluster

Posted by James Seigel <ja...@tynt.com>.

If you have windows and cygwin you probably don't have a lot of memory
left at 2 gig.

Pull up system monitor on the data nodes and check for free memory
when you have your jobs running. I bet it is quite low.
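
As a rough worked example of why free memory is probably low (the totals are
computed from the settings quoted in this thread; the replacement values in the
fragment are illustrative assumptions and have not been tested on this
cluster): each TaskTracker is allowed 1 map slot plus 4 reduce slots, and every
child JVM may grow to -Xmx512m, so the task heaps alone can reach
5 x 512 MB = 2560 MB on a node with 2 GB of RAM. That is before counting the
DataNode and TaskTracker daemons, Cygwin and Windows itself. A more
conservative mapred-site.xml could look something like this:

```xml
<!-- Sketch only: the values below are assumptions chosen so that
     (map slots + reduce slots) x child heap stays well under 2 GB. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>

<property>
  <!-- 2 slots x 256 MB = 512 MB of task heap per node at most -->
  <name>mapred.child.java.opts</name>
  <value>-Xmx256m</value>
</property>
```

With one slot of each kind on each of the four slaves, mapred.reduce.tasks = 4
would still fit in a single wave, and the worst case drops from 2560 MB to
512 MB of task heap per node. Whether 256 MB is enough heap for the job itself
would need to be checked against the actual map and reduce code.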

I am not a windows guy so I can't take you much farther.

James

Sent from my mobile. Please excuse the typos.

On 2011-05-02, at 8:32 AM, baran cakici <ba...@gmail.com> wrote:

> Yes, I am running under cygwin on my datanodes too. The OS of the datanodes is
> Windows as well.
>
> What exactly can I do for better performance? I changed
> mapred.child.java.opts to the default value. How can I solve this "swapping"
> problem?
>
> PS: I don't have a chance to get slaves (Celeron 2GHz) with a Linux OS.
>
> thanks, both of you
>
> Regards,
>
> Baran

Re: Configuration for small Cluster

Posted by baran cakici <ba...@gmail.com>.

Yes, I am running under cygwin on my datanodes too. The OS of the datanodes is
Windows as well.

What exactly can I do for better performance? I changed
mapred.child.java.opts to the default value. How can I solve this "swapping"
problem?

PS: I don't have a chance to get slaves (Celeron 2GHz) with a Linux OS.

thanks, both of you

Regards,

Baran
2011/5/2 Richard Nadeau <st...@gmail.com>

> Are you running under cygwin on your data nodes as well? That is certain to
> cause performance problems. As James suggested, swapping to disk is going to
> be a killer; running on Windows with Celeron processors only compounds the
> problem. The Celeron processor is also sub-optimal for CPU-intensive tasks.
>
> Rick

Re: Configuration for small Cluster

Posted by James Seigel <ja...@tynt.com>.

Sorry, I assumed Linux.

James

Sent from my mobile. Please excuse the typos.

On 2011-05-02, at 8:15 AM, Richard Nadeau <st...@gmail.com> wrote:

> Are you running under cygwin on your data nodes as well? That is certain to
> cause performance problems. As James suggested, swapping to disk is going to
> be a killer; running on Windows with Celeron processors only compounds the
> problem. The Celeron processor is also sub-optimal for CPU-intensive tasks.
>
> Rick

Re: Configuration for small Cluster

Posted by Richard Nadeau <st...@gmail.com>.

Are you running under cygwin on your data nodes as well? That is certain to
cause performance problems. As James suggested, swapping to disk is going to
be a killer; running on Windows with Celeron processors only compounds the
problem. The Celeron processor is also sub-optimal for CPU-intensive tasks.
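
One setting in the posted configuration that may add to the CPU and memory
pressure is mapred.reduce.slowstart.completed.maps = 0.01: reduce tasks are
scheduled once 1% of the job's maps have finished, so their JVMs sit alongside
the running maps for nearly the whole job. A hedged sketch of the alternative,
where 0.80 is an arbitrary illustrative value rather than a tuned one:

```xml
<!-- Sketch only: 0.80 is an assumed value, not a measured optimum.
     Reduces are not scheduled until 80% of the job's maps have completed,
     so fewer task JVMs compete for the Celeron CPU while maps run. -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.80</value>
</property>
```

The trade-off is that the shuffle overlaps less with the map phase, so this is
worth benchmarking both ways on the 2GB-10GB inputs before keeping it.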

Rick

On Apr 28, 2011 9:22 AM, "baran cakici" <ba...@gmail.com> wrote:
> Hi Everyone,
>
> I have a cluster with one master (JobTracker and NameNode - Intel Core2Duo,
> 2 GB RAM) and four slaves (DataNode and TaskTracker - Celeron, 2 GB RAM). My
> input data is between 2 GB and 10 GB, and I read it in MapReduce line by
> line. Now I am trying to speed up my system (benchmark), but I'm not sure if
> my configuration is correct. Could you please take a look and tell me if it
> is OK?
>
> -mapred-site.xml
>
> <property>
> <name>mapred.job.tracker</name>
> <value>apple:9001</value>
> </property>
>
> <property>
> <name>mapred.child.java.opts</name>
> <value>-Xmx512m -server</value>
> </property>
>
> <property>
> <name>mapred.job.tracker.handler.count</name>
> <value>2</value>
> </property>
>
> <property>
> <name>mapred.local.dir</name>
> <value>/cygwin/usr/local/hadoop-datastore/hadoop-Baran/mapred/local</value>
> </property>
>
> <property>
> <name>mapred.map.tasks</name>
> <value>1</value>
> </property>
>
> <property>
> <name>mapred.reduce.tasks</name>
> <value>4</value>
> </property>
>
> <property>
> <name>mapred.submit.replication</name>
> <value>2</value>
> </property>
>
> <property>
> <name>mapred.system.dir</name>
> <value>/cygwin/usr/local/hadoop-datastore/hadoop-Baran/mapred/system</value>
> </property>
>
> <property>
> <name>mapred.tasktracker.indexcache.mb</name>
> <value>10</value>
> </property>
>
> <property>
> <name>mapred.tasktracker.map.tasks.maximum</name>
> <value>1</value>
> </property>
>
> <property>
> <name>mapred.tasktracker.reduce.tasks.maximum</name>
> <value>4</value>
> </property>
>
> <property>
> <name>mapred.temp.dir</name>
> <value>/cygwin/usr/local/hadoop-datastore/hadoop-Baran/mapred/temp</value>
> </property>
>
> <property>
> <name>webinterface.private.actions</name>
> <value>true</value>
> </property>
>
> <property>
> <name>mapred.reduce.slowstart.completed.maps</name>
> <value>0.01</value>
> </property>
>
> -hdfs-site.xml
>
> <property>
> <name>dfs.block.size</name>
> <value>268435456</value>
> </property>
> PS: I increased dfs.block.size because I got 50% better performance with
> this change.
>
> I am waiting for your comments...
>
> Regards,
>
> Baran

Re: Configuration for small Cluster

Posted by baran cakici <ba...@gmail.com>.

any comments???

2011/4/28 baran cakici <ba...@gmail.com>

> Hi Everyone,
>
> I have a cluster with one master (JobTracker and NameNode - Intel Core2Duo,
> 2 GB RAM) and four slaves (DataNode and TaskTracker - Celeron, 2 GB RAM). My
> input data is between 2 GB and 10 GB, and I read it in MapReduce line by
> line. Now I am trying to speed up my system (benchmark), but I'm not sure if
> my configuration is correct. Could you please take a look and tell me if it
> is OK?
>
> -mapred-site.xml
>
> <property>
> <name>mapred.job.tracker</name>
> <value>apple:9001</value>
> </property>
>
> <property>
> <name>mapred.child.java.opts</name>
> <value>-Xmx512m -server</value>
> </property>
>
> <property>
> <name>mapred.job.tracker.handler.count</name>
> <value>2</value>
> </property>
>
> <property>
> <name>mapred.local.dir</name>
> <value>/cygwin/usr/local/hadoop-datastore/hadoop-Baran/mapred/local</value>
> </property>
>
> <property>
> <name>mapred.map.tasks</name>
> <value>1</value>
> </property>
>
> <property>
> <name>mapred.reduce.tasks</name>
> <value>4</value>
> </property>
>
> <property>
> <name>mapred.submit.replication</name>
> <value>2</value>
> </property>
>
> <property>
> <name>mapred.system.dir</name>
> <value>/cygwin/usr/local/hadoop-datastore/hadoop-Baran/mapred/system</value>
> </property>
>
> <property>
> <name>mapred.tasktracker.indexcache.mb</name>
> <value>10</value>
> </property>
>
> <property>
> <name>mapred.tasktracker.map.tasks.maximum</name>
> <value>1</value>
> </property>
>
> <property>
> <name>mapred.tasktracker.reduce.tasks.maximum</name>
> <value>4</value>
> </property>
>
> <property>
> <name>mapred.temp.dir</name>
> <value>/cygwin/usr/local/hadoop-datastore/hadoop-Baran/mapred/temp</value>
> </property>
>
> <property>
> <name>webinterface.private.actions</name>
> <value>true</value>
> </property>
>
> <property>
> <name>mapred.reduce.slowstart.completed.maps</name>
> <value>0.01</value>
> </property>
>
> -hdfs-site.xml
>
> <property>
> <name>dfs.block.size</name>
> <value>268435456</value>
> </property>
> PS: I increased dfs.block.size because I got 50% better performance with
> this change.
>
> I am waiting for your comments...
>
> Regards,
>
> Baran
>