Posted to user@hadoop.apache.org by Sundeep Kambhampati <ka...@cse.ohio-state.edu> on 2013/09/06 02:41:45 UTC

How to speed up Hadoop?

Hi all,

     I am looking for ways to configure Hadoop in order to speed up data 
processing. Assuming all my nodes are highly fault tolerant, will setting 
the data replication factor to 1 speed up processing? Is there some way to 
disable the failure monitoring done by Hadoop?

Thank you for your time.

-Sundeep

Re: How to speed up Hadoop?

Posted by Sundeep Kambhampati <ka...@cse.ohio-state.edu>.

On 9/5/2013 8:57 PM, Preethi Vinayak Ponangi wrote:
> Solution 1: Throw more hardware at the cluster. That's the whole point 
> of hadoop.
> Solution 2: Try to optimize the mapreduce jobs. It depends on what 
> kind of jobs you are running.
>
> I wouldn't suggest decreasing the number of replications as it kind of 
> defeats the purpose of using Hadoop. You could do this if you can't 
> get more hardware, are running experimental non-critical 
> non-production data.
>
> What kind of Hadoop monitoring are you talking about?
>
> Regards,
> Vinayak.
>
>
> On Thu, Sep 5, 2013 at 7:51 PM, Chris Embree <cembree@gmail.com 
> <ma...@gmail.com>> wrote:
>
>     I think you just went backwards.   more replicas (generally
>     speaking) are better.
>
>     I'd take 60 cheap, 1 U servers over 20 "highly fault tolerant"
>     ones for almost every problem.  I'd get them for the same or less
>     $ too.
>
>
>
>
>     On Thu, Sep 5, 2013 at 8:41 PM, Sundeep Kambhampati
>     <kambhamp@cse.ohio-state.edu <ma...@cse.ohio-state.edu>>
>     wrote:
>
>         Hi all,
>
>             I am looking for ways to configure Hadoop inorder to speed
>         up data processing. Assuming all my nodes are highly fault
>         tolerant, will making data replication factor 1 speed up the
>         processing? Are there some way to disable failure monitoring
>         done by Hadoop?
>
>         Thank you for your time.
>
>         -Sundeep
>
>
>
Thank you for your inputs. I can't currently add more hardware.

By monitoring I mean something like speculative execution.

Regards,
Sundeep
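
Speculative execution, mentioned above, is controlled through MapReduce
configuration properties. A minimal sketch of turning it off in
mapred-site.xml follows. It assumes Hadoop 2.x property names, since the
thread does not say which version is in use; on Hadoop 1.x the equivalent
properties are mapred.map.tasks.speculative.execution and
mapred.reduce.tasks.speculative.execution.

```xml
<!-- mapred-site.xml fragment (assumption: Hadoop 2.x property names) -->
<configuration>
  <!-- Do not launch duplicate attempts of slow map tasks -->
  <property>
    <name>mapreduce.map.speculative</name>
    <value>false</value>
  </property>
  <!-- Do not launch duplicate attempts of slow reduce tasks -->
  <property>
    <name>mapreduce.reduce.speculative</name>
    <value>false</value>
  </property>
</configuration>
```

Whether this speeds anything up depends on the workload. It frees the slots
that duplicate attempts would occupy, but a single slow node can then hold
up the whole job.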

Re: How to speed up Hadoop?

Posted by Peyman Mohajerian <mo...@gmail.com>.

How about this: http://hadoop.apache.org/docs/stable/vaidya.html
I've never tried it myself; I was just reading about it today.


On Thu, Sep 5, 2013 at 5:57 PM, Preethi Vinayak Ponangi <
vinayakponangi@gmail.com> wrote:

> Solution 1: Throw more hardware at the cluster. That's the whole point of
> hadoop.
> Solution 2: Try to optimize the mapreduce jobs. It depends on what kind of
> jobs you are running.
>
> I wouldn't suggest decreasing the number of replications as it kind of
> defeats the purpose of using Hadoop. You could do this if you can't get
> more hardware, are running experimental non-critical non-production data.
>
> What kind of Hadoop monitoring are you talking about?
>
> Regards,
> Vinayak.
>
>
> On Thu, Sep 5, 2013 at 7:51 PM, Chris Embree <ce...@gmail.com> wrote:
>
>> I think you just went backwards.   more replicas (generally speaking) are
>> better.
>>
>> I'd take 60 cheap, 1 U servers over 20 "highly fault tolerant" ones for
>> almost every problem.  I'd get them for the same or less $ too.
>>
>>
>>
>>
>> On Thu, Sep 5, 2013 at 8:41 PM, Sundeep Kambhampati <
>> kambhamp@cse.ohio-state.edu> wrote:
>>
>>> Hi all,
>>>
>>>     I am looking for ways to configure Hadoop inorder to speed up data
>>> processing. Assuming all my nodes are highly fault tolerant, will making
>>> data replication factor 1 speed up the processing? Are there some way to
>>> disable failure monitoring done by Hadoop?
>>>
>>> Thank you for your time.
>>>
>>> -Sundeep
>>>
>>
>>
>

Re: How to speed up Hadoop?

Posted by Preethi Vinayak Ponangi <vi...@gmail.com>.

Solution 1: Throw more hardware at the cluster. That's the whole point of
Hadoop.
Solution 2: Try to optimize the MapReduce jobs. It depends on what kind of
jobs you are running.

I wouldn't suggest decreasing the replication factor, as it kind of
defeats the purpose of using Hadoop. You could do this if you can't get
more hardware and are working with experimental, non-critical,
non-production data.

What kind of Hadoop monitoring are you talking about?

Regards,
Vinayak.

On Thu, Sep 5, 2013 at 7:51 PM, Chris Embree <ce...@gmail.com> wrote:

> I think you just went backwards.   more replicas (generally speaking) are
> better.
>
> I'd take 60 cheap, 1 U servers over 20 "highly fault tolerant" ones for
> almost every problem.  I'd get them for the same or less $ too.
>
>
>
>
> On Thu, Sep 5, 2013 at 8:41 PM, Sundeep Kambhampati <
> kambhamp@cse.ohio-state.edu> wrote:
>
>> Hi all,
>>
>>     I am looking for ways to configure Hadoop inorder to speed up data
>> processing. Assuming all my nodes are highly fault tolerant, will making
>> data replication factor 1 speed up the processing? Are there some way to
>> disable failure monitoring done by Hadoop?
>>
>> Thank you for your time.
>>
>> -Sundeep
>>
>
>

Re: How to speed up Hadoop?

Posted by Chris Embree <ce...@gmail.com>.

I think you just went backwards. More replicas (generally speaking) are
better.

I'd take 60 cheap 1U servers over 20 "highly fault tolerant" ones for
almost every problem. I'd get them for the same or less $ too.

On Thu, Sep 5, 2013 at 8:41 PM, Sundeep Kambhampati <
kambhamp@cse.ohio-state.edu> wrote:

> Hi all,
>
>     I am looking for ways to configure Hadoop inorder to speed up data
> processing. Assuming all my nodes are highly fault tolerant, will making
> data replication factor 1 speed up the processing? Are there some way to
> disable failure monitoring done by Hadoop?
>
> Thank you for your time.
>
> -Sundeep
>

Re: How to speed up Hadoop?

Posted by Harsh J <ha...@cloudera.com>.

I'd recommend reading Eric Sammer's "Hadoop Operations" (O'Reilly)
book. It goes over a lot of this stuff: building, monitoring, tuning,
optimizing, etc.

If your goal is just speed and quicker results, and not retention or
safety, by all means use a replication factor of 1. Note that it's
difficult for us to suggest configs unless you also share your
use case (in brief) or goals. While the software is highly tunable, a
lot of tweaks depend on what you are planning to do.

On Fri, Sep 6, 2013 at 6:11 AM, Sundeep Kambhampati
<ka...@cse.ohio-state.edu> wrote:
> Hi all,
>
>     I am looking for ways to configure Hadoop inorder to speed up data
> processing. Assuming all my nodes are highly fault tolerant, will making
> data replication factor 1 speed up the processing? Are there some way to
> disable failure monitoring done by Hadoop?
>
> Thank you for your time.
>
> -Sundeep

-- 
Harsh J
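
For reference, the replication factor discussed in this thread is set with
the dfs.replication property. A minimal sketch of an hdfs-site.xml fragment
that sets it to 1 is below; it is an illustration of the setting, not a
recommended configuration.

```xml
<!-- hdfs-site.xml fragment: default replication for newly written files -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- 1 means a single copy of each block: no redundancy if a disk or node fails -->
    <value>1</value>
  </property>
</configuration>
```

This value only applies to files written after the change. Files already in
HDFS keep their current replication factor unless it is changed explicitly,
for example with hadoop fs -setrep -w 1 <path>, where <path> is a
placeholder for the file or directory to change.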
