Posted to common-user@hadoop.apache.org by Mahsa Mofidpoor <mo...@gmail.com> on 2012/08/20 15:03:31 UTC

running a job on single-node setup takes less time than running on a cluster

Hello,

I ran a simple join (select col_list from table1 join table2 on
(join_condition)) on both a single-node and a multi-node setup. The table
sizes are 1.7 MB and 4.2 MB respectively. It takes more time to execute
the query on the cluster than to run it on a single-node Hadoop setup.
I checked the map logs and saw that both map tasks ran on the master
node.
Do I need to increase the data size in order to benefit from the multi-node
capacity?
How can I make sure that my data is distributed across all the nodes?

Thank you in advance for your assistance.

Regards,
Mahsa

Re: running a job on single-node setup takes less time than running on a cluster

Posted by Mahsa Mofidpoor <mo...@gmail.com>.

Thank you very much.

On Tue, Aug 21, 2012 at 11:46 PM, nagarjuna kanamarlapudi <
nagarjuna.kanamarlapudi@gmail.com> wrote:

> Dear Mahsa,
>
> Yes, what you have observed is expected.
> On a single-node cluster everything is local: network transfer and the
> other overheads of a distributed setup vanish. Try increasing the data
> size and you will see the effect of parallel JVMs on the job time.
>
> In your single-node cluster, you have one JVM and everything is local.
> In a multi-node cluster, there are multiple JVMs and the mapper output has
> to be copied to the reducers (network transfer).
>
> Comparing the two situations, maybe your small data set did not reach
> the threshold where you would observe the benefit of a multi-node cluster.
>
> Try increasing the data size and you will see wonders. I have worked
> on terabytes of data for table joins, and it worked amazingly well.

Re: running a job on single-node setup takes less time than running on a cluster

Posted by nagarjuna kanamarlapudi <na...@gmail.com>.

Dear Mahsa,

Yes, what you have observed is expected.
On a single-node cluster everything is local: network transfer and the
other overheads of a distributed setup vanish. Try increasing the data
size and you will see the effect of parallel JVMs on the job time.

In your single-node cluster, you have one JVM and everything is local.
In a multi-node cluster, there are multiple JVMs and the mapper output has
to be copied to the reducers (network transfer).

Comparing the two situations, maybe your small data set did not reach
the threshold where you would observe the benefit of a multi-node cluster.

Try increasing the data size and you will see wonders. I have worked
on terabytes of data for table joins, and it worked amazingly well.
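
The trade-off described above can be sketched with a toy cost model. This is only an illustration: the function name, the startup costs, the per-node throughput, and the shuffle cost are all hypothetical numbers chosen for the example, not measurements from any real Hadoop cluster.

```python
def job_time_seconds(data_mb, nodes, startup_s, mb_per_s_per_node, shuffle_s_per_mb):
    """Toy estimate: fixed startup + parallel processing + network shuffle.

    All parameters are assumptions made for illustration only.
    """
    compute = data_mb / (mb_per_s_per_node * nodes)
    # Only a multi-node setup pays for copying map output over the network.
    shuffle = shuffle_s_per_mb * data_mb if nodes > 1 else 0.0
    return startup_s + compute + shuffle


# Hypothetical settings: the cluster has more fixed overhead but more parallelism.
SINGLE = dict(nodes=1, startup_s=20.0, mb_per_s_per_node=10.0, shuffle_s_per_mb=0.0)
CLUSTER = dict(nodes=4, startup_s=30.0, mb_per_s_per_node=10.0, shuffle_s_per_mb=0.01)

for data_mb in (5.9, 1000.0):
    single = job_time_seconds(data_mb, **SINGLE)
    cluster = job_time_seconds(data_mb, **CLUSTER)
    faster = "single node" if single < cluster else "cluster"
    print(f"{data_mb:>8.1f} MB: single={single:.1f}s cluster={cluster:.1f}s -> {faster}")
```

With these made-up numbers, the 5.9 MB input (1.7 MB + 4.2 MB) is dominated by fixed overhead, so the single node wins, while the 1000 MB input is dominated by processing time, so the cluster wins.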

On Tue, Aug 21, 2012 at 12:01 AM, Mahsa Mofidpoor <mo...@gmail.com> wrote:

> Thanks Saurabh

Re: running a job on single-node setup takes less time than running on a cluster

Posted by Mahsa Mofidpoor <mo...@gmail.com>.

Thanks Saurabh

On Mon, Aug 20, 2012 at 12:15 PM, Saurabh bhutyani <s4...@gmail.com> wrote:

> Dear Mahsa,
>
> You need to increase the data size to benefit from Hadoop. Basically,
> Hadoop creates input splits based on the configured split size, which
> defaults to 64 MB. So if each input file is smaller than 64 MB, the job
> would basically run only one map task per file.
>
> Thanks & Regards,
> Saurabh Bhutyani
>
> Call  : 9820083104
> Gtalk: s4saurabh@gmail.com

Re: running a job on single-node setup takes less time than running on a cluster

Posted by Saurabh bhutyani <s4...@gmail.com>.

Dear Mahsa,

You need to increase the data size to benefit from Hadoop. Basically,
Hadoop creates input splits based on the configured split size, which
defaults to 64 MB. So if each input file is smaller than 64 MB, the job
would basically run only one map task per file.
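
The arithmetic behind this can be sketched as follows. This is a simplified estimate, assuming each table is stored as a single file and one split is created per 64 MB chunk of each file; the helper name `count_map_tasks` is made up for the example, and real input formats apply extra rules (for instance, small files may be combined) that are ignored here.

```python
import math

MB = 1024 * 1024


def count_map_tasks(file_sizes_bytes, split_size_bytes=64 * MB):
    """Estimate map tasks as ceil(size / split size), summed over non-empty files.

    Simplified illustration; not Hadoop's exact split computation.
    """
    return sum(
        math.ceil(size / split_size_bytes)
        for size in file_sizes_bytes
        if size > 0
    )


# The two tables from the question: each fits in a single split.
small_tables = [int(1.7 * MB), int(4.2 * MB)]
# The same tables scaled up 1000x (hypothetical sizes): many splits to spread out.
large_tables = [1700 * MB, 4200 * MB]

print(count_map_tasks(small_tables))
print(count_map_tasks(large_tables))
```

Under these assumptions the two small tables yield only two map tasks in total, which leaves little work to distribute across nodes, while the scaled-up tables yield dozens.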

Thanks & Regards,
Saurabh Bhutyani

Call  : 9820083104
Gtalk: s4saurabh@gmail.com

On Mon, Aug 20, 2012 at 6:33 PM, Mahsa Mofidpoor <mo...@gmail.com> wrote:

> Hello,
>
> I ran a simple join (select col_list from table1 join table2 on
> (join_condition)) on both a single-node and a multi-node setup. The table
> sizes are 1.7 MB and 4.2 MB respectively. It takes more time to execute
> the query on the cluster than to run it on a single-node Hadoop setup.
> I checked the map logs and saw that both map tasks ran on the master
> node.
> Do I need to increase the data size in order to benefit from the multi-node
> capacity?
> How can I make sure that my data is distributed across all the nodes?
>
> Thank you in advance for your assistance.
>
> Regards,
> Mahsa
>

Re: running a job on single-node setup takes less time than running on a cluster

Posted by Rahul Bhattacharjee <ra...@gmail.com>.

I have no answers to your questions, but I do have some questions of my own!

Which tables are you talking about?
Assuming you mean datasets/files when you say tables, why
use Hadoop for such small tables?

On Mon, Aug 20, 2012 at 6:33 PM, Mahsa Mofidpoor <mo...@gmail.com> wrote:

> Hello,
>
> I ran a simple join (select col_list from table1 join table2 on
> (join_condition)) on both a single-node and a multi-node setup. The table
> sizes are 1.7 MB and 4.2 MB respectively. It takes more time to execute
> the query on the cluster than to run it on a single-node Hadoop setup.
> I checked the map logs and saw that both map tasks ran on the master
> node.
> Do I need to increase the data size in order to benefit from the multi-node
> capacity?
> How can I make sure that my data is distributed across all the nodes?
>
> Thank you in advance for your assistance.
>
> Regards,
> Mahsa
>
