Posted to user@hadoop.apache.org by Sai Sai <sa...@yahoo.in> on 2013/06/07 13:17:16 UTC

Re: Why/When partitioner is used.

I always get confused about why we should partition and what the use of it is.
Why would one want to send all the keys starting with A to Reducer1, B to R2, and so on...
Is it just to parallelize the reduce process?
Please help.
Thanks
Sai

Re: Why/When partitioner is used.

Posted by Bryan Beaudreault <bb...@hubspot.com>.
There are practical applications for defining your own partitioner as well:

1) Controlling database concurrency.  For instance, let's say you have a
distributed datastore like HBase or even your own MySQL sharding scheme.
Using the default HashPartitioner, keys will for the most part be
randomly distributed across your reducers.  If your reduce code does
database saves or gets, this could cause periods where all reducers are
hitting a single database.  This may be more concurrency than your database
can handle, so you could use a partitioner to send all keys you know would
hit Shard A to reducers 1, 2, 3, and all that would hit Shard B to
reducers 4, 5, 6.

2) I've also used partitioners when I want to do cross-key operations
such as deduping, counting, or otherwise.  You can further combine the
custom partitioner with your own custom comparator and grouping comparator
to do many advanced operations based on the application you are working on.

Since a single Reducer instance is used to reduce() all tuples in a
partition, being able to control exactly which records make it onto a
partition is a hugely valuable tool.
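[Editor's note: the shard-routing idea in (1) could be sketched roughly as below. This is a hypothetical standalone sketch, not Bryan's actual code; the class name, the `shardOf` helper, and the a-to-m shard split are all made up for illustration. A real implementation would extend Hadoop's `Partitioner<K,V>` and override `getPartition`.]

```java
// Hypothetical sketch of shard-aware partitioning (illustration only).
// A real version would extend org.apache.hadoop.mapreduce.Partitioner<K,V>;
// this standalone class just shows the routing arithmetic.
public class ShardPartitioner {

    // Assumption for illustration: keys starting with a-m live on Shard A,
    // the rest on Shard B. Replace with your real shard lookup.
    static int shardOf(String key) {
        return Character.toLowerCase(key.charAt(0)) <= 'm' ? 0 : 1;
    }

    // Send Shard A keys to the first half of the reducers and Shard B keys
    // to the second half, hashing to spread load within each group.
    static int getPartition(String key, int numReducers) {
        int groupSize = numReducers / 2;
        int base = shardOf(key) * groupSize;
        int offset = (key.hashCode() & Integer.MAX_VALUE) % groupSize;
        return base + offset;
    }

    public static void main(String[] args) {
        System.out.println(getPartition("apple", 6)); // some reducer in 0..2
        System.out.println(getPartition("zebra", 6)); // some reducer in 3..5
    }
}
```

With 6 reducers, no matter how many "a..." keys arrive, at most reducers 0-2 ever touch Shard A, capping its concurrency at three connections.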


On Fri, Jun 7, 2013 at 10:03 AM, John Lilley <jo...@redpoint.net>wrote:

>  There are kind of two parts to this.  The semantics of MapReduce promise
> that all tuples sharing the same key value are sent to the same reducer, so
> that you can write useful MR applications that do things like “count words”
> or “summarize by date”.  In order to accomplish that, the shuffle phase of
> MR performs a partitioning by key to move tuples sharing the same key to
> the same node where they can be processed together.  You can think of
> key-partitioning as a strategy that assists in parallel distributed sorting.
> ****
>
> john****
>
> ** **
>
> *From:* Sai Sai [mailto:saigraph@yahoo.in]
> *Sent:* Friday, June 07, 2013 5:17 AM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Why/When partitioner is used.****
>
> ** **
>
> I always get confused why we should partition and what is the use of it.**
> **
>
> Why would one want to send all the keys starting with A to Reducer1 and B
> to R2 and so on...****
>
> Is it just to parallelize the reduce process.****
>
> Please help.****
>
> Thanks****
>
> Sai****
>


RE: Why/When partitioner is used.

Posted by John Lilley <jo...@redpoint.net>.
There are kind of two parts to this.  The semantics of MapReduce promise that all tuples sharing the same key value are sent to the same reducer, so that you can write useful MR applications that do things like “count words” or “summarize by date”.  In order to accomplish that, the shuffle phase of MR performs a partitioning by key to move tuples sharing the same key to the same node where they can be processed together.  You can think of key-partitioning as a strategy that assists in parallel distributed sorting.
john

From: Sai Sai [mailto:saigraph@yahoo.in]
Sent: Friday, June 07, 2013 5:17 AM
To: user@hadoop.apache.org
Subject: Re: Why/When partitioner is used.

I always get confused why we should partition and what is the use of it.
Why would one want to send all the keys starting with A to Reducer1 and B to R2 and so on...
Is it just to parallelize the reduce process.
Please help.
Thanks
Sai
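[Editor's note: John's "same key value goes to the same reducer" guarantee can be sketched with the formula below, which mirrors what Hadoop's default HashPartitioner computes; the class and method names here are made up for illustration.]

```java
// Sketch of the key-partitioning John describes. Same formula as Hadoop's
// default HashPartitioner: the same key always maps to the same partition,
// so one reducer sees every tuple for that key.
public class KeyPartitioning {

    static int partitionFor(String key, int numReducers) {
        // Mask off the sign bit so a negative hashCode cannot produce
        // a negative partition index.
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int reducers = 4;
        // Every occurrence of "hadoop" lands on the same reducer, which is
        // what makes per-key aggregation like word count possible.
        System.out.println(partitionFor("hadoop", reducers));
        System.out.println(partitionFor("hadoop", reducers));
        System.out.println(partitionFor("mapreduce", reducers));
    }
}
```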


Re: Why/When partitioner is used.

Posted by Harsh J <ha...@cloudera.com>.
Why not also ask yourself, what if you do not send all keys to the
same reducer? Would you get the results you desire that way? :)

On Fri, Jun 7, 2013 at 4:47 PM, Sai Sai <sa...@yahoo.in> wrote:
> I always get confused why we should partition and what is the use of it.
> Why would one want to send all the keys starting with A to Reducer1 and B to
> R2 and so on...
> Is it just to parallelize the reduce process.
> Please help.
> Thanks
> Sai



-- 
Harsh J

Re: Is it possible to define num of mappers to run for a job

Posted by Sai Sai <sa...@yahoo.in>.
Is it possible to define the number of mappers to run for a job?

What are the conditions we need to be aware of when defining such a thing?
Please help.
Thanks
Sai


Re: Pool & slot questions

Posted by Patai Sangbutsarakum <Pa...@turn.com>.
Totally agree with Shahab.

Just a quick answer; the details are your homework.

> Can we think of a job pool similar to a queue.
I do think so: pools partition the slot resources into chunks of different sizes.
With the Fair Scheduler, scheduling inside a pool can be either FIFO or FAIR.
With a queue, it's FIFO.
A cool thing about queues in YARN is sub-pools; check it out...


>  Is it possible to configure a slot if so how.
http://lmgtfy.com/?q=fair+scheduler+hadoop+tutorial


Good luck

On Jun 7, 2013, at 6:10 AM, Shahab Yunus <sh...@gmail.com> wrote:

Sai,

This is regarding all your recent emails and questions. I suggest that you read Hadoop: The Definitive Guide by Tom White (http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/1449311520) as it goes through all of your queries in detail and with examples. The questions that you are asking are pretty basic and the answers are available and well documented all over the web. In parallel you can also download the code which is free and easily available and start looking into them.

Regards,
Shahab


On Fri, Jun 7, 2013 at 8:02 AM, Sai Sai <sa...@yahoo.in> wrote:
1. Can we think of a job pool similar to a queue.

2. Is it possible to configure a slot if so how.

Please help.
Thanks
Sai
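[Editor's note: for reference, a Fair Scheduler pool of the kind Patai mentions is declared in an allocations file along the lines below. This is a hypothetical sketch: pool names and values are made up, and the exact element names should be verified against the Fair Scheduler documentation for your Hadoop version.]

```xml
<?xml version="1.0"?>
<!-- Hypothetical fair-scheduler allocations file; names and values are made up. -->
<allocations>
  <pool name="production">
    <minMaps>10</minMaps>
    <minReduces>5</minReduces>
    <weight>2.0</weight>
    <!-- jobs inside this pool can be scheduled fifo or fair -->
    <schedulingMode>fair</schedulingMode>
  </pool>
  <pool name="adhoc">
    <minMaps>2</minMaps>
    <minReduces>1</minReduces>
    <schedulingMode>fifo</schedulingMode>
  </pool>
</allocations>
```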

Re: Pool & slot questions

Posted by Shahab Yunus <sh...@gmail.com>.
Sai,

This is regarding all your recent emails and questions. I suggest that you
read Hadoop: The Definitive Guide by Tom White
(http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/1449311520), as
it goes through all of your queries in detail and with examples. The
questions you are asking are pretty basic, and the answers are
available and well documented all over the web. In parallel you can also
download the code, which is free and easily available, and start looking
into it.

Regards,
Shahab


On Fri, Jun 7, 2013 at 8:02 AM, Sai Sai <sa...@yahoo.in> wrote:

> 1. Can we think of a job pool similar to a queue.
>
> 2. Is it possible to configure a slot if so how.
>
> Please help.
> Thanks
> Sai
>
>
>
>
>


Re: Pool & slot questions

Posted by Sai Sai <sa...@yahoo.in>.
1. Can we think of a job pool as similar to a queue?

2. Is it possible to configure a slot? If so, how?

Please help.
Thanks

Sai


Re: Is counter a static var

Posted by Sai Sai <sa...@yahoo.in>.
Is a counter like a static var? If so, is it persisted on the name node or a data node?
Any input please.

Thanks
Sai


Re: How hadoop processes image or video files

Posted by Sai Sai <sa...@yahoo.in>.
How are image or video files processed using Hadoop?
I understand that the byte[] is read by Hadoop using a SequenceFile input format in the map phase, but what is done after that
with this byte[], as it is something which does not make sense in its raw form?
Any input please.

Thanks
Sai
