Posted to user@hadoop.apache.org by sam liu <sa...@gmail.com> on 2013/10/16 04:02:54 UTC

Yarn never use TeraSort#TotalOrderPartitioner when run TeraSort job?

Hi Experts,

In Hadoop-2.0.4, TeraSort uses TeraSort#TotalOrderPartitioner as
its Partitioner: 'job.setPartitionerClass(TotalOrderPartitioner.class);'.
However, it seems YARN does not execute the methods of
TeraSort#TotalOrderPartitioner at all. I ran the tests below to verify
this:

Test 1: Add code to the readPartitions() and setConf() methods of
TeraSort#TotalOrderPartitioner that prints some text and writes some text to
a file.
Expected Result: The text is printed and written to the file
Actual Result: Nothing was printed or written to a file
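
A caveat on Test 1: under YARN the partitioner runs inside the map task
containers on the worker nodes, so text printed from setConf() goes to the
task's own log rather than the console of the submitting client, and any file
is created on whichever node ran the task. Below is a minimal sketch of a
marker helper that could be called from setConf(). The class name
PartitionerProbe and the method writeMarker are hypothetical and use only the
JDK, not any Hadoop API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Hadoop: leaves a marker that is easy to find afterwards.
class PartitionerProbe {

    // Writes "<tag> host=<hostname>" to a fresh temp file and echoes it to stderr.
    static Path writeMarker(String tag) {
        String host;
        try {
            host = InetAddress.getLocalHost().getHostName();
        } catch (IOException e) {
            host = "unknown-host";
        }
        String line = tag + " host=" + host;
        // stderr of a task ends up in that task's log, not on the submitting console.
        System.err.println(line);
        try {
            Path marker = Files.createTempFile("partitioner-probe-", ".txt");
            Files.write(marker, line.getBytes(StandardCharsets.UTF_8));
            return marker;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path p = writeMarker("setConf called");
        System.out.println("marker written to " + p);
    }
}
```

Including the hostname in the marker makes it clear which node to inspect when
the file does not show up on the machine that submitted the job.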

Test 2: Remove all existing methods from TeraSort#TotalOrderPartitioner,
leaving only the required methods with empty bodies
Expected Result: The TeraSort job fails with an exception, because the
specified Partitioner is not implemented at all
Actual Result: The TeraSort job completed successfully without any exception

These results confuse me a lot, because it seems YARN never uses the specified
partitioner TeraSort#TotalOrderPartitioner during job execution.

Can anyone help explain why?

Thanks very much!
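
One explanation worth ruling out, stated here as an assumption to verify
against the 2.0.4 source: the new-API map task appears to instantiate the
configured partitioner class only when the job has more than one reduce task.
With a single reducer every record goes to the only partition, and the class
named in setPartitionerClass() is never created. The self-contained sketch
below models that behaviour. SimplePartitioner and PartitionerSelection are
hypothetical stand-ins, not Hadoop classes.

```java
import java.util.function.Supplier;

// Stand-in for a partitioner interface; not the Hadoop type.
interface SimplePartitioner {
    int getPartition(String key, int numPartitions);
}

class PartitionerSelection {

    // Only asks the factory for the user's partitioner when there is something to partition.
    static SimplePartitioner choose(int numReduces, Supplier<SimplePartitioner> userFactory) {
        if (numReduces > 1) {
            return userFactory.get();
        }
        // Single reducer: a trivial partitioner is used and the user's class is never created.
        return (key, numPartitions) -> numPartitions - 1;
    }

    public static void main(String[] args) {
        boolean[] created = {false};
        Supplier<SimplePartitioner> factory = () -> {
            created[0] = true;
            return (key, n) -> Math.floorMod(key.hashCode(), n);
        };
        choose(1, factory);
        System.out.println("created with 1 reducer: " + created[0]);
        choose(4, factory);
        System.out.println("created with 4 reducers: " + created[0]);
    }
}
```

If this is the cause, running the job with more than one reduce task (for
example by setting mapreduce.job.reduces above 1) should make the diagnostic
output from both tests appear.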

Re: Yarn never use TeraSort#TotalOrderPartitioner when run TeraSort job?

Posted by Sandy Ryza <sa...@cloudera.com>.

Hi Sam,

Have you tried changing the map or reduce classes and seeing if that has
any effect?

-Sandy


On Fri, Oct 18, 2013 at 8:05 AM, Ravi Prakash <ra...@ymail.com> wrote:

> Sam, I would guess that the jar file you think is running, is not actually
> the one. I am guessing that in the task classpath, there is a normal jar
> file (without your changes) which is being picked up before your modified
> jar file.
>
>
>
>
>   On Thursday, October 17, 2013 10:13 PM, sam liu <sa...@gmail.com>
> wrote:
>  It's really weird and confusing me. Anyone can help this question?
>
> Thanks!
>
>
> 2013/10/16 sam liu <sa...@gmail.com>
>
> Hi Experts,
>
> In Hadoop-2.0.4, the TeraSort leverage TeraSort#TotalOrderPartitioner as
> its Partitioner: 'job.setPartitionerClass(TotalOrderPartitioner.class);'.
> However, seems Yarn did not execute the methods of
> TeraSort#TotalOrderPartitioner at all. I did some tests to verify it as
> below:
>
> Test 1: Add some code in the method readPartitions() and setConf() in
> TeraSort#TotalOrderPartitioner to print some words and write some word to a
> file.
> Expected Result: Some words should be printed and wrote into a file
> Actual Result: No word was printed and wrote into a file at all
>
> Test 2: Remove all existing methods in TeraSort#TotalOrderPartitioner, but
> only remaining some necessary but empty methods in it
> Expected Result: TeraSort job will ocurr some exception, as the specified
> Partitioner is not implemented at all
> Actual Result: TeraSort job completed successfully without any exception
>
> Above tests confused me a lot, because seems Yarn never use specified
> partitioner TeraSort#TotalOrderPartitioner at all during job execution.
>
> Any one can help provide the reasons?
>
> Thanks very much!
>
>
>
>
>

Re: Yarn never use TeraSort#TotalOrderPartitioner when run TeraSort job?

Posted by Ravi Prakash <ra...@ymail.com>.

Sam, I would guess that the jar file you think is running is not actually the one being run. I am guessing that the task classpath contains a normal jar file (without your changes) that is picked up before your modified jar file.
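
A quick way to test this guess is to log where a class was actually loaded
from. The sketch below uses only the JDK. JarLocator and locationOf are
hypothetical names, and the call would have to be made from inside the task
(for example from the partitioner's setConf()) to say anything about the task
classpath.

```java
import java.security.CodeSource;

class JarLocator {

    // Returns the jar or directory a class was loaded from, or "<bootstrap>" for JDK core classes.
    static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "<bootstrap>";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        System.out.println(locationOf(JarLocator.class));
        System.out.println(locationOf(String.class));
    }
}
```

If the printed location is not the modified jar, an unmodified copy earlier on
the classpath is the one in use.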





On Thursday, October 17, 2013 10:13 PM, sam liu <sa...@gmail.com> wrote:
 
It's really weird and confusing me. Anyone can help this question? 

Thanks!




2013/10/16 sam liu <sa...@gmail.com>

>Hi Experts,
>
>In Hadoop-2.0.4, the TeraSort leverage TeraSort#TotalOrderPartitioner as its Partitioner: 'job.setPartitionerClass(TotalOrderPartitioner.class);'. However, seems Yarn did not execute the methods of TeraSort#TotalOrderPartitioner at all. I did some tests to verify it as below:
>
>Test 1: Add some code in the method readPartitions() and setConf() in TeraSort#TotalOrderPartitioner to print some words and write some word to a file.
>Expected Result: Some words should be printed and wrote into a file
>Actual Result: No word was printed and wrote into a file at all
>
>Test 2: Remove all existing methods in TeraSort#TotalOrderPartitioner, but only remaining some necessary but empty methods in it
>Expected Result: TeraSort job will ocurr some exception, as the specified Partitioner is not implemented at all
>Actual Result: TeraSort job completed successfully without any exception
>
>Above tests confused me a lot, because seems Yarn never use specified partitioner TeraSort#TotalOrderPartitioner at all during job execution.
>
>Any one can help provide the reasons?
>
>Thanks very much!
>

Re: Yarn never use TeraSort#TotalOrderPartitioner when run TeraSort job?

Posted by sam liu <sa...@gmail.com>.

Furthermore, I ran another test: I renamed TeraSort#TotalOrderPartitioner to
TeraSort#MyOwnTotalOrderPartitioner to avoid conflicts with other classes of
the same name on the Hadoop classpath. In TeraSort.java, I also changed
'job.setPartitionerClass(TotalOrderPartitioner.class);' to
'job.setPartitionerClass(MyOwnTotalOrderPartitioner.class);'. However, it
seems MyOwnTotalOrderPartitioner was still not invoked while the terasort job
ran.

BTW, in TeraSort#TotalOrderPartitioner#readPartitions(), there is a
statement 'DataInputStream reader = fs.open(p);', and I know 'p' is the
path of '_partition.lst'. But two details are unclear to me:
- Where is 'p' located? Is it on HDFS or the Linux file system? What is
its absolute path?
- Which part or phase of Hadoop MapReduce copies the _partition.lst file to
the path 'p'? This part confuses me a lot.
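
For context on what the partition file is for: it holds a sorted list of
sampled split keys, and a total-order partitioner sends each record to the
reducer whose key range contains the record's key. The sketch below shows
that lookup as a plain binary search over string keys. It is a simplified
illustration, not TeraSort's implementation (which, as far as I know, builds
a trie over the key bytes), and SplitPointLookup and partitionFor are
hypothetical names.

```java
import java.util.Arrays;

class SplitPointLookup {

    // splitPoints must be sorted; n split points define n + 1 partitions.
    static int partitionFor(String key, String[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // Exact match: key equals splitPoints[pos], so it starts partition pos + 1.
        // Miss: binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        String[] splits = {"g", "n", "t"};
        for (String key : new String[] {"apple", "g", "kiwi", "zebra"}) {
            System.out.println(key + " -> " + partitionFor(key, splits));
        }
    }
}
```

Because the split points are needed by every map task, the file has to be
readable wherever the tasks run. My understanding, to be checked against your
version's TeraSort.java, is that the client writes it at submission time and
ships it to the tasks, so 'p' would resolve relative to the task's working
directory.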

Thanks very much!



2013/10/20 sam liu <sa...@gmail.com>

> After I took following actions, the job still could pass and seems all
> TotalOrderPartitioner classes were not invoked at all:
> - Modified libexec/hadoop-config.sh to put
> hadoop-mapreduce-examples-2.0.4-alpha.jar in the front of hadoop classpath,
> and it should ensure the TeraSort#
> TotalOrderPartitioner will be invoked first
> - Fiddled with org.apache.hadoop.mapreduce.TotalOrderPartitioner, and then
> replace with the new generated
> share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.0.4-alpha.jar
>
>
> 2013/10/19 Arun C Murthy <ac...@hortonworks.com>
>
>> Apologies for the late response.
>>
>> In hadoop-2 TeraSort uses the new org.apache.hadoop.mapreduce apis (not
>> org.apache.hadoop.mapred).
>>
>> Did you fiddle with the right TotalOrderPartitioner
>> i.e. org.apache.hadoop.mapreduce.TotalOrderPartitioner?
>>
>> Arun
>>
>> On Oct 17, 2013, at 8:12 PM, sam liu <sa...@gmail.com> wrote:
>>
>> It's really weird and confusing me. Anyone can help this question?
>>
>> Thanks!
>>
>>
>> 2013/10/16 sam liu <sa...@gmail.com>
>>
>>> Hi Experts,
>>>
>>> In Hadoop-2.0.4, the TeraSort leverage TeraSort#TotalOrderPartitioner as
>>> its Partitioner: 'job.setPartitionerClass(TotalOrderPartitioner.class);'.
>>> However, seems Yarn did not execute the methods of
>>> TeraSort#TotalOrderPartitioner at all. I did some tests to verify it as
>>> below:
>>>
>>> Test 1: Add some code in the method readPartitions() and setConf() in
>>> TeraSort#TotalOrderPartitioner to print some words and write some word to a
>>> file.
>>> Expected Result: Some words should be printed and wrote into a file
>>> Actual Result: No word was printed and wrote into a file at all
>>>
>>> Test 2: Remove all existing methods in TeraSort#TotalOrderPartitioner,
>>> but only remaining some necessary but empty methods in it
>>> Expected Result: TeraSort job will ocurr some exception, as the
>>> specified Partitioner is not implemented at all
>>> Actual Result: TeraSort job completed successfully without any exception
>>>
>>> Above tests confused me a lot, because seems Yarn never use specified
>>> partitioner TeraSort#TotalOrderPartitioner at all during job execution.
>>>
>>> Any one can help provide the reasons?
>>>
>>> Thanks very much!
>>>
>>
>>
>>  --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>>
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity
>> to which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader
>> of this message is not the intended recipient, you are hereby notified that
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender immediately
>> and delete it from your system. Thank You.
>
>
>

Re: Yarn never use TeraSort#TotalOrderPartitioner when run TeraSort job?

Posted by sam liu <sa...@gmail.com>.

After I took the following actions, the job still passed, and it seems none of
the TotalOrderPartitioner classes were invoked at all:
- Modified libexec/hadoop-config.sh to put
hadoop-mapreduce-examples-2.0.4-alpha.jar at the front of the Hadoop classpath,
which should ensure that TeraSort#TotalOrderPartitioner is picked up first
- Fiddled with org.apache.hadoop.mapreduce.TotalOrderPartitioner, and then
replaced share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.0.4-alpha.jar
with the newly built jar
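
To check whether the classpath changes had the intended effect, it can help to
list every copy of a class that the JVM can see. The sketch below does that
with the JDK only. ClasspathCopies and findCopies are hypothetical names, and
the result describes only the JVM it runs in, which may differ from the
classpath a YARN task container is started with.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

class ClasspathCopies {

    // Lists every location that contains the given class, in the order the class loader reports them.
    static List<URL> findCopies(String className) {
        String resource = className.replace('.', '/') + ".class";
        try {
            return Collections.list(ClassLoader.getSystemClassLoader().getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: ClasspathCopies <fully.qualified.ClassName>");
            return;
        }
        for (URL url : findCopies(args[0])) {
            System.out.println(url);
        }
    }
}
```

More than one line of output means several jars carry the class, which is
exactly the situation where an unmodified copy can shadow a modified one.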


2013/10/19 Arun C Murthy <ac...@hortonworks.com>

> Apologies for the late response.
>
> In hadoop-2 TeraSort uses the new org.apache.hadoop.mapreduce apis (not
> org.apache.hadoop.mapred).
>
> Did you fiddle with the right TotalOrderPartitioner
> i.e. org.apache.hadoop.mapreduce.TotalOrderPartitioner?
>
> Arun
>
> On Oct 17, 2013, at 8:12 PM, sam liu <sa...@gmail.com> wrote:
>
> It's really weird and confusing me. Anyone can help this question?
>
> Thanks!
>
>
> 2013/10/16 sam liu <sa...@gmail.com>
>
>> Hi Experts,
>>
>> In Hadoop-2.0.4, the TeraSort leverage TeraSort#TotalOrderPartitioner as
>> its Partitioner: 'job.setPartitionerClass(TotalOrderPartitioner.class);'.
>> However, seems Yarn did not execute the methods of
>> TeraSort#TotalOrderPartitioner at all. I did some tests to verify it as
>> below:
>>
>> Test 1: Add some code in the method readPartitions() and setConf() in
>> TeraSort#TotalOrderPartitioner to print some words and write some word to a
>> file.
>> Expected Result: Some words should be printed and wrote into a file
>> Actual Result: No word was printed and wrote into a file at all
>>
>> Test 2: Remove all existing methods in TeraSort#TotalOrderPartitioner,
>> but only remaining some necessary but empty methods in it
>> Expected Result: TeraSort job will ocurr some exception, as the specified
>> Partitioner is not implemented at all
>> Actual Result: TeraSort job completed successfully without any exception
>>
>> Above tests confused me a lot, because seems Yarn never use specified
>> partitioner TeraSort#TotalOrderPartitioner at all during job execution.
>>
>> Any one can help provide the reasons?
>>
>> Thanks very much!
>>
>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>

Re: Yarn never use TeraSort#TotalOrderPartitioner when run TeraSort job?

Posted by sam liu <sa...@gmail.com>.

After I took the following actions, the job still passed, and it seems none
of the TotalOrderPartitioner classes were invoked at all:
- Modified libexec/hadoop-config.sh to put
hadoop-mapreduce-examples-2.0.4-alpha.jar at the front of the Hadoop
classpath, which should ensure that TeraSort#TotalOrderPartitioner is
picked up first
- Modified org.apache.hadoop.mapreduce.TotalOrderPartitioner, and then
replaced share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.0.4-alpha.jar
with the newly built jar
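
One way to check whether more than one copy of a class is visible on a
classpath is to ask the class loader for every matching resource. The sketch
below is only an illustration and does not come from this thread: the class
ClasspathCopies and its methods are hypothetical names, and the default class
name assumes that TeraSort lives in org.apache.hadoop.examples.terasort. The
result is only meaningful if it runs with the same classpath as the task.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of Hadoop): lists every copy of a class
// file that the current class loader can see.
class ClasspathCopies {

    // Converts a binary class name such as "a.b.C$D" to "a/b/C$D.class".
    static String resourceName(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns the URLs of all visible copies, in the order the class
    // loader reports them.
    static List<URL> copiesOf(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            return Collections.list(loader.getResources(resourceName(className)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed default: the nested partitioner class inside TeraSort.
        String name = args.length > 0 ? args[0]
            : "org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner";
        List<URL> copies = copiesOf(name);
        System.out.println(copies.size() + " copy/copies of " + name);
        for (URL url : copies) {
            System.out.println("  " + url);
        }
    }
}
```

If more than one URL is printed, the first entry is usually the copy that
gets loaded, so a stale jar listed ahead of the rebuilt one would explain
changes that appear to have no effect.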


2013/10/19 Arun C Murthy <ac...@hortonworks.com>

> Apologies for the late response.
>
> In hadoop-2 TeraSort uses the new org.apache.hadoop.mapreduce apis (not
> org.apache.hadoop.mapred).
>
> Did you fiddle with the right TotalOrderPartitioner
> i.e. org.apache.hadoop.mapreduce.TotalOrderPartitioner?
>
> Arun
>
> On Oct 17, 2013, at 8:12 PM, sam liu <sa...@gmail.com> wrote:
>
> This is really weird and confusing to me. Can anyone help with this question?
>
> Thanks!
>
>
> 2013/10/16 sam liu <sa...@gmail.com>
>
>> Hi Experts,
>>
>> In Hadoop-2.0.4, TeraSort uses TeraSort#TotalOrderPartitioner as
>> its Partitioner: 'job.setPartitionerClass(TotalOrderPartitioner.class);'.
>> However, it seems Yarn did not execute the methods of
>> TeraSort#TotalOrderPartitioner at all. I ran the following tests to
>> verify this:
>>
>> Test 1: Add some code to the methods readPartitions() and setConf() in
>> TeraSort#TotalOrderPartitioner to print some words and write some words
>> to a file.
>> Expected Result: Some words should be printed and written to a file
>> Actual Result: Nothing was printed or written to a file at all
>>
>> Test 2: Remove all existing methods in TeraSort#TotalOrderPartitioner,
>> leaving only the necessary methods, with empty bodies
>> Expected Result: The TeraSort job should fail with an exception, as the
>> specified Partitioner is not implemented at all
>> Actual Result: TeraSort job completed successfully without any exception
>>
>> The above tests confused me a lot, because it seems Yarn never uses the
>> specified partitioner TeraSort#TotalOrderPartitioner during job execution.
>>
>> Can anyone help explain the reasons?
>>
>> Thanks very much!
>>
>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/

Re: Yarn never use TeraSort#TotalOrderPartitioner when run TeraSort job?

Posted by Arun C Murthy <ac...@hortonworks.com>.

Apologies for the late response.

In hadoop-2 TeraSort uses the new org.apache.hadoop.mapreduce apis (not org.apache.hadoop.mapred).

Did you fiddle with the right TotalOrderPartitioner i.e. org.apache.hadoop.mapreduce.TotalOrderPartitioner?

Arun

On Oct 17, 2013, at 8:12 PM, sam liu <sa...@gmail.com> wrote:

> This is really weird and confusing to me. Can anyone help with this question?
> 
> Thanks!
> 
> 
> 2013/10/16 sam liu <sa...@gmail.com>
> Hi Experts,
> 
> In Hadoop-2.0.4, TeraSort uses TeraSort#TotalOrderPartitioner as its Partitioner: 'job.setPartitionerClass(TotalOrderPartitioner.class);'. However, it seems Yarn did not execute the methods of TeraSort#TotalOrderPartitioner at all. I ran the following tests to verify this:
>
> Test 1: Add some code to the methods readPartitions() and setConf() in TeraSort#TotalOrderPartitioner to print some words and write some words to a file.
> Expected Result: Some words should be printed and written to a file
> Actual Result: Nothing was printed or written to a file at all
>
> Test 2: Remove all existing methods in TeraSort#TotalOrderPartitioner, leaving only the necessary methods, with empty bodies
> Expected Result: The TeraSort job should fail with an exception, as the specified Partitioner is not implemented at all
> Actual Result: TeraSort job completed successfully without any exception
>
> The above tests confused me a lot, because it seems Yarn never uses the specified partitioner TeraSort#TotalOrderPartitioner during job execution.
>
> Can anyone help explain the reasons?
> 
> Thanks very much!
> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/




Re: Yarn never use TeraSort#TotalOrderPartitioner when run TeraSort job?

Posted by Ravi Prakash <ra...@ymail.com>.

Sam, I would guess that the jar file you think is running is not actually the one being used. I suspect that the task classpath contains an unmodified jar file (without your changes) that is being picked up before your modified jar file.
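
A quick way to test this kind of guess is to print where a class was actually loaded from. The following is a minimal sketch, not something posted in this thread: ClassOrigin and describe() are hypothetical names, and it relies only on the standard JDK calls Class.getProtectionDomain() and CodeSource.getLocation().

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Hadoop): reports the jar or directory
// that a loaded class came from.
class ClassOrigin {

    static String describe(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader (for example java.lang.String)
        // have no code source, so handle null explicitly.
        String location = (source == null || source.getLocation() == null)
            ? "bootstrap/JDK (no code source)"
            : source.getLocation().toString();
        return cls.getName() + " <- " + location;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name; defaults to a JDK class.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(describe(Class.forName(name)));
    }
}
```

Logging describe(getClass()) from inside the modified class, assuming that code is reached at all, would show in the task logs which jar supplied it.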





On Thursday, October 17, 2013 10:13 PM, sam liu <sa...@gmail.com> wrote:
 
This is really weird and confusing to me. Can anyone help with this question?

Thanks!




2013/10/16 sam liu <sa...@gmail.com>

>Hi Experts,
>
>In Hadoop-2.0.4, TeraSort uses TeraSort#TotalOrderPartitioner as its Partitioner: 'job.setPartitionerClass(TotalOrderPartitioner.class);'. However, it seems Yarn did not execute the methods of TeraSort#TotalOrderPartitioner at all. I ran the following tests to verify this:
>
>Test 1: Add some code to the methods readPartitions() and setConf() in TeraSort#TotalOrderPartitioner to print some words and write some words to a file.
>Expected Result: Some words should be printed and written to a file
>Actual Result: Nothing was printed or written to a file at all
>
>Test 2: Remove all existing methods in TeraSort#TotalOrderPartitioner, leaving only the necessary methods, with empty bodies
>Expected Result: The TeraSort job should fail with an exception, as the specified Partitioner is not implemented at all
>Actual Result: TeraSort job completed successfully without any exception
>
>The above tests confused me a lot, because it seems Yarn never uses the specified partitioner TeraSort#TotalOrderPartitioner during job execution.
>
>Can anyone help explain the reasons?
>
>Thanks very much!
>

Re: Yarn never use TeraSort#TotalOrderPartitioner when run TeraSort job?

Posted by sam liu <sa...@gmail.com>.

This is really weird and confusing to me. Can anyone help with this question?

Thanks!


2013/10/16 sam liu <sa...@gmail.com>

> Hi Experts,
>
> In Hadoop-2.0.4, TeraSort uses TeraSort#TotalOrderPartitioner as
> its Partitioner: 'job.setPartitionerClass(TotalOrderPartitioner.class);'.
> However, it seems Yarn did not execute the methods of
> TeraSort#TotalOrderPartitioner at all. I ran the following tests to
> verify this:
>
> Test 1: Add some code to the methods readPartitions() and setConf() in
> TeraSort#TotalOrderPartitioner to print some words and write some words
> to a file.
> Expected Result: Some words should be printed and written to a file
> Actual Result: Nothing was printed or written to a file at all
>
> Test 2: Remove all existing methods in TeraSort#TotalOrderPartitioner,
> leaving only the necessary methods, with empty bodies
> Expected Result: The TeraSort job should fail with an exception, as the
> specified Partitioner is not implemented at all
> Actual Result: TeraSort job completed successfully without any exception
>
> The above tests confused me a lot, because it seems Yarn never uses the
> specified partitioner TeraSort#TotalOrderPartitioner during job execution.
>
> Can anyone help explain the reasons?
>
> Thanks very much!
>
