Posted to dev@pig.apache.org by "Sen Fang (JIRA)" <ji...@apache.org> on 2014/02/18 20:50:20 UTC

[jira] [Commented] (PIG-3463) Pig should use hadoop local mode for small jobs

    [ https://issues.apache.org/jira/browse/PIG-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904486#comment-13904486 ] 

Sen Fang commented on PIG-3463:
-------------------------------

Aniket, thanks a lot for making this wonderful feature!

However, I had some issues when testing it. It seems to fail on a map-only job (i.e. 0 reducers). A trivial example of such a job is loading a small data set and immediately storing it back. I'm using Hadoop 2.2, so I'm not sure whether that has anything to do with this issue.

The reported reason was that Hadoop expects the map output types to be LongWritable, Text, while the actual output types are PigNullableWritable, Writable. In this case, PigMapOnly.java passes null, Tuple to collect.

java.lang.Exception: java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.Text, received org.apache.pig.data.BinSedesTuple
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:403)
Caused by: java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.Text, received org.apache.pig.data.BinSedesTuple
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1055)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
        at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)

I did find a solution: I moved the local mode conversion code block a couple of lines up. I think the problem might have to do with the way the Mapper/Reducer plans are processed. I don't know enough to draw any firm conclusion, so I have attached my patch for you to review.

> Pig should use hadoop local mode for small jobs
> -----------------------------------------------
>
>                 Key: PIG-3463
>                 URL: https://issues.apache.org/jira/browse/PIG-3463
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>    Affects Versions: 0.11.1
>            Reporter: Aniket Mokashi
>            Assignee: Aniket Mokashi
>             Fix For: 0.13.0
>
>         Attachments: JobControlCompiler.java.patch, PIG-3463-1.patch, PIG-3463-3.patch, PIG-3463-6.patch
>
>
> Pig should use hadoop local mode for small jobs - few mappers, few reducers and few mb of data.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)