Posted to user@pig.apache.org by Andreas Paepcke <pa...@gmail.com> on 2010/12/31 19:40:10 UTC
Cannot get Initial/Intermed/Final execs to be called

I wrote a UDF tfidf(), which I run successfully
on an example, as long as I do not implement
the Algebraic interface.

Once I add Algebraic support, I see that
getInitial()/getIntermed()/getFinal() are called
many, many times. But the exec methods of the
three Algebraic classes are never invoked.
Neither is the UDF's own default (non-Algebraic) exec() method.
The result is always: ().

How do I force the appropriate code path to
be chosen?
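
For context, here is a minimal sketch of the three-stage contract as I
understand it, written in plain Java with no Pig classes so it can be
run on its own. The class AlgebraicSketch, its method names, and the
sample documents are hypothetical and are not my actual UDF. In Pig
itself, getInitial()/getIntermed()/getFinal() only return the names of
EvalFunc classes, and the work happens in their exec() methods. The
sketch counts document frequencies (one ingredient of tf-idf) across
two simulated input splits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for an Algebraic UDF; it does not use the Pig API.
public class AlgebraicSketch {

    /** Initial stage: turn one document's words into a partial result. */
    static Map<String, Integer> initial(List<String> wordsInDoc) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String word : wordsInDoc) {
            partial.put(word, 1); // a word counts once per document
        }
        return partial;
    }

    /** Intermediate stage: merge several partials into one partial. */
    static Map<String, Integer> intermed(List<Map<String, Integer>> partials) {
        Map<String, Integer> merged = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            for (Map.Entry<String, Integer> e : partial.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return merged;
    }

    /** Final stage: merge the remaining partials into the final answer. */
    static Map<String, Integer> finalStage(List<Map<String, Integer>> partials) {
        return intermed(partials);
    }

    /** Simulates map -> combine -> reduce over two input splits. */
    static Map<String, Integer> documentFrequencies(
            List<List<String>> split1, List<List<String>> split2) {
        List<Map<String, Integer>> combined = new ArrayList<>();
        for (List<List<String>> split : Arrays.asList(split1, split2)) {
            List<Map<String, Integer>> partials = new ArrayList<>();
            for (List<String> doc : split) {
                partials.add(initial(doc));   // map side
            }
            combined.add(intermed(partials)); // combiner
        }
        return finalStage(combined);          // reduce side
    }

    public static void main(String[] args) {
        List<List<String>> split1 = Arrays.asList(
                Arrays.asList("pig", "udf"),
                Arrays.asList("pig", "hadoop"));
        List<List<String>> split2 = Arrays.asList(
                Arrays.asList("pig", "udf", "udf"));
        // prints {hadoop=1, pig=3, udf=2}
        System.out.println(documentFrequencies(split1, split2));
    }
}
```

The point of the sketch is only that each stage consumes what the
previous stage produced, so the result does not depend on how the
input is split. Whether my real Initial/Intermed/Final classes
satisfy what Pig expects is exactly what I am unsure about.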

Below is a simplified pseudo-code version of the script,
followed by an excerpt of the run's output.

The line SET pig.usenewlogicalplan 'false'; is
required; otherwise I receive this error:
ERROR 2042: Error in new logical plan. Try -Dpig.usenewlogicalplan=false.

I am running Pig version 0.8.0 (r1043805)

Thank you for any hints,

Andreas


;;;;;;;;;;  Script Pseudo-Code ;;;;;;;;
SET pig.usenewlogicalplan 'false';
REGISTER...

docs = LOAD '...'


/*
Create data with the following schema:
  {(docId1 {(w1,,tf1,1), (w2,1,tf2,1)})
   (docId2 {(w1,tf1,2), (w3,1,tf3,2)})
  }

*/

theWordTfs  = FOREACH docs GENERATE docId, wordTfs;
docIdPlusWordTfsBag = GROUP theWordTfs all;

-- Call the Algebraic function:
tfidfs = FOREACH docIdPlusWordTfsBag GENERATE
FLATTEN(myutils.TfIdf(theWordTfs));

dump tfidfs;
DESCRIBE tfidfs;

;;;;;;;;;;;;;;;;;;;;;;;; Excerpt from Console outputs ;;;;;;;;;;;;;;;;;;


2010-12-31 10:19:48,968 [main] INFO
org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - Columns pruned for
docs: $1, $2
2010-12-31 10:19:48,968 [main] INFO
org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned
for docs
2010-12-31 10:19:49,024 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script:
GROUP_BY
   ...
2010-12-31 10:19:49,087 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name:
tfidfs:
Store(hdfs://ilc0:54310/tmp/temp-1730375604/tmp1619053553:org.apache.pig.impl.io.InterStorage)
- 1-172 Operator Key: 1-172)
2010-12-31 10:19:49,098 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
File concatenation threshold: 100 optimistic? false
2010-12-31 10:19:49,109 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer
- Choosing to move algebraic foreach to combiner
   ...
2010-12-31 10:19:51,039 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Neither PARALLEL nor default parallelism
is set for this job. Setting number of reducers to 1
2010-12-31 10:19:51,070 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
2010-12-31 10:19:51,260 [Thread-11] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to
process : 1
2010-12-31 10:19:51,260 [Thread-11] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths to process : 1
2010-12-31 10:19:51,270 [Thread-11] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths (combined) to process : 1
2010-12-31 10:19:51,571 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_201012310203_0020
   ...
2010-12-31 10:20:16,643 [main] WARN
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Encountered Warning ACCESSING_NON_EXISTENT_FIELD 1 time(s).
2010-12-31 10:20:16,643 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Success!
2010-12-31 10:20:16,657 [main] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to
process : 1
2010-12-31 10:20:16,657 [main] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths to process : 1
()