Posted to user@pig.apache.org by Sameer Tilak <ss...@live.com> on 2014/06/20 02:55:07 UTC

issues with Apache Pig: embedded




Hi everyone,
I want to read each line of the file test.txt. Each line is an individual sentence, something like 'Markov    chain Monte Carlo (MCMC) methods are an important algorithmic'. I will then do some text processing on each sentence in my UDF, which returns a score for each sentence. I am including only the relevant snippet of my Python script. Any help with this would be great!


Script snippet:

    params = {'infile': '/data/data.in', 'outfile': '/results/scores/', 'sentence': '0'}
    f = open('./test.txt')
    for currentline in f:
        print 'Current line is' + currentline
        params["sentence"] = currentline
        params["outfile"] = '/results/scores/' + 'algo.out' + str(i)
        bound = P.bind(params)
        i = i + 1
        if result.isSuccessful():
            print 'Pig job succeeded'
        else:
            raise 'Pig job failed'

It does not work when the first argument of myudf.myfunc is parameterized as $sentence:

    P = Pig.compile("A = LOAD '/data/data.in' AS (line: chararray); SCORES = FOREACH A GENERATE myudf.myfunc($sentence, line); STORE SCORES into '$outfile';")

I get the following error:

    Current line is Markov    chain Monte Carlo (MCMC) methods are an   important algorithmic
    2014-06-19 17:40:50,371 [main] INFO  org.apache.pig.scripting.BoundScript - Query to run:
    A = LOAD '/data/data.in' AS (line: chararray); SCORES = FOREACH A GENERATE  myudf.myfunc(Markov, line); STORE SCORES into '/results/scores/algo.out0';
    2014-06-19 17:40:51,100 [main] ERROR org.apache.pig.Main - ERROR 1025: <line 1, column 110> Invalid field projection. Projected field [Markov] does not exist in schema: line:chararray.
    Details at logfile: /apps/software/pig-scripts/pig_1403224834638.log

It works fine when the first parameter of myudf.myfunc is hardcoded:
P = Pig.compile("A = LOAD '/data/data.in' AS (line: chararray); SCORES = FOREACH A GENERATE myudf.myfunc('Markov    chain Monte Carlo (MCMC) methods are an important algorithmic', line); STORE SCORES into '$outfile';");
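
One possible reading of the log above: the bound query contains myudf.myfunc(Markov, line), with no quotes around the substituted text, so Pig treats Markov as a field name rather than a string constant. The sketch below is only an illustration, not a confirmed fix. It builds the script text in plain Python with the sentence already wrapped in single quotes, before the text would be handed to Pig.compile. The helper names pig_string_literal and build_script are hypothetical, and the backslash escaping of embedded quotes is an assumption about how Pig parses chararray constants.

```python
def pig_string_literal(text):
    """Wrap text as a single-quoted Pig string constant (hypothetical helper)."""
    # Lines read from a file keep their trailing newline; drop it.
    cleaned = text.rstrip("\r\n")
    # Assumption: Pig accepts backslash escapes inside quoted constants.
    escaped = cleaned.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


# Same statements as in the post; only the first UDF argument is filled in
# from Python. '$outfile' is left for P.bind(params) to substitute.
SCRIPT_TEMPLATE = (
    "A = LOAD '/data/data.in' AS (line: chararray); "
    "SCORES = FOREACH A GENERATE myudf.myfunc(%s, line); "
    "STORE SCORES into '$outfile';"
)


def build_script(sentence):
    """Return the Pig script text with the sentence embedded as a quoted literal."""
    return SCRIPT_TEMPLATE % pig_string_literal(sentence)


script = build_script("Markov    chain Monte Carlo (MCMC) methods\n")
print(script)
```

Inside the loop, the result of build_script(currentline) would be passed to Pig.compile for each line, with only outfile left in params. A sentence that itself contains a $ character could still be picked up by parameter substitution, which this sketch does not handle.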