Posted to user@pig.apache.org by Sameer Tilak <ss...@live.com> on 2014/06/20 02:55:07 UTC
issues with Apache Pig: embedded
Hi everyone,
I want to read each line of the file test.txt. Each line is a single sentence, something like "Markov chain Monte Carlo (MCMC) methods are an important algorithmic". I then do some text processing on each sentence in my UDF and get back a score for each sentence. I am including only the relevant snippet of my Python script. Any help with this would be great!
Script snippet:

params = {'infile': '/data/data.in', 'outfile': '/results/scores/', 'sentence': '0'}
f = open('./test.txt')
for currentline in f:
    print 'Current line is' + currentline
    params["sentence"] = currentline
    params["outfile"] = '/results/scores/' + 'algo.out' + str(i)
    bound = P.bind(params)
    i = i + 1
    if result.isSuccessful():
        print 'Pig job succeeded'
    else:
        raise 'Pig job failed'
It does not work when the first argument of myudf.myfunc is parameterized with $sentence:

P = Pig.compile("A = LOAD '/data/data.in' AS (line: chararray); SCORES = FOREACH A GENERATE myudf.myfunc($sentence, line); STORE SCORES into '$outfile';")
I get the following error:

Current line is Markov chain Monte Carlo (MCMC) methods are an important algorithmic
2014-06-19 17:40:50,371 [main] INFO org.apache.pig.scripting.BoundScript - Query to run:
A = LOAD '/data/data.in' AS (line: chararray); SCORES = FOREACH A GENERATE myudf.myfunc(Markov, line); STORE SCORES into '/results/scores/algo.out0';
2014-06-19 17:40:51,100 [main] ERROR org.apache.pig.Main - ERROR 1025:
<line 1, column 110> Invalid field projection. Projected field [Markov] does not exist in schema: line:chararray.
Details at logfile: /apps/software/pig-scripts/pig_1403224834638.log
It works fine when the first parameter of myudf.myfunc is hardcoded:
P = Pig.compile("A = LOAD '/data/data.in' AS (line: chararray); SCORES = FOREACH A GENERATE myudf.myfunc('Markov chain Monte Carlo (MCMC) methods are an important algorithmic', line); STORE SCORES into '$outfile';");
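
A possible workaround, sketched under assumptions and not verified against Pig: the log above shows $sentence arriving in the query as the bare token Markov, which Pig then parses as a field name. Since the hardcoded script works, one option is to build that same script text in Python for each sentence and leave only $outfile to be bound. The helper names pig_string_literal and build_script are hypothetical, and the backslash escaping of quotes is an assumption about Pig's chararray literal syntax.

```python
def pig_string_literal(text):
    """Return text as a single-quoted Pig chararray literal (hypothetical helper)."""
    # Lines read from a file end with a newline; drop it so the literal stays on one line.
    cleaned = text.rstrip("\r\n")
    # Assumption: Pig accepts \\ and \' as escapes inside single-quoted strings.
    escaped = cleaned.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


# Same script as the hardcoded version, with a %s slot for the quoted sentence.
# $outfile is left untouched so it can still be supplied through bind().
SCRIPT_TEMPLATE = (
    "A = LOAD '/data/data.in' AS (line: chararray); "
    "SCORES = FOREACH A GENERATE myudf.myfunc(%s, line); "
    "STORE SCORES into '$outfile';"
)


def build_script(sentence):
    """Build the Pig script text for one sentence (hypothetical helper)."""
    return SCRIPT_TEMPLATE % pig_string_literal(sentence)


script = build_script(
    "Markov chain Monte Carlo (MCMC) methods are an important algorithmic\n"
)
print(script)
```

Inside the loop, the resulting text would then be passed to Pig.compile(script) once per sentence, with bind() supplying only outfile. A sentence that contains a literal $ would presumably still be picked up by parameter substitution, which this sketch does not handle.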