Posted to user@pig.apache.org by pob <pe...@gmail.com> on 2011/04/23 23:46:08 UTC
python UDF
Hello,

I installed Jython and JRuby on Debian and exported:

export PIG_CLASSPATH=/path/cassandra-0.7/contrib/pig:/usr/share/java/jython.jar

My UDF, myFunc.py, is in /path/cassandra-0.7/contrib/pig:
#!/usr/bin/python
@outputSchema("t:tuple(domain:chararray, spam:int, size:int, time:int)")
def toTuple(bag):
    # bag looks like {(colname, value), (...), ...}
    for word in bag:
        if word[0] == 'domain':
            domain = word[1]
        elif word[0] == 'spam':
            spam = word[1]
        elif word[0] == 'size':
            size = word[1]
        elif word[0] == 'time':
            time = word[1]
    return (domain, spam, size, time)
After starting Grunt, I type:
register '/path/cassandra-0.7/contrib/pig/myFunc.py' using jython as myUDF;
rows = LOAD 'cassandra://emailArchive/messagesMetaData' USING
CassandraStorage() AS (key, columns: bag {T: tuple(name:chararray,
value:int)});
d = foreach rows generate myUDF.toTuple($1);
When I run illustrate or dump d;, the following error occurs:

Any idea where the problem might be?

Thanks.
2011-04-23 23:40:13,428 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/jruby/ext/posix/util/Platform
2011-04-23 23:26:02,211 [Thread-14] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
java.lang.NoClassDefFoundError: org/jruby/ext/posix/util/Platform
    at org.python.core.PySystemState.getPath(PySystemState.java:513)
    at org.python.core.PySystemState.getPathLazy(PySystemState.java:502)
    at org.python.core.util.RelativeFile.<init>(RelativeFile.java:17)
    at org.python.core.PyTraceback.getLine(PyTraceback.java:52)
    at org.python.core.PyTraceback.tracebackInfo(PyTraceback.java:37)
    at org.python.core.PyTraceback.dumpStack(PyTraceback.java:108)
    at org.python.core.PyTraceback.dumpStack(PyTraceback.java:119)
    at org.python.core.Py.displayException(Py.java:1007)
    at org.python.core.PyException.printStackTrace(PyException.java:79)
    at org.python.core.PyException.toString(PyException.java:98)
    at java.lang.String.valueOf(String.java:2826)
    at java.lang.StringBuilder.append(StringBuilder.java:115)
    at org.apache.pig.scripting.jython.JythonFunction.exec(JythonFunction.java:107)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:273)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:343)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
2011-04-23 23:26:02,331 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001
Re: python UDF
Posted by pob <pe...@gmail.com>.
Sorry,

my first Python method was broken. Here is the correct one; everything works fine now:
@outputSchema("t:(domain:chararray, spam:int, size:int, time:int)")
def toTuple(bag):
    if len(bag) != 0:
        for word in bag:
            if word[0] == 'domain':
                domain = word[1]
            elif word[0] == 'spam':
                spam = word[1]
            elif word[0] == 'size':
                size = word[1]
            elif word[0] == 'time':
                time = word[1]
        tup = (domain, spam, size, time)
    else:
        tup = ()
    return tup
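[Editor's note: for anyone following along, the corrected UDF can be sanity-checked outside Pig with plain Python, since it only manipulates ordinary tuples. The @outputSchema decorator is a Pig/Jython construct, so this sketch stubs it with a no-op, and the sample bag values are invented for illustration.]

```python
# Local sanity check for the UDF, outside Pig.
# Stub for Pig's @outputSchema decorator: a no-op that returns the
# function unchanged, so the code imports cleanly in plain Python.
def outputSchema(schema):
    def wrap(func):
        return func
    return wrap

@outputSchema("t:(domain:chararray, spam:int, size:int, time:int)")
def toTuple(bag):
    if len(bag) != 0:
        for word in bag:
            if word[0] == 'domain':
                domain = word[1]
            elif word[0] == 'spam':
                spam = word[1]
            elif word[0] == 'size':
                size = word[1]
            elif word[0] == 'time':
                time = word[1]
        tup = (domain, spam, size, time)
    else:
        tup = ()
    return tup

# Pig passes the bag in from FOREACH ... GENERATE; here we fake it as a
# list of (colname, value) pairs with made-up values.
print(toTuple([('domain', 'example.com'), ('spam', 1),
               ('size', 2048), ('time', 1303595173)]))
# prints ('example.com', 1, 2048, 1303595173)

# An empty bag yields an empty tuple.
print(toTuple([]))
# prints ()
```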
Re: python UDF
Posted by pob <pe...@gmail.com>.
ERROR 2998: Unhandled internal error. org/jruby/ext/posix/util/Platform
java.lang.NoClassDefFoundError: org/jruby/ext/posix/util/Platform
    at org.python.core.PySystemState.getPath(PySystemState.java:513)
    at org.python.core.PySystemState.getPathLazy(PySystemState.java:502)
    at org.python.core.util.RelativeFile.<init>(RelativeFile.java:17)
    at org.python.core.PyTraceback.getLine(PyTraceback.java:52)
    at org.python.core.PyTraceback.tracebackInfo(PyTraceback.java:37)
    at org.python.core.PyTraceback.dumpStack(PyTraceback.java:108)
    at org.python.core.PyTraceback.dumpStack(PyTraceback.java:119)
    at org.python.core.Py.displayException(Py.java:1007)
    at org.python.core.PyException.printStackTrace(PyException.java:79)
    at org.python.core.PyException.toString(PyException.java:98)
    at java.lang.String.valueOf(String.java:2826)
    at java.lang.StringBuilder.append(StringBuilder.java:115)
    at org.apache.pig.scripting.jython.JythonFunction.exec(JythonFunction.java:107)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:273)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:343)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
    at org.apache.pig.pen.DerivedDataVisitor.evaluateOperator(DerivedDataVisitor.java:354)
    at org.apache.pig.pen.DerivedDataVisitor.visit(DerivedDataVisitor.java:232)
    at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:132)
    at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:47)
    at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:70)
    at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
    at org.apache.pig.pen.LineageTrimmingVisitor.init(LineageTrimmingVisitor.java:94)
    at org.apache.pig.pen.LineageTrimmingVisitor.<init>(LineageTrimmingVisitor.java:86)
    at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:105)
    at org.apache.pig.PigServer.getExamples(PigServer.java:1144)
    at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:627)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:308)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
    at org.apache.pig.Main.run(Main.java:465)
    at org.apache.pig.Main.main(Main.java:107)