Posted to dev@pig.apache.org by "Woody Anderson (JIRA)" <ji...@apache.org> on 2011/05/26 19:23:47 UTC
[jira] [Commented] (PIG-2098) jython - problem with single item tuple in bag
[ https://issues.apache.org/jira/browse/PIG-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039799#comment-13039799 ]
Woody Anderson commented on PIG-2098:
-------------------------------------
i have a few comments on this non-bug. it's a dupe of https://issues.apache.org/jira/browse/PIG-1942,
which is an improvement: although it is possible to convert a chararray to a tuple automatically, it's currently up to the user to do this.
WITH PIG-1942 patch you can:
@outputSchema("keys:bag{t:tuple(key:chararray)}")
def keys(map):
    return map.iterkeys()
WITHOUT the patch, you can write more efficient jython using a list comprehension:
@outputSchema("keys:bag{t:tuple(key:chararray)}")
def keys(map):
    return [(k,) for k in map.iterkeys()]
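
The difference between (k) and (k,) can be checked outside of Pig. The sketch below is only an illustration in plain Python: the function names and the sample dict are made up for this example, @outputSchema is left out because Pig supplies it only when the script is registered, and .keys() is used in place of .iterkeys() so the same code runs on both Jython and current Python.

```python
def keys_as_strings(m):
    # (k) is just k in parentheses, so every element is a bare string
    return [(k) for k in m.keys()]


def keys_as_tuples(m):
    # the trailing comma makes (k,) a one-element tuple
    return [(k,) for k in m.keys()]


# hypothetical sample mirroring the 'mapkeys.data' row from the report
sample = {"name": "John", "phone": "5551212"}

bare = keys_as_strings(sample)
wrapped = keys_as_tuples(sample)

print(type(bare[0]).__name__)
print(type(wrapped[0]).__name__)
```

Only the second form matches a bag{t:tuple(key:chararray)} schema, since each element of the returned list has to be a tuple.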
> jython - problem with single item tuple in bag
> ----------------------------------------------
>
> Key: PIG-2098
> URL: https://issues.apache.org/jira/browse/PIG-2098
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.8.1, 0.9.0
> Reporter: Vivek Padmanabhan
> Assignee: Woody Anderson
>
> While using a python udf, if I create a tuple with a single field, Pig execution fails with ClassCastException.
> Caused by: java.io.IOException: Error executing function: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Cannot convert jython type to pig datatype java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple
> at org.apache.pig.scripting.jython.JythonFunction.exec(JythonFunction.java:111)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:245)
> An example to reproduce the issue:
> Pig Script
> {code}
> register 'mapkeys.py' using jython as mapkeys;
> A = load 'mapkeys.data' using PigStorage() as ( aMap: map[] );
> C = foreach A generate mapkeys.keys(aMap);
> dump C;
> {code}
> mapkeys.py
> {code}
> @outputSchema("keys:bag{t:tuple(key:chararray)}")
> def keys(map):
>     print "mapkeys.py:keys:map:", map
>     outBag = []
>     for key in map.iterkeys():
>         t = (key) ## doesn't work, causes Pig to crash
>         #t = (key,) ## adding empty value works :-/
>         outBag.append(t)
>     print "mapkeys.py:keys:outBag:", outBag
>     return outBag
> {code}
> Input data 'mapkeys.data'
> [name#John,phone#5551212]
> In the udf, t = (key) is used, so the item inside the bag is treated as a string instead of a tuple, which causes the class cast exception.
> If I provide an additional comma, t = (key,), then the script goes through fine.
> From the code, what I can see is that for "t = (key,)", pythonToPig(..) receives the pyObject as [(u'name',), (u'phone',)] from the PyFunction call.
> But for "t = (key)" the return from the PyFunction call is [u'name', u'phone'].