Posted to dev@pig.apache.org by "Woody Anderson (JIRA)" <ji...@apache.org> on 2011/05/26 19:23:47 UTC

[jira] [Commented] (PIG-2098) jython - problem with single item tuple in bag

    [ https://issues.apache.org/jira/browse/PIG-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039799#comment-13039799 ] 

Woody Anderson commented on PIG-2098:
-------------------------------------

I have a few comments on this non-bug. It's a duplicate of https://issues.apache.org/jira/browse/PIG-1942,
which is an improvement: although it is possible to convert a chararray to a tuple automatically, it is currently up to the user to do this.

WITH the PIG-1942 patch you can write:
@outputSchema("keys:bag{t:tuple(key:chararray)}")
def keys(map):
  return map.iterkeys()

WITHOUT it, you can still write more efficient Jython than the original loop by using a list comprehension that wraps each key in a one-element tuple:
@outputSchema("keys:bag{t:tuple(key:chararray)}")
def keys(map):
  return [(k,) for k in map.iterkeys()]
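
For reference, the root cause is plain Python syntax rather than anything Pig-specific: parentheses alone do not create a tuple, the trailing comma does. A minimal sketch in plain Python (no Pig and no @outputSchema involved; the function names keys_broken and keys_fixed and the sample dict are made up for illustration):

```python
# Plain-Python sketch; helper names and sample data are hypothetical.
def keys_broken(m):
    # (k) is just k wrapped in parentheses, so this yields bare strings
    return [(k) for k in m]

def keys_fixed(m):
    # the trailing comma is what makes a one-element tuple
    return [(k,) for k in m]

sample = {"name": "John", "phone": "5551212"}
print(sorted(keys_broken(sample)))
print(sorted(keys_fixed(sample)))
```

The first list holds strings and the second holds one-element tuples, which is the shape the bag{t:tuple(key:chararray)} schema expects.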

> jython - problem with single item tuple in bag
> ----------------------------------------------
>
>                 Key: PIG-2098
>                 URL: https://issues.apache.org/jira/browse/PIG-2098
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1, 0.9.0
>            Reporter: Vivek Padmanabhan
>            Assignee: Woody Anderson
>
> While using a Python UDF, if I create a tuple with a single field, Pig execution fails with a ClassCastException.
> Caused by: java.io.IOException: Error executing function: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Cannot convert jython type to pig datatype java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple
> 	at org.apache.pig.scripting.jython.JythonFunction.exec(JythonFunction.java:111)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:245)
> An example to reproduce the issue:
> Pig Script
> {code}
> register 'mapkeys.py' using jython as mapkeys;
> A = load 'mapkeys.data' using PigStorage() as ( aMap: map[] );
> C = foreach A generate mapkeys.keys(aMap);
> dump C;
> {code}
> mapkeys.py
> {code}
> @outputSchema("keys:bag{t:tuple(key:chararray)}")
> def keys(map):
>   print "mapkeys.py:keys:map:", map
>   outBag = []
>   for key in map.iterkeys():
>     t = (key) ## doesn't work, causes Pig to crash
>     #t = (key,) ## adding empty value works :-/
>     outBag.append(t)
>   print "mapkeys.py:keys:outBag:", outBag
>   return outBag
> {code}
> Input data 'mapkeys.data'
> [name#John,phone#5551212]
> In the UDF, t = (key) is a parenthesized string rather than a tuple, so the item inside the bag is treated as a string instead of a tuple, which causes the ClassCastException.
> If I add a trailing comma, t = (key,), then the script runs fine.
> From the code, I can see that for "t = (key,)", pythonToPig(..) receives the pyObject as [(u'name',), (u'phone',)] from the PyFunction call.
> But for "t = (key)", the return value from the PyFunction call is [u'name', u'phone'].
