Posted to dev@pig.apache.org by "Olga Natkovich (JIRA)" <ji...@apache.org> on 2010/01/28 03:15:34 UTC

[jira] Assigned: (PIG-1187) UTF-8 (international code) breaks with loader when load with schema is specified

     [ https://issues.apache.org/jira/browse/PIG-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich reassigned PIG-1187:
-----------------------------------

    Assignee: Ashutosh Chauhan

> UTF-8 (international code) breaks with loader when load with schema is specified
> --------------------------------------------------------------------------------
>
>                 Key: PIG-1187
>                 URL: https://issues.apache.org/jira/browse/PIG-1187
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Viraj Bhat
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.7.0
>
>
> I have a set of Pig statements that dump an international dataset:
> {code}
> INPUT_OBJECT = load 'internationalcode';
> describe INPUT_OBJECT;
> dump INPUT_OBJECT;
> {code}
> Sample output:
> (756a6196-ebcd-4789-ad2f-175e5df65d55,{(labelAaÂâÀ),(labelあいうえお1),(labelஜார்க2),(labeladfadf)})
> This works and dumps the results, but when I specify a schema in the load statement, it fails:
> {code}
> INPUT_OBJECT = load 'internationalcode' AS (object_id:chararray, labels: bag {T: tuple(label:chararray)});
> describe INPUT_OBJECT;
> {code}
> The error message is as follows:
> 2010-01-14 02:23:27,320 FATAL org.apache.hadoop.mapred.Child: Error running child : org.apache.pig.data.parser.TokenMgrError: Error: Bailing out of infinite loop caused by repeated empty string matches at line 1, column 21.
> 	at org.apache.pig.data.parser.TextDataParserTokenManager.TokenLexicalActions(TextDataParserTokenManager.java:620)
> 	at org.apache.pig.data.parser.TextDataParserTokenManager.getNextToken(TextDataParserTokenManager.java:569)
> 	at org.apache.pig.data.parser.TextDataParser.jj_ntk(TextDataParser.java:651)
> 	at org.apache.pig.data.parser.TextDataParser.Tuple(TextDataParser.java:152)
> 	at org.apache.pig.data.parser.TextDataParser.Bag(TextDataParser.java:100)
> 	at org.apache.pig.data.parser.TextDataParser.Datum(TextDataParser.java:382)
> 	at org.apache.pig.data.parser.TextDataParser.Parse(TextDataParser.java:42)
> 	at org.apache.pig.builtin.Utf8StorageConverter.parseFromBytes(Utf8StorageConverter.java:68)
> 	at org.apache.pig.builtin.Utf8StorageConverter.bytesToBag(Utf8StorageConverter.java:76)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:845)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:250)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:159)
> Viraj
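A small, hypothetical Java check (not part of Pig; the class and method names are made up for illustration) may help narrow down where the parser stops. It assumes the raw bag field in `internationalcode` is stored exactly as it appears in the dump output above, and that TextDataParser reports 1-based columns counted per decoded character. Under those assumptions, the first character above U+00FF (outside Latin-1) sits at column 21, the position named in the TokenMgrError, while the accented Latin-1 characters earlier in the field (Â, â, À) come before that position. This is consistent with, but does not prove, the lexer failing on characters outside the Latin-1 range.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, NOT part of Pig: finds the first character above
// U+00FF in a field and reports its 1-based column.
class FirstNonLatin1Column {

    // Returns the 1-based column of the first char above U+00FF, or -1 if none.
    static int firstNonLatin1Column(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > '\u00FF') {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // ASSUMPTION: raw bag field reconstructed from the dump output in the report.
        // Escapes keep the source encoding-independent:
        // \u00C2\u00E2\u00C0 = Latin-1 accented letters, \u3042.. = Hiragana, \u0B9C.. = Tamil.
        String bagField = "{(labelAa\u00C2\u00E2\u00C0),"
                + "(label\u3042\u3044\u3046\u3048\u304A1),"
                + "(label\u0B9C\u0BBE\u0BB0\u0BCD\u0B952),"
                + "(labeladfadf)}";

        // Encode to UTF-8 bytes and decode again, mimicking a bytearray field
        // that is decoded to characters before parsing (an assumption).
        byte[] bytes = bagField.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(bytes, StandardCharsets.UTF_8);

        int col = firstNonLatin1Column(decoded);
        System.out.println("First non-Latin-1 character at column " + col
                + ": U+" + String.format("%04X", (int) decoded.charAt(col - 1)));
    }
}
```

Running `main` prints `First non-Latin-1 character at column 21: U+3042` (the character あ).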

-- 
This message is automatically generated by JIRA.