You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by "Santhosh Srinivasan (JIRA)" <ji...@apache.org> on 2008/12/16 20:36:56 UTC

[jira] Created: (PIG-567) Handling strings and exceptions in text data parser

Handling strings and exceptions in text data parser
---------------------------------------------------

                 Key: PIG-567
                 URL: https://issues.apache.org/jira/browse/PIG-567
             Project: Pig
          Issue Type: Bug
    Affects Versions: types_branch
            Reporter: Santhosh Srinivasan
            Priority: Minor
             Fix For: types_branch


The text data parser treats a sequence of numerals as integer. If the data is too long to fit into an integer then a number format exception is thrown and no attempts are made to convert the data to a higher type. A couple of questions arise:

1. Should strings be annotated with delimiters like quotes to distinguish them from numbers?
2. Should conversions to higher types or strings be attempted? The conversions have performance implications.

{noformat}

Data file:
{(2985671202194220139L}

Pig script:
a = load 'data' as (list: bag{t: tuple(value: chararray)});
dump a

Output:
2008-12-13 09:08:24,831 [main] ERROR
org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Unable
to open iterator for alias: a [Unable to store for alias: a [For input string: "2985671202194220139"]]
	at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:178)
	at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:647)
	at org.apache.pig.PigServer.store(PigServer.java:452)
	at org.apache.pig.PigServer.store(PigServer.java:421)
	at org.apache.pig.PigServer.openIterator(PigServer.java:384)
	at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
	at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:94)
	at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:58)
	at org.apache.pig.Main.main(Main.java:282)
Caused by: java.io.IOException: Unable to store for alias: a [For input string: "2985671202194220139"]
	... 10 more
Caused by: org.apache.pig.backend.executionengine.ExecException: For input string: "2985671202194220139"
	... 10 more
Caused by: java.lang.NumberFormatException: For input string: "2985671202194220139"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
	at java.lang.Integer.parseInt(Integer.java:459)
	at java.lang.Integer.parseInt(Integer.java:497)
	at org.apache.pig.data.parser.TextDataParser.AtomDatum(TextDataParser.java:291)
	at org.apache.pig.data.parser.TextDataParser.Datum(TextDataParser.java:359)
	at org.apache.pig.data.parser.TextDataParser.Tuple(TextDataParser.java:149)
	at org.apache.pig.data.parser.TextDataParser.Bag(TextDataParser.java:85)
	at org.apache.pig.data.parser.TextDataParser.Datum(TextDataParser.java:345)
	at org.apache.pig.data.parser.TextDataParser.Parse(TextDataParser.java:42)
	at org.apache.pig.builtin.Utf8StorageConverter.parseFromBytes(Utf8StorageConverter.java:70)
	at org.apache.pig.builtin.Utf8StorageConverter.bytesToBag(Utf8StorageConverter.java:78)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:861)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:243)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:197)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:226)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.store(POStore.java:137)
	at org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:62)
	at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:166)
	... 9 more

2008-12-13 09:08:24,833 [main] ERROR org.apache.pig.tools.grunt.GruntParser - Unable to open iterator for alias: a [Unable to store for alias: a [For input string: "2985671202194220139"]]
2008-12-13 09:08:24,834 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Unable to open iterator for alias: a [Unable to store for alias: a [For input string: "2985671202194220139"]]

{noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.