Posted to dev@pig.apache.org by "Olga Natkovich (JIRA)" <ji...@apache.org> on 2010/07/12 21:45:49 UTC
[jira] Updated: (PIG-947) Parsing Bags by PigStorage is not handled
correctly if whitespace before start of tuple.
[ https://issues.apache.org/jira/browse/PIG-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich updated PIG-947:
-------------------------------
Fix Version/s: 0.8.0
> Parsing Bags by PigStorage is not handled correctly if whitespace before start of tuple.
> ----------------------------------------------------------------------------------------
>
> Key: PIG-947
> URL: https://issues.apache.org/jira/browse/PIG-947
> Project: Pig
> Issue Type: Bug
> Components: data
> Environment: Pig on Hadoop 18
> Reporter: Gandul Azul
> Fix For: 0.8.0
>
>
> The PigStorage parser for bags does not work correctly when a tuple in a bag is preceded by a space. For example, the following is parsed correctly:
> {(-5.243084,3.142401,0.000138,2.071200,0),(-6.021349,0.992683,0.000044,0.992683,0),(-10.426160,20.251774,0.000892,5.691086,0)}
> while this is not (note the space before the second tuple):
> {(-5.243084,3.142401,0.000138,2.071200,0), (-6.021349,0.992683,0.000044,0.992683,0),(-10.426160,20.251774,0.000892,5.691086,0)}
> It seems that when the parser encounters the space, it treats the rest of the line as a String. With a schema, this leads to a cast from string to databag, which results in an exception:
> |WARN builtin.PigStorage: Unable to interpret value [B@2c9b42e6 in field being converted to type bag, caught ParseException <Encountered " <STRING> " "" at |line 1, column 43.
> |Was expecting:
> | "(" ...
> | > field discarded
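Since the report says the form without the space is parsed correctly, one conceivable workaround until a fix lands is to strip that whitespace before the data reaches PigStorage. The following is only a minimal sketch under that assumption: `normalize_bag_literal` is a hypothetical helper, not part of Pig, and it has not been verified against a Pig installation. It removes whitespace that follows "{" or "," and precedes "(".

```python
import re

# Whitespace that sits directly after "{" or "," and directly before "(",
# i.e. whitespace before the start of a tuple.
_SPACE_BEFORE_TUPLE = re.compile(r"(?<=[{,])\s+(?=\()")


def normalize_bag_literal(text: str) -> str:
    """Hypothetical pre-processing step (not a Pig API): rewrite a bag
    literal into the compact form that the report says parses correctly."""
    return _SPACE_BEFORE_TUPLE.sub("", text)


# Shortened versions of the two inputs from the report.
good = "{(-5.243084,3.142401,0.000138,2.071200,0),(-6.021349,0.992683,0.000044,0.992683,0)}"
bad = "{(-5.243084,3.142401,0.000138,2.071200,0), (-6.021349,0.992683,0.000044,0.992683,0)}"

print(normalize_bag_literal(bad) == good)
```

Note that this rewrite is purely textual: it would also alter a string field that happens to contain a comma followed by whitespace and an opening parenthesis, so it is only safe for data like the numeric bags shown here.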
> Below is the parser debug output for the failing sequence "2.071200,0), (" from the input above:
> ****** FOUND A <DOUBLENUMBER> MATCH (2.071200) ******
> Call: AtomDatum
> Consumed token: <<DOUBLENUMBER>: "2.071200" at line 1 column 31>
> Return: AtomDatum
> Return: Datum
> Matched the empty string as <STRING> token.
> Current character : , (44) at line 1 column 39
> No more string literal token matches are possible.
> Currently matched the first 1 characters as a "," token.
> ****** FOUND A "," MATCH (,) ******
> Consumed token: <"," at line 1 column 39>
> Call: Datum
> Matched the empty string as <STRING> token.
> Current character : 0 (48) at line 1 column 40
> No string literal matches possible.
> Starting NFA to match one of : { <STRING>, <SIGNEDINTEGER>, <DOUBLENUMBER> }
> Current character : 0 (48) at line 1 column 40
> Currently matched the first 1 characters as a <SIGNEDINTEGER> token.
> Possible kinds of longer matches : { <STRING>, <SIGNEDINTEGER>, <DOUBLENUMBER>, <LONGINTEGER>, <FLOATNUMBER> }
> Current character : ) (41) at line 1 column 41
> Currently matched the first 1 characters as a <SIGNEDINTEGER> token.
> Putting back 1 characters into the input stream.
> ****** FOUND A <SIGNEDINTEGER> MATCH (0) ******
> Call: AtomDatum
> Consumed token: <<SIGNEDINTEGER>: "0" at line 1 column 40>
> Return: AtomDatum
> Return: Datum
> Matched the empty string as <STRING> token.
> Current character : ) (41) at line 1 column 41
> No more string literal token matches are possible.
> Currently matched the first 1 characters as a ")" token.
> ****** FOUND A ")" MATCH ()) ******
> Return: Tuple
> Consumed token: <")" at line 1 column 41>
> Matched the empty string as <STRING> token.
> Current character : , (44) at line 1 column 42
> No more string literal token matches are possible.
> Currently matched the first 1 characters as a "," token.
> ****** FOUND A "," MATCH (,) ******
> Consumed token: <"," at line 1 column 42>
> Matched the empty string as <STRING> token.
> Current character : (32) at line 1 column 43
> No string literal matches possible.
> Starting NFA to match one of : { <STRING>, <SIGNEDINTEGER>, <DOUBLENUMBER> }
> Current character : (32) at line 1 column 43
> Currently matched the first 1 characters as a <STRING> token.
> Possible kinds of longer matches : { <STRING>, <SIGNEDINTEGER>, <DOUBLENUMBER> }
> Current character : ( (40) at line 1 column 44
> Currently matched the first 1 characters as a <STRING> token.
> Putting back 1 characters into the input stream.
> ****** FOUND A <STRING> MATCH ( ) ******
> Return: Bag
> Return: Datum
> Return: Parse
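The trace above suggests that the space at column 43 is consumed by the catch-all <STRING> token, so the parser receives a STRING where it expects "(". The toy tokenizer below is a rough sketch of that behaviour, not Pig's actual JavaCC grammar: the rule set, the regular expressions, and the `skip_whitespace` switch are all assumptions made for illustration. Skipping whitespace between tokens is shown as one possible remedy and is not necessarily the fix adopted for 0.8.0.

```python
import re

# Toy token rules, tried in order (assumed for illustration only).
# STRING is a catch-all: any run of non-structural characters, spaces included.
TOKEN_RULES = [
    (kind, re.compile(pattern))
    for kind, pattern in [
        ("LBRACE", r"\{"),
        ("RBRACE", r"\}"),
        ("LPAREN", r"\("),
        ("RPAREN", r"\)"),
        ("COMMA", r","),
        ("DOUBLENUMBER", r"[+-]?\d+\.\d+"),
        ("SIGNEDINTEGER", r"[+-]?\d+"),
        ("STRING", r"[^{}(),]+"),
    ]
]


def tokenize(text, skip_whitespace=False):
    """Return a list of (kind, lexeme) pairs for the given text."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Optional remedy: drop whitespace that appears between tokens.
        if skip_whitespace and text[pos].isspace():
            pos += 1
            continue
        for kind, pattern in TOKEN_RULES:
            match = pattern.match(text, pos)
            if match:
                tokens.append((kind, match.group()))
                pos = match.end()
                break
        else:
            raise ValueError(f"no token matches at position {pos}")
    return tokens


fragment = "2.071200,0), ("  # the failing sequence quoted in the report

print([kind for kind, _ in tokenize(fragment)])
# ['DOUBLENUMBER', 'COMMA', 'SIGNEDINTEGER', 'RPAREN', 'COMMA', 'STRING', 'LPAREN']

print([kind for kind, _ in tokenize(fragment, skip_whitespace=True)])
# ['DOUBLENUMBER', 'COMMA', 'SIGNEDINTEGER', 'RPAREN', 'COMMA', 'LPAREN']
```

In the first run the space becomes a one-character STRING token, mirroring the "FOUND A <STRING> MATCH ( )" line in the trace. In the second run the comma is followed directly by LPAREN, which is what a bag parser expecting "(" would need to see.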
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.