Posted to user@pig.apache.org by Jameson Lopp <ja...@bronto.com> on 2011/07/18 21:31:19 UTC

syntax error trying to access dynamically created columns

I'm loading sequence files in which each row's 'value' is a tab-delimited set of columns. I'm 
splitting the values out so that I can work with them separately, but Pig's syntax parser is giving 
me a hard time.

-----------------------------------------------------------------
logs = LOAD '/data/2011-07-17/part-*' USING SequenceFileLoader;
logs = FOREACH logs GENERATE
    $0,
    FLATTEN(STRSPLIT($1, '\t'));

opens = FILTER logs BY $3 == 'open';
-----------------------------------------------------------------

gets me a syntax error:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Out of bound access. 
Trying to access non-existent column: 16. Schema {bytearray,bytearray} has 2 column(s).

which makes sense, because DESCRIBE reports only two columns:
grunt> describe logs;
logs: {bytearray,bytearray}

But I know that $3 exists: I dumped the data while debugging, and the split and flatten work as 
expected. How do I tell Pig that there are more columns?
-- 
Jameson Lopp
Software Engineer
Bronto Software, Inc.

Re: syntax error trying to access dynamically created columns

Posted by Thejas Nair <th...@hortonworks.com>.
This has been fixed in Pig 0.9, which should be released in a few days.

You can also build it from svn:
svn co http://svn.apache.org/repos/asf/pig/branches/branch-0.9; cd branch-0.9; ant

-Thejas
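
For anyone who cannot upgrade right away, one possible workaround is to declare a schema for the flattened tuple with an AS clause, so that the parser knows how many columns to expect. This is an untested sketch, not a confirmed fix: the field names key, f0, f1 and f2 are hypothetical, the number of fields must match what the split actually produces, and whether this avoids the parse error on releases before 0.9 is an assumption.

```pig
-- Sketch only: key, f0, f1, f2 are made-up names; declare one per column in the real data.
logs = LOAD '/data/2011-07-17/part-*' USING SequenceFileLoader;
logs = FOREACH logs GENERATE
    $0 AS key,
    FLATTEN(STRSPLIT($1, '\t')) AS (f0:chararray, f1:chararray, f2:chararray);

-- f2 is the column that was addressed as $3 in the original script.
opens = FILTER logs BY f2 == 'open';
```

With a declared schema, describe logs should list the named fields instead of two bare bytearrays, and the filter can refer to a column by name rather than by position.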

