Posted to mapreduce-user@hadoop.apache.org by Mix Nin <pi...@gmail.com> on 2013/07/16 20:04:52 UTC

header of a tuple/bag

Hi,

I am trying to query a data set on HDFS using Pig.

Data = LOAD '/user/xx/20130523/*';
x = FOREACH Data GENERATE cookie_id;

I get the following error:

<line 2, column 26> Invalid field projection. Projected field [cookie_id]
does not exist

How do I find the column names in the bag "Data"? The developer who
created the file says it is cookie_id.
Is there any way I can get the schema/header for this?


Thanks

Re: header of a tuple/bag

Posted by 耿龙 <by...@gmail.com>.
Hi, you got the error because you did not assign names to the fields, so Pig
does not know which column is called cookie_id. You can do something like this:

Data = LOAD '/user/xx/20130523/*';

x = FOREACH Data GENERATE (chararray) $i AS cookie_id;

where i is the zero-based index of cookie_id in the tuple ($0 is the first field, $1 the second, and so on).
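To pick the right i, one option is to look at a sample record (for example, a few lines printed with `hadoop fs -cat`, or a `DUMP` of a `LIMIT`ed relation) and number its fields from zero. The Python sketch below does only that numbering. It assumes the files are plain text with tab-separated fields, which is what PigStorage() reads by default; the sample line is made up.

```python
def number_fields(line, delimiter="\t"):
    """Return (position, value) pairs for one record, numbered from zero
    the same way Pig's positional references ($0, $1, ...) are."""
    # Strip only the trailing newline so empty trailing fields are kept.
    fields = line.rstrip("\n").split(delimiter)
    return [(f"${i}", value) for i, value in enumerate(fields)]


if __name__ == "__main__":
    # Hypothetical sample record; real data will differ.
    sample = "2013-05-23\tabc123cookie\t/index.html\n"
    for ref, value in number_fields(sample):
        print(ref, value)
```

If the cookie value turned out to be in the second column, as in this made-up record, the projection above would become `(chararray) $1 AS cookie_id`.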

Alternatively, you can declare a schema when you load the data, like this:

Data = LOAD '/user/xx/20130523/*' AS (cookie_id:chararray);


Re: header of a tuple/bag

Posted by Pradeep Gollakota <pr...@gmail.com>.
It generally depends on which storage (load) function was used. If it's
PigStorage(), the schema is not encoded in the data.

Assuming that the data was stored with PigStorage() and that cookie_id is the first
field in the data, your LOAD statement should look like this:

Data = LOAD '/user/xx/20130523/*' USING PigStorage() AS (cookie_id:chararray, ...);
x = FOREACH Data GENERATE cookie_id;

So you not only have to specify which storage function to use; you may
also have to declare the schema when you load the data.
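When the files have many columns, typing the full AS clause by hand is error-prone. Below is a small Python sketch that builds the LOAD statement from an ordered column list, assuming you can get that list (names and Pig types, in file order) from whoever wrote the data. The column names other than cookie_id are hypothetical, and the helper name pig_load_statement is made up for illustration.

```python
def pig_load_statement(alias, path, fields, loader="PigStorage()"):
    """Build a Pig LOAD statement that declares a schema in the AS clause.

    fields is an ordered list of (name, pig_type) pairs. The order must match
    the column order in the files, because the schema is applied by position.
    """
    schema = ", ".join(f"{name}:{pig_type}" for name, pig_type in fields)
    return f"{alias} = LOAD '{path}' USING {loader} AS ({schema});"


if __name__ == "__main__":
    # Hypothetical column list; only cookie_id comes from the thread.
    columns = [("cookie_id", "chararray"), ("visit_ts", "long"), ("url", "chararray")]
    print(pig_load_statement("Data", "/user/xx/20130523/*", columns))
```

Generating the statement from a single ordered list keeps the names and their positions in one place, which is the part that has to agree with the data.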

