Posted to user@pig.apache.org by Sandopolus <sa...@gmail.com> on 2012/01/26 16:31:02 UTC

Load PigStorage with Schema Issues

Hi there

I am trying to load some data using PigStorage with a schema, but I
can't seem to get the schema right and was hoping someone could point out
my mistake.

Here is the data being loaded in:
2ec00769-dc02-47dc-b2a5-35b6fb1d8e90,{(customer,27651a7d-0871-49a6-8df4-90305f7e840b),(customerClient,b57f9d15-6de7-486b-9761-46246be4abfe),(clientBuild,7376807c-7448-4785-8e2c-49814f6ce2f9),(country,FR)}

Commands used:
A = LOAD 'testdata.txt' USING PigStorage(',') as (key:chararray,
columns:bag {column:tuple (name:chararray, value:chararray)});
DUMP A;

This results in the following warning and output:
2012-01-26 15:27:51,860 [main] WARN
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 1 time(s).

(2ec00769-dc02-47dc-b2a5-35b6fb1d8e90,)

From the output, it doesn't seem to be picking up the bag structure, but if I
remove the schema it dumps the data out correctly.
Any help would be much appreciated.

Ta

Sandy

Re: Load PigStorage with Schema Issues

Posted by Thejas Nair <th...@hortonworks.com>.
That is a problem with using "," as the field delimiter.
PigStorage splits the whole record on the delimiter, so the second
field (the bag) also gets split at its internal commas.
If you use some other delimiter for your data (e.g., tab or ^A), it should
work fine.
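A minimal Python sketch of the splitting behaviour described above. It assumes a simplified model in which PigStorage splits on every occurrence of the delimiter, ignoring braces and parentheses. The helpers `split_like_pigstorage` and `to_tab_delimited` are hypothetical, for illustration only, and are not Pig APIs. The key is also assumed to contain no commas, which holds for the UUID keys in this data.

```python
# Sample record from the original post: a key, a comma, then a bag literal
# that itself contains commas.
RECORD = (
    "2ec00769-dc02-47dc-b2a5-35b6fb1d8e90,"
    "{(customer,27651a7d-0871-49a6-8df4-90305f7e840b),"
    "(customerClient,b57f9d15-6de7-486b-9761-46246be4abfe),"
    "(clientBuild,7376807c-7448-4785-8e2c-49814f6ce2f9),"
    "(country,FR)}"
)


def split_like_pigstorage(line: str, delimiter: str) -> list:
    """Hypothetical model of PigStorage field splitting: split on every
    delimiter occurrence, with no awareness of bag/tuple nesting."""
    return line.split(delimiter)


def to_tab_delimited(line: str) -> str:
    """Hypothetical pre-processing step: replace only the first comma
    (the one between the key and the bag) with a tab.
    Assumes the key itself contains no commas."""
    key, bag = line.split(",", 1)
    return f"{key}\t{bag}"


if __name__ == "__main__":
    comma_fields = split_like_pigstorage(RECORD, ",")
    print(len(comma_fields))   # → 9 (the schema expects 2)
    print(comma_fields[1])     # → {(customer   (not a valid bag literal)

    tab_fields = split_like_pigstorage(to_tab_delimited(RECORD), "\t")
    print(len(tab_fields))     # → 2
    print(tab_fields[1])       # the complete bag literal, intact
```

Under this model, the second field handed to the bag cast is just "{(customer", which is consistent with the FIELD_DISCARDED_TYPE_CONVERSION_FAILED warning and the empty second field in the output. It also suggests why dropping the schema appears to work: the nine untyped fields are printed back joined by commas, which reproduces the original text. Once the file is tab-delimited, the same LOAD ... AS (...) schema can be used with PigStorage('\t'), or with plain PigStorage(), whose default delimiter is tab.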


Thanks,
Thejas
