Posted to user@pig.apache.org by Sandopolus <sa...@gmail.com> on 2012/01/26 16:31:02 UTC
Load PigStorage with Schema Issues
Hi there
I am trying to load some data using PigStorage with a schema, but I
can't seem to get the schema right and was hoping someone could point out
my mistake.
Here is the data being loaded in:
2ec00769-dc02-47dc-b2a5-35b6fb1d8e90,{(customer,27651a7d-0871-49a6-8df4-90305f7e840b),(customerClient,b57f9d15-6de7-486b-9761-46246be4abfe),(clientBuild,7376807c-7448-4785-8e2c-49814f6ce2f9),(country,FR)}
Commands used:
A = LOAD 'testdata.txt' USING PigStorage(',') as (key:chararray,
columns:bag {column:tuple (name:chararray, value:chararray)});
DUMP A;
This results in the following warning and output:
2012-01-26 15:27:51,860 [main] WARN
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 1 time(s).
(2ec00769-dc02-47dc-b2a5-35b6fb1d8e90,)
From the output, it doesn't seem to be picking up the bag structure, but if I
remove the schema it dumps the data out correctly.
Any help would be much appreciated.
Ta
Sandy
Re: Load PigStorage with Schema Issues
Posted by Thejas Nair <th...@hortonworks.com>.
That is a problem with using "," as the field delimiter.
PigStorage splits the whole record on the delimiter, so the commas inside
the second field (the bag) cause that field to be split as well.
If you use some other delimiter for your data (e.g., tab or ^A), it should
work fine.
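As a minimal sketch of that suggestion, assume the same record is rewritten so that a tab (rather than a comma) separates the key from the bag, while the commas inside the bag's tuples stay as they are. The file name 'testdata_tab.txt' is hypothetical; the schema is the one from the original question.

```pig
-- Assumed input line in testdata_tab.txt (hypothetical file):
--   <key><TAB>{(customer,<value>),(customerClient,<value>),(clientBuild,<value>),(country,FR)}
-- With tab as the field delimiter, the commas inside the bag are no longer
-- treated as field separators, so the second field reaches the bag converter whole.
A = LOAD 'testdata_tab.txt' USING PigStorage('\t') AS (key:chararray,
    columns:bag {column:tuple (name:chararray, value:chararray)});
DUMP A;
```

Since tab is PigStorage's default delimiter, USING PigStorage() without an argument should behave the same way here.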
Thanks,
Thejas