Posted to user@pig.apache.org by William Oberman <ob...@civicscience.com> on 2014/03/26 18:51:39 UTC

line feeds

I was debugging some warnings in a script I had:
FIELD_DISCARDED_TYPE_CONVERSION_FAILED
ACCESSING_NON_EXISTENT_FIELD

I got it down to basically these two lines:
--foo was stored using PigStorage
foo = LOAD '....' AS (key:chararray, value:map[chararray]);
STORE foo INTO '...';

The problem is that some of the map values have line feeds (\n) in them,
which I think breaks PigStorage on the load path.

Bug?  Or is it "user error" to allow map values with \n's in them?  I mean,
I agree it's weird.  But I didn't expect Pig to have such trouble with
it...
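
A rough way to see why those two warnings might appear together is to simulate a reader that splits records on \n before splitting fields on tab. The sketch below is plain Python, not Pig code. The store/load helpers and the [k#v] rendering are hypothetical stand-ins for a tab-delimited text format, written on the assumption that the loader is line-oriented.

```python
# Hypothetical simulation, not Pig code: mimics a line-oriented reader that
# splits records on "\n" first and fields on "\t" second.

def store(records):
    """Serialize (key, map) records one per line, map rendered as [k#v,...]."""
    lines = []
    for key, value in records:
        rendered = ",".join(f"{k}#{v}" for k, v in value.items())
        lines.append(f"{key}\t[{rendered}]")
    return "\n".join(lines) + "\n"

def load(text):
    """Read the text back one line at a time and classify each line."""
    results = []
    # The text always ends with "\n", so drop the empty trailing piece.
    for line in text.split("\n")[:-1]:
        fields = line.split("\t")
        if len(fields) < 2:
            # Only one field on the line: the map column does not exist.
            results.append(("missing_field", line))
        elif not (fields[1].startswith("[") and fields[1].endswith("]")):
            # The second field is present but is not a complete map literal.
            results.append(("bad_map", line))
        else:
            results.append(("ok", line))
    return results

records = [("a", {"note": "one line"}), ("b", {"note": "two\nlines"})]
for status, line in load(store(records)):
    print(status, repr(line))
```

In this sketch the record with the embedded line feed comes back as two broken lines. The first has a truncated map (comparable to a type conversion failure) and the second has no second field at all (comparable to accessing a non-existent field).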

will

Re: line feeds

Posted by William Oberman <ob...@civicscience.com>.
Thanks for the feedback!

I kind of figured the answer was "use a different load/store func", and
I'll just do that.  I half-posted this message as a warning to other people
to avoid PigStorage for all but the most simple data :-)

will


On Wed, Mar 26, 2014 at 2:37 PM, Cheolsoo Park <pi...@gmail.com> wrote:

> Hi Will,
>
> You're right that PigStorage doesn't handle \n. PigStorage is really a
> dummy reference implementation of Load/StoreFunc, so I would not recommend
> using it in production. In particular, when you have complex data structures
> and special characters in data, advanced file formats work far better. Will
> the built-in ParquetLoader/Storer
> <http://pig.apache.org/docs/r0.12.0/api/org/apache/pig/builtin/ParquetLoader.html>
> or AvroStorage
> <http://pig.apache.org/docs/r0.12.0/api/org/apache/pig/builtin/AvroStorage.html>
> work for you?
>
> Thanks,
> Cheolsoo

Re: line feeds

Posted by Cheolsoo Park <pi...@gmail.com>.
Hi Will,

You're right that PigStorage doesn't handle \n. PigStorage is really a
dummy reference implementation of Load/StoreFunc, so I would not recommend
using it in production. In particular, when you have complex data structures
and special characters in data, advanced file formats work far better. Will
the built-in ParquetLoader/Storer
<http://pig.apache.org/docs/r0.12.0/api/org/apache/pig/builtin/ParquetLoader.html>
or AvroStorage
<http://pig.apache.org/docs/r0.12.0/api/org/apache/pig/builtin/AvroStorage.html>
work for you?

Thanks,
Cheolsoo

