Posted to user@pig.apache.org by rakesh kothari <rk...@hotmail.com> on 2010/10/21 08:28:52 UTC

Unexpected data type -1 found in stream.

My Pig script is roughly like this:

-- 'input1' and 'input2' stand in for the actual input paths
A = LOAD 'input1' USING JsonLoader AS (x:map[]);
B = LOAD 'input2' USING JsonLoader AS (x:map[]);

A = FOREACH A GENERATE x, x#'item' AS item:chararray;
B = FOREACH B GENERATE x, x#'item' AS item:chararray;

U = UNION A, B;

DUMP U;


This leads to the following exception:

java.lang.RuntimeException: Unexpected data type -1 found in stream.
        at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:306)
        at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:220)
        at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:269)
        at org.apache.pig.impl.io.BinStorageRecordWriter.write(BinStorageRecordWriter.java:69)
        at org.apache.pig.builtin.BinStorage.putNext(BinStorage.java:102)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:498)
        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:234)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:227)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:52)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)

Any ideas?

I am able to dump A and B. 

-Rakesh



RE: Unexpected data type -1 found in stream.

Posted by rakesh kothari <rk...@hotmail.com>.
Actually, I figured out the issue. Some fields in my JSON were null, and those fields were being deserialized into "org.json.JSONObject.Null" objects, so Pig was not able to map them to any valid type.
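
A minimal sketch of the kind of cleanup this implies, assuming the loader builds a java.util.Map from the parsed JSON before handing it to Pig. The class name JsonNullSanitizer and the NULL_MARKER field are hypothetical; NULL_MARKER stands in for the JSON library's null sentinel (in org.json that would be JSONObject.NULL) so that the example runs without the library. It only handles top-level values, not nested maps.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: replaces a JSON null sentinel with a plain Java null. */
final class JsonNullSanitizer {
    // Hypothetical placeholder for the library's sentinel object
    // (for org.json this would be JSONObject.NULL).
    static final Object NULL_MARKER = new Object();

    /** Returns a copy of the map in which every sentinel value becomes null. */
    static Map<String, Object> sanitize(Map<String, Object> parsed) {
        Map<String, Object> clean = new HashMap<>();
        for (Map.Entry<String, Object> entry : parsed.entrySet()) {
            Object value = entry.getValue();
            // Identity comparison: the sentinel is a single shared instance.
            clean.put(entry.getKey(), value == NULL_MARKER ? null : value);
        }
        return clean;
    }
}
```

The keys are kept, so x#'item' would still resolve, but to an ordinary null rather than an object of a class Pig does not recognize.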

-Rakesh



RE: Unexpected data type -1 found in stream.

Posted by rakesh kothari <rk...@hotmail.com>.
I am using Pig 0.7. No luck, even after removing the explicit cast.

Pig is not able to determine the type of the elements of the map, and it fails. I am able to DUMP A and B in isolation. It's the union that's not working.

DESCRIBE U results in:

{x: map[ ],item: chararray}

-Rakesh

> From: rekhajos@yahoo-inc.com
> To: user@pig.apache.org; pig-user@hadoop.apache.org
> Date: Thu, 21 Oct 2010 14:19:36 +0530
> Subject: Re: Unexpected data type -1 found in stream.
> 
> Hi Rakesh,
> 
> There was some known concern with explicit cast not working when data is complex type (eg: bags). Check PIG-616. It is marked resolved now.
> As a confirmatory step, you can try removing the explicit cast of chararray and check?
> 
> Thanks & Regards,
> /Rekha.
> 
> On 10/21/10 11:58 AM, "rakesh kothari" <rk...@hotmail.com> wrote:
> 
> 
> My PIG script that is roughly like this:
>
> A = LOAD input1 USING JsonLoader AS (x:map[]);
> B =  LOAD input2 USING JsonLoader AS (x:map[]);
>
> A = FOREACH A GENERATE x, (chararray) x#'item' AS item:chararray;
> B = FOREACH B GENERATE x, (chararray) x#'item' AS item:chararray;
>
> U = UNION A, B;
>
> DUMP U;
>
>
> This leads to the following exception:
>
> java.lang.RuntimeException: Unexpected data type -1 found in stream.
>         at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:306)
>         at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:220)
>         at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:269)
>         at org.apache.pig.impl.io.BinStorageRecordWriter.write(BinStorageRecordWriter.java:69)
>         at org.apache.pig.builtin.BinStorage.putNext(BinStorage.java:102)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
>         at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:498)
>         at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:234)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:227)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:52)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>
> Any ideas ?
>
> I am able to dump A and B.
>
> -Rakesh

Re: Displaying source log file names in pig logs

Posted by Guy Bayes <fa...@gmail.com>.
I'm pretty sure they are supposed to be in the Input-split section of the
tasktracker logs, aren't they?

For some reason all the Input-splits are null:

Input-split file: null
Input-split start-offset: -1
Input-split length: -1

thanks
Guy

On Mon, Oct 25, 2010 at 9:02 AM, Romain Rigaux <ro...@gmail.com> wrote:

> Hi,
>
> I don't think that filenames are directly available but I do something like
> this in order to get them (I did not try with Pig 0.7+ yet):
>
> Create a new loader inheriting from PigStorage and get the "location" path
> of the data. Then either:
>
>   - print it if everything happens in the same task
>   - append it to each record
>
> Hope this helps,
>
> Romain
>
> On Thu, Oct 21, 2010 at 9:57 AM, Guy Bayes <fa...@gmail.com> wrote:
>
> > We have a job that processes several hundred files in a directory
> >
> > We generally glob the directory in a single load statement
> >
> > Sometimes the jobs chokes on a bad row in a single file
> >
> > I could have sworn that pig printed the file name of the chunks it is
> > processing in the task log but cannot see it
> >
> > Does anyone know under what conditions file names are printed, or how to
> > find the file that is causing the issues?
> >
> > Thanks
> > Guy
> > >
> >
>



-- 
you may be acquainted with the night
but i have seen the darkness in the day
and you must know it is a terrifying sight...

Re: Displaying source log file names in pig logs

Posted by Romain Rigaux <ro...@gmail.com>.
Hi,

I don't think that filenames are directly available but I do something like
this in order to get them (I did not try with Pig 0.7+ yet):

Create a new loader inheriting from PigStorage and get the "location" path
of the data. Then either:

   - print it if everything happens in the same task
   - append it to each record
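
A rough illustration of the second option, as a plain-Java stand-in rather than the Pig loader API. The class name SourceTagger, the method tag, and the delimiter parameter are all hypothetical; in a real loader the location would come from whatever path the loader is given for its input.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: plain-Java stand-in, not the Pig loader API. */
final class SourceTagger {
    /**
     * Appends the source location to each record as an extra field,
     * so that a bad row can be traced back to the file it came from.
     */
    static List<String> tag(String location, List<String> records, String delimiter) {
        List<String> tagged = new ArrayList<>();
        for (String record : records) {
            tagged.add(record + delimiter + location);
        }
        return tagged;
    }
}
```

With the path carried as the last field, a failing row would identify its own source file even when the load statement globs a whole directory.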

Hope this helps,

Romain

On Thu, Oct 21, 2010 at 9:57 AM, Guy Bayes <fa...@gmail.com> wrote:

> We have a job that processes several hundred files in a directory
>
> We generally glob the directory in a single load statement
>
> Sometimes the jobs chokes on a bad row in a single file
>
> I could have sworn that pig printed the file name of the chunks it is
> processing in the task log but cannot see it
>
> Does anyone know under what conditions file names are printed, or how to
> find the file that is causing the issues?
>
> Thanks
> Guy
> >
>

Displaying source log file names in pig logs

Posted by Guy Bayes <fa...@gmail.com>.
We have a job that processes several hundred files in a directory.

We generally glob the directory in a single load statement.

Sometimes the job chokes on a bad row in a single file.

I could have sworn that Pig printed the file names of the chunks it is processing in the task log, but I cannot see them.

Does anyone know under what conditions file names are printed, or how to find the file that is causing the issues?

Thanks
Guy

Re: Unexpected data type -1 found in stream.

Posted by Rekha Joshi <re...@yahoo-inc.com>.
Hi Rakesh,

There was a known issue with explicit casts not working when the data is a complex type (e.g., bags). See PIG-616; it is marked resolved now.
As a confirmatory step, could you try removing the explicit cast to chararray and check?

Thanks & Regards,
/Rekha.

On 10/21/10 11:58 AM, "rakesh kothari" <rk...@hotmail.com> wrote: