Posted to user@flink.apache.org by "Newport, Billy" <Bi...@gs.com> on 2017/01/11 20:16:27 UTC

Received an event in channel 0 while still having data from a record

Anyone seen this before:

Caused by: java.io.IOException: Received an event in channel 0 while still having data from a record. This indicates broken serialization logic. If you are using custom serialization code (Writable or Value types), check their serialization routines. In the case of Kryo, check the respective Kryo serializer.
     at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:98)
     at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:42)
     at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
     at org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:190)
     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:642)
     at java.lang.Thread.run(Thread.java:745)
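
(For reference, the contract that error message refers to: read() must consume exactly the bytes that write() produced, otherwise the record reader loses track of record boundaries and trips over the next event in the channel. A minimal sketch on a hypothetical custom Value type, with made-up class and field names:)

    import java.io.IOException;

    import org.apache.flink.core.memory.DataInputView;
    import org.apache.flink.core.memory.DataOutputView;
    import org.apache.flink.types.Value;

    // Hypothetical example, not from the job above: a custom Value whose
    // read() mirrors its write() field for field. Dropping or reordering
    // one of the reads is the kind of bug the error message points at.
    public class CustomerKey implements Value {
        private long id;
        private String region;

        @Override
        public void write(DataOutputView out) throws IOException {
            out.writeLong(id);
            out.writeUTF(region);
        }

        @Override
        public void read(DataInputView in) throws IOException {
            id = in.readLong();
            region = in.readUTF(); // omitting this read would leave unread bytes behind
        }
    }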


We're on 1.1.4 right now. We're reading a Parquet file using code like this:

           AvroParquetInputFormat<GenericRecord> inputFormat = new AvroParquetInputFormat<GenericRecord>();
           // 'job' is the Hadoop Job the input format is configured through (created earlier)
           AvroParquetInputFormat.setAvroReadSchema(job, getMergeSchema(storeName, datasetName));

           // Get path of input parquet file
           DatasetHdfsInfo info = getLastCompleteMergedDatasetHDFSInfo(storeName, datasetName);

           Path path = new Path(info.getRootDir());

           DataSet<Tuple2<Void, GenericRecord>> d = getExecutionEnvironment().readHadoopFile(inputFormat, Void.class, GenericRecord.class, path.toString(), job);
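
(Worth noting: GenericRecord is not a POJO, so Flink serializes it through the Kryo fallback the error message mentions. A hedged sketch of the ExecutionConfig knobs that change that code path, assuming they behave in 1.1.x as documented; whether they help depends on what is actually mis-serialized:)

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    // Serialize Avro types with Flink's Avro serializer instead of Kryo:
    env.getConfig().enableForceAvro();
    // Or register an explicit Kryo serializer for a problematic class
    // (MyType/MyKryoSerializer are placeholders):
    // env.getConfig().registerTypeWithKryoSerializer(MyType.class, MyKryoSerializer.class);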




Re: Received an event in channel 0 while still having data from a record

Posted by Aljoscha Krettek <al...@apache.org>.
Hi Billy,
the stack trace seems to indicate that there is a problem at the point
where the data sink is trying to read its input elements, so it doesn't
seem to be related to the source. Could you also post what sinks you have
and what the types of the input elements of those sinks are?
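
(For reference, the DataSinkTask in the trace is created by whatever call
materializes the DataSet; a generic illustration, not necessarily your job:)

    // The "input elements" of the sink are the element type of d,
    // here Tuple2<Void, GenericRecord>:
    d.writeAsText("hdfs:///tmp/out");
    // or, wrapping a Hadoop OutputFormat ('outputFormat' is a placeholder):
    // d.output(new HadoopOutputFormat<Void, GenericRecord>(outputFormat, job));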

Cheers,
Aljoscha


Re: Received an event in channel 0 while still having data from a record

Posted by "M. Dale" <me...@yahoo.com>.
How were the Parquet files you are trying to read generated? With the same
versions of the libraries? I am successfully using the following Scala code
to read Parquet files via the HadoopInputFormat wrapper. Maybe try that in
Java?

val hadoopInputFormat =
  new HadoopInputFormat[Void, GenericRecord](new AvroParquetInputFormat, classOf[Void], classOf[GenericRecord], job)

AvroParquetInputFormat.setAvroReadSchema(job, EventOnlyRecord.getClassSchema)

// APIF extends ParquetInputFormat, which extends FileInputFormat (FIF).
// addInputPath is a static method on FileInputFormat.
val inputPath = new Path(input)
FileInputFormat.addInputPath(job, inputPath)

val rawEvents: DataSet[(Void, GenericRecord)] = env.createInput(hadoopInputFormat)
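
For completeness, a sketch of the same read path in Java (assuming Flink 1.1.x
with the flink-hadoop-compatibility dependency on the classpath; 'input' is a
placeholder path, EventOnlyRecord is a generated Avro class, and the
AvroParquetInputFormat package prefix depends on your Parquet version):

    import org.apache.avro.generic.GenericRecord;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.parquet.avro.AvroParquetInputFormat;

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    Job job = Job.getInstance();

    HadoopInputFormat<Void, GenericRecord> hadoopInputFormat =
        new HadoopInputFormat<>(new AvroParquetInputFormat<GenericRecord>(),
            Void.class, GenericRecord.class, job);

    AvroParquetInputFormat.setAvroReadSchema(job, EventOnlyRecord.getClassSchema());

    String input = "hdfs:///path/to/parquet";  // placeholder
    FileInputFormat.addInputPath(job, new Path(input));

    DataSet<Tuple2<Void, GenericRecord>> rawEvents = env.createInput(hadoopInputFormat);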
