Posted to user@flink.apache.org by Elias Levy <fe...@gmail.com> on 2018/02/22 20:52:28 UTC

Re: NoClassDefFoundError of an Avro class after cancel then resubmit the same job

I am currently suffering through similar issues.

I had a job running happily, but when the cluster tried to restart it, the
job could not find its JSON serializer. The job kept trying to restart in
a loop.

Just today I was running a job I built locally.  The job ran fine.  I added
two commits and rebuilt the jar.  Now the job dies when it tries to start,
claiming it can't find the time assigner class.  I've unzipped the job jar,
both locally and in the TM blob directory, and have confirmed the class is
in it.

This is the backtrace:

java.lang.ClassNotFoundException: com.foo.flink.common.util.TimeAssigner
	at java.net.URLClassLoader.findClass(Unknown Source)
	at java.lang.ClassLoader.loadClass(Unknown Source)
	at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
	at java.lang.ClassLoader.loadClass(Unknown Source)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Unknown Source)
	at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:73)
	at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source)
	at java.io.ObjectInputStream.readClassDesc(Unknown Source)
	at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
	at java.io.ObjectInputStream.readObject0(Unknown Source)
	at java.io.ObjectInputStream.readObject(Unknown Source)
	at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:393)
	at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:380)
	at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:368)
	at org.apache.flink.util.SerializedValue.deserializeValue(SerializedValue.java:58)
	at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.createPartitionStateHolders(AbstractFetcher.java:542)
	at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.<init>(AbstractFetcher.java:167)
	at org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.<init>(Kafka09Fetcher.java:89)
	at org.apache.flink.streaming.connectors.kafka.internal.Kafka010Fetcher.<init>(Kafka010Fetcher.java:62)
	at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010.createFetcher(FlinkKafkaConsumer010.java:203)
	at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:564)
	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:86)
	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
	at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:94)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:264)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
	at java.lang.Thread.run(Unknown Source)
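
For what it's worth, here is the rough diagnostic I have been using to see
which classloader can (or cannot) resolve the class -- just a sketch;
ClassLoaderProbe is a throwaway helper name, and the class name below is the
one from the trace above (substitute your own):

    // Minimal probe: drop this into the job and call probe(...) from main()
    // or from a rich function's open() to see which classloaders are in play.
    public final class ClassLoaderProbe {
        public static void probe(String className) {
            ClassLoader own = ClassLoaderProbe.class.getClassLoader();
            ClassLoader ctx = Thread.currentThread().getContextClassLoader();
            System.out.println("probe class loaded by: " + own);
            System.out.println("context classloader:   " + ctx);
            try {
                Class<?> c = Class.forName(className, false, ctx);
                System.out.println(className + " resolved by: " + c.getClassLoader());
            } catch (ClassNotFoundException e) {
                System.out.println("context classloader cannot resolve " + className);
            }
        }
    }

    // e.g. ClassLoaderProbe.probe("com.foo.flink.common.util.TimeAssigner");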


On Tue, Jan 23, 2018 at 7:51 AM, Stephan Ewen <se...@apache.org> wrote:

> Hi!
>
> We changed a few things between 1.3 and 1.4 concerning Avro. One of the
> main things is that Avro is no longer part of the core Flink class library,
> but needs to be packaged into your application jar file.
>
> The class loading / caching issues of 1.3 with respect to Avro should
> disappear in Flink 1.4, because Avro classes and caches are scoped to the
> job classloaders, so the caches do not go across different jobs, or even
> different operators.
>
>
> *Please check: Make sure you have Avro as a dependency in your jar file
> (in scope "compile").*
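
[A quick way to assert this from inside the job itself -- just a sketch,
where MyJob stands in for whatever class holds main(); if the check throws,
Avro could not be resolved from the job's classloader at all, i.e. it is
neither packaged in the jar nor available on the cluster classpath:]

    // Fail fast at startup if Avro cannot be resolved from the job's
    // classloader. SpecificRecordBase is a core Avro class.
    try {
        Class.forName("org.apache.avro.specific.SpecificRecordBase",
                false, MyJob.class.getClassLoader());
    } catch (ClassNotFoundException e) {
        throw new IllegalStateException(
                "Avro missing from the job jar -- check the dependency scope", e);
    }
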
>
> Hope that solves the issue.
>
> Stephan
>
>
> On Mon, Jan 22, 2018 at 2:31 PM, Edward <eg...@hotmail.com> wrote:
>
>> Yes, we've seen this issue as well, though it usually takes many more
>> resubmits before the error pops up. Interestingly, of the 7 jobs we run
>> (all
>> of which use different Avro schemas), we only see this issue on 1 of them.
>> Once the NoClassDefFoundError crops up though, it is necessary to recreate
>> the task managers.
>>
>> There's a whole page on the Flink documentation on debugging classloading,
>> and Avro is mentioned several times on that page:
> >> https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/debugging_classloading.html
>>
>> It seems that (in 1.3 at least) each submitted job has its own
>> classloader,
>> and its own instance of the Avro class definitions. However, the Avro
>> class
>> cache will keep references to the Avro classes from classloaders for the
>> previous cancelled jobs. That said, we haven't been able to find a
>> solution
> >> to this error yet. Flink 1.4 would be worth a try because of the changes
> >> to the default classloading behaviour (child-first is the new default, not
> >> parent-first).
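
[To make the caching point concrete -- a rough sketch only, with MyRecord
standing in for any Avro-generated SpecificRecord class in the job jar
(getClassSchema() is the schema accessor Avro generates); SpecificData.get()
is Avro's shared singleton, and its class cache is what can keep handing out
classes loaded by a previous, already-cancelled job's classloader:]

    import org.apache.avro.Schema;
    import org.apache.avro.specific.SpecificData;

    public final class AvroCacheCheck {
        // Compare the class the shared Avro cache returns for a schema with
        // the class the current job actually loaded.
        static void check(Schema schema, Class<?> currentClass) {
            Class<?> cached = SpecificData.get().getClass(schema);
            System.out.println("cache returned class from: " + cached.getClassLoader());
            System.out.println("this job's class is from:  " + currentClass.getClassLoader());
            // If the two loaders differ, the cache is still holding the class
            // created by an earlier job's classloader.
        }
    }

    // e.g. AvroCacheCheck.check(MyRecord.getClassSchema(), MyRecord.class);
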
>>
>>
>>
>>
>>
>> --
> >> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>>
>
>

Re: NoClassDefFoundError of an Avro class after cancel then resubmit the same job

Posted by Aljoscha Krettek <al...@apache.org>.
@Elias This is a known issue that will be fixed in 1.4.2, which we will release very quickly just because of this bug: https://issues.apache.org/jira/browse/FLINK-8741

> On 23. Feb 2018, at 05:53, Elias Levy <fe...@gmail.com> wrote:
> 
> Something seems to be off with the user code class loader.  The only way I can get my job to start is if I drop the job jar into the lib folder in the JM and configure the JM's classloader.resolve-order to parent-first.
> 
> Suggestions?
> 
> On Thu, Feb 22, 2018 at 12:52 PM, Elias Levy <fearsome.lucidity@gmail.com> wrote:
> I am currently suffering through similar issues.  
> 
> I had a job running happily, but when the cluster tried to restart it, the job could not find its JSON serializer. The job kept trying to restart in a loop.
> 
> Just today I was running a job I built locally.  The job ran fine.  I added two commits and rebuilt the jar.  Now the job dies when it tries to start, claiming it can't find the time assigner class.  I've unzipped the job jar, both locally and in the TM blob directory, and have confirmed the class is in it.
> 
> This is the backtrace:
> 
> java.lang.ClassNotFoundException: com.foo.flink.common.util.TimeAssigner
> 	at java.net.URLClassLoader.findClass(Unknown Source)
> 	at java.lang.ClassLoader.loadClass(Unknown Source)
> 	at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
> 	at java.lang.ClassLoader.loadClass(Unknown Source)
> 	at java.lang.Class.forName0(Native Method)
> 	at java.lang.Class.forName(Unknown Source)
> 	at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:73)
> 	at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source)
> 	at java.io.ObjectInputStream.readClassDesc(Unknown Source)
> 	at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
> 	at java.io.ObjectInputStream.readObject0(Unknown Source)
> 	at java.io.ObjectInputStream.readObject(Unknown Source)
> 	at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:393)
> 	at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:380)
> 	at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:368)
> 	at org.apache.flink.util.SerializedValue.deserializeValue(SerializedValue.java:58)
> 	at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.createPartitionStateHolders(AbstractFetcher.java:542)
> 	at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.<init>(AbstractFetcher.java:167)
> 	at org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.<init>(Kafka09Fetcher.java:89)
> 	at org.apache.flink.streaming.connectors.kafka.internal.Kafka010Fetcher.<init>(Kafka010Fetcher.java:62)
> 	at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010.createFetcher(FlinkKafkaConsumer010.java:203)
> 	at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:564)
> 	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:86)
> 	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
> 	at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:94)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:264)
> 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
> 	at java.lang.Thread.run(Unknown Source)
> 
> On Tue, Jan 23, 2018 at 7:51 AM, Stephan Ewen <sewen@apache.org> wrote:
> Hi!
> 
> We changed a few things between 1.3 and 1.4 concerning Avro. One of the main things is that Avro is no longer part of the core Flink class library, but needs to be packaged into your application jar file.
> 
> The class loading / caching issues of 1.3 with respect to Avro should disappear in Flink 1.4, because Avro classes and caches are scoped to the job classloaders, so the caches do not go across different jobs, or even different operators.
> 
> 
> Please check: Make sure you have Avro as a dependency in your jar file (in scope "compile").
> 
> Hope that solves the issue.
> 
> Stephan
> 
> 
> On Mon, Jan 22, 2018 at 2:31 PM, Edward <egbuck@hotmail.com> wrote:
> Yes, we've seen this issue as well, though it usually takes many more
> resubmits before the error pops up. Interestingly, of the 7 jobs we run (all
> of which use different Avro schemas), we only see this issue on 1 of them.
> Once the NoClassDefFoundError crops up though, it is necessary to recreate
> the task managers.
> 
> There's a whole page on the Flink documentation on debugging classloading,
> and Avro is mentioned several times on that page:
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/debugging_classloading.html
> 
> It seems that (in 1.3 at least) each submitted job has its own classloader,
> and its own instance of the Avro class definitions. However, the Avro class
> cache will keep references to the Avro classes from classloaders for the
> previous cancelled jobs. That said, we haven't been able to find a solution
> to this error yet. Flink 1.4 would be worth a try because of the changes
> to the default classloading behaviour (child-first is the new default, not
> parent-first).
> 
> 
> 
> 
> 
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
> 
> 
> 


Re: NoClassDefFoundError of an Avro class after cancel then resubmit the same job

Posted by Elias Levy <fe...@gmail.com>.
Something seems to be off with the user code class loader.  The only way I
can get my job to start is if I drop the job jar into the lib folder in the
JM and configure the JM's classloader.resolve-order to parent-first.
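
Concretely, that means copying the job jar into Flink's lib/ directory on
the JM and adding this one line to its flink-conf.yaml (just restating the
workaround above):

    classloader.resolve-order: parent-first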

Suggestions?

On Thu, Feb 22, 2018 at 12:52 PM, Elias Levy <fe...@gmail.com>
wrote:

> I am currently suffering through similar issues.
>
> I had a job running happily, but when the cluster tried to restart it, the
> job could not find its JSON serializer. The job kept trying to restart in
> a loop.
>
> Just today I was running a job I built locally.  The job ran fine.  I
> added two commits and rebuilt the jar.  Now the job dies when it tries to
> start, claiming it can't find the time assigner class.  I've unzipped the
> job jar, both locally and in the TM blob directory, and have confirmed the
> class is in it.
>
> This is the backtrace:
>
> java.lang.ClassNotFoundException: com.foo.flink.common.util.TimeAssigner
> 	at java.net.URLClassLoader.findClass(Unknown Source)
> 	at java.lang.ClassLoader.loadClass(Unknown Source)
> 	at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
> 	at java.lang.ClassLoader.loadClass(Unknown Source)
> 	at java.lang.Class.forName0(Native Method)
> 	at java.lang.Class.forName(Unknown Source)
> 	at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:73)
> 	at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source)
> 	at java.io.ObjectInputStream.readClassDesc(Unknown Source)
> 	at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
> 	at java.io.ObjectInputStream.readObject0(Unknown Source)
> 	at java.io.ObjectInputStream.readObject(Unknown Source)
> 	at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:393)
> 	at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:380)
> 	at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:368)
> 	at org.apache.flink.util.SerializedValue.deserializeValue(SerializedValue.java:58)
> 	at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.createPartitionStateHolders(AbstractFetcher.java:542)
> 	at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.<init>(AbstractFetcher.java:167)
> 	at org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.<init>(Kafka09Fetcher.java:89)
> 	at org.apache.flink.streaming.connectors.kafka.internal.Kafka010Fetcher.<init>(Kafka010Fetcher.java:62)
> 	at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010.createFetcher(FlinkKafkaConsumer010.java:203)
> 	at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:564)
> 	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:86)
> 	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
> 	at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:94)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:264)
> 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
> 	at java.lang.Thread.run(Unknown Source)
>
>
> On Tue, Jan 23, 2018 at 7:51 AM, Stephan Ewen <se...@apache.org> wrote:
>
>> Hi!
>>
>> We changed a few things between 1.3 and 1.4 concerning Avro. One of the
>> main things is that Avro is no longer part of the core Flink class library,
>> but needs to be packaged into your application jar file.
>>
>> The class loading / caching issues of 1.3 with respect to Avro should
>> disappear in Flink 1.4, because Avro classes and caches are scoped to the
>> job classloaders, so the caches do not go across different jobs, or even
>> different operators.
>>
>>
>> *Please check: Make sure you have Avro as a dependency in your jar file
>> (in scope "compile").*
>>
>> Hope that solves the issue.
>>
>> Stephan
>>
>>
>> On Mon, Jan 22, 2018 at 2:31 PM, Edward <eg...@hotmail.com> wrote:
>>
>>> Yes, we've seen this issue as well, though it usually takes many more
>>> resubmits before the error pops up. Interestingly, of the 7 jobs we run
>>> (all
>>> of which use different Avro schemas), we only see this issue on 1 of
>>> them.
>>> Once the NoClassDefFoundError crops up though, it is necessary to
>>> recreate
>>> the task managers.
>>>
>>> There's a whole page on the Flink documentation on debugging
>>> classloading,
>>> and Avro is mentioned several times on that page:
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/debugging_classloading.html
>>>
>>> It seems that (in 1.3 at least) each submitted job has its own
>>> classloader,
>>> and its own instance of the Avro class definitions. However, the Avro
>>> class
>>> cache will keep references to the Avro classes from classloaders for the
>>> previous cancelled jobs. That said, we haven't been able to find a
>>> solution
>>> to this error yet. Flink 1.4 would be worth a try because of the
>>> changes
>>> to the default classloading behaviour (child-first is the new default,
>>> not
>>> parent-first).
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>>>
>>
>>
>