Posted to dev@flink.apache.org by Yingjie Cao <ke...@gmail.com> on 2019/12/17 07:43:27 UTC

Potential side-effect of connector code to JM/TM

Hi community,

  After running the TPC-DS test suite for several days on a session
cluster, we found a resource leak in OrcInputFormat, which is reported in
FLINK-15239. The problem comes from a third-party dependency that creates
an internal thread (pool) and never releases it. As a result, the user
class loader referenced by these threads, along with all classes it has
loaded, can never be garbage collected. For the JM (AM), whose metaspace
size is currently not limited, this leads to continual metaspace growth.
For the TM, whose metaspace size is limited, it eventually results in a
metaspace OOM. I am not sure whether other connectors/input formats have
a similar problem.
  In general, it is hard for Flink to restrict the behavior of third-party
dependencies, especially the dependencies of connectors. However, it would
be better if we could provide some mechanism, such as stronger isolation or
test facilities, to find potential problems. For example, we could run jobs
on a cluster and automatically check whether the user class loader can be
garbage collected, whether there is a thread leak, whether any shutdown
hooks have been registered, and so on.
  What do you think? Or should we even treat it as a problem?
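
To make the first of these checks concrete, here is a minimal sketch of
testing whether a class loader can be garbage collected. It runs in a plain
JVM outside Flink; the class and method names (ClassLoaderLeakCheck,
becomesCollectable, startPinningThread) are hypothetical, and an empty
URLClassLoader stands in for the user class loader. System.gc() is only a
hint, so the check retries and can still report a false negative on JVMs
that ignore explicit GC requests.

```java
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.CountDownLatch;

class ClassLoaderLeakCheck {

    /** Requests GC a few times and reports whether the referent was cleared. */
    static boolean becomesCollectable(WeakReference<?> ref) {
        for (int i = 0; i < 20 && ref.get() != null; i++) {
            System.gc(); // only a hint to the JVM, hence the retries
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    /**
     * Simulates a library thread that is never shut down: the thread keeps a
     * fresh "user" class loader reachable as its context class loader until
     * the latch is released.
     */
    static WeakReference<ClassLoader> startPinningThread(CountDownLatch release) {
        ClassLoader userLoader = new URLClassLoader(new URL[0]);
        Thread worker = new Thread(() -> {
            try {
                release.await();
            } catch (InterruptedException ignored) {
                // fall through and let the thread exit
            }
        }, "leaky-library-thread");
        worker.setContextClassLoader(userLoader);
        worker.setDaemon(true);
        worker.start();
        // Only a weak reference escapes, so the thread is the sole strong holder.
        return new WeakReference<>(userLoader);
    }

    public static void main(String[] args) {
        CountDownLatch release = new CountDownLatch(1);
        WeakReference<ClassLoader> ref = startPinningThread(release);
        System.out.println("collectable while thread runs: " + becomesCollectable(ref));
        release.countDown(); // let the simulated library thread exit
        System.out.println("collectable after thread exits: " + becomesCollectable(ref));
    }
}
```

While the thread is alive, the loader stays strongly reachable and the check
reports it as not collectable; once the thread has exited, the loader is
typically reclaimed on the next GC cycle.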

Best,
Yingjie

Re: Potential side-effect of connector code to JM/TM

Posted by Jingsong Li <ji...@gmail.com>.
Thanks, Yingjie, for driving this.

It is very useful to have this checklist.
I think we can list all problematic third-party libraries,
including this one from the Hadoop jar:
org.apache.hadoop.fs.FileSystem.StatisticsDataReferenceCleaner.

Too many libraries have this problem, and our YARN per-job mode can
alleviate it, so I think we should only make this a recommendation. There
is no need to forbid users from writing such code or using these
third-party libraries.

Best,
Jingsong Lee

On Wed, Dec 18, 2019 at 4:55 PM Yingjie Cao <ke...@gmail.com> wrote:

> I'd like to do that.
>
> Best,
> Yingjie


-- 
Best, Jingsong Lee

Re: Potential side-effect of connector code to JM/TM

Posted by Yingjie Cao <ke...@gmail.com>.
I'd like to do that.

Best,
Yingjie

On Wed, Dec 18, 2019 at 4:48 PM Till Rohrmann <tr...@apache.org> wrote:

> I think we should add this checklist to the coding guidelines and continue
> extending it there. Do you want to update the coding guidelines
> accordingly, Yingjie?
>
> Cheers,
> Till

Re: Potential side-effect of connector code to JM/TM

Posted by Till Rohrmann <tr...@apache.org>.
I think we should add this checklist to the coding guidelines and continue
extending it there. Do you want to update the coding guidelines
accordingly, Yingjie?

Cheers,
Till

On Wed, Dec 18, 2019 at 8:21 AM Yingjie Cao <ke...@gmail.com> wrote:

> Hi Till & Biao,
>
> Thanks for the reply.
>
> I agree that providing some stress or stability tests can really help.
> Besides the JVM resource leak mentioned above, there may be other types of
> resource leaks, such as slot or network buffer leaks. In addition, other
> tests can also help, such as triggering failover in various ways, stressing
> the system with high-parallelism, heavy-load jobs, and running jobs or
> triggering failover over and over again. I think stress and stability
> testing is a big topic, and resource leak checking can be a good start.
>
> As a start for resource leak checking, we may need to collect a checklist,
> which can also help to troubleshoot resource leaks manually. From my
> previous experience, I can think of the following items:
> 1. The File#deleteOnExit hook leaks the file path strings. The Flink REST
> server once suffered from this problem, and it has since been fixed.
> 2. Thread leaks. OrcInputFormat suffers from this.
> 3. Application shutdown hooks that reference user classes.
> 4. ClassLoader#parallelLockMap may leak because of too many generated
> classes. Flink also suffers from this problem; the issue is reported in
> FLINK-15024 and needs to be resolved.
> 5. Other static fields (like caches implemented as maps) of classes loaded
> by the system class loader can also leak resources.
>
> Any additions to this checklist are welcome. Even with this checklist, it
> may not be trivial to do the check; dumping and analysing the heap may be
> an option. I will do some further investigation into that.
>
> Best,
> Yingjie

Re: Potential side-effect of connector code to JM/TM

Posted by Yingjie Cao <ke...@gmail.com>.
Hi Till & Biao,

Thanks for the reply.

I agree that providing some stress or stability tests can really help.
Besides the JVM resource leak mentioned above, there may be other types of
resource leaks, such as slot or network buffer leaks. In addition, other
tests can also help, such as triggering failover in various ways, stressing
the system with high-parallelism, heavy-load jobs, and running jobs or
triggering failover over and over again. I think stress and stability
testing is a big topic, and resource leak checking can be a good start.

As a start for resource leak checking, we may need to collect a checklist,
which can also help to troubleshoot resource leaks manually. From my
previous experience, I can think of the following items:
1. The File#deleteOnExit hook leaks the file path strings. The Flink REST
server once suffered from this problem, and it has since been fixed.
2. Thread leaks. OrcInputFormat suffers from this.
3. Application shutdown hooks that reference user classes.
4. ClassLoader#parallelLockMap may leak because of too many generated
classes. Flink also suffers from this problem; the issue is reported in
FLINK-15024 and needs to be resolved.
5. Other static fields (like caches implemented as maps) of classes loaded
by the system class loader can also leak resources.
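
For item 2, a rough sketch of how leaked threads could be located: list the
live threads whose context class loader is a given user class loader. The
names LeakedThreadFinder, threadsPinning and startForgottenThread are
hypothetical, and the check is only a heuristic, since a thread can also
keep a class loader alive through its Runnable, thread locals, or other
references that this does not inspect.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

class LeakedThreadFinder {

    /** Returns the live threads whose context class loader is the given loader. */
    static List<Thread> threadsPinning(ClassLoader loader) {
        List<Thread> pinned = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && t.getContextClassLoader() == loader) {
                pinned.add(t);
            }
        }
        return pinned;
    }

    /** Simulates a library that starts a pool thread and never shuts it down. */
    static Thread startForgottenThread(ClassLoader loader, CountDownLatch stop) {
        CountDownLatch started = new CountDownLatch(1);
        Thread t = new Thread(() -> {
            started.countDown();
            awaitQuietly(stop);
        }, "forgotten-pool-thread");
        t.setContextClassLoader(loader);
        t.setDaemon(true);
        t.start();
        awaitQuietly(started); // make sure the thread is really running
        return t;
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ClassLoader userLoader = new URLClassLoader(new URL[0]);
        CountDownLatch stop = new CountDownLatch(1);
        Thread leaked = startForgottenThread(userLoader, stop);

        for (Thread t : threadsPinning(userLoader)) {
            System.out.println("still references the user class loader: " + t.getName());
        }

        stop.countDown();
        joinQuietly(leaked);
        System.out.println("threads left after shutdown: " + threadsPinning(userLoader).size());
    }
}
```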

Any additions to this checklist are welcome. Even with this checklist, it
may not be trivial to do the check; dumping and analysing the heap may be
an option. I will do some further investigation into that.
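
As one possible way to do the heap dump programmatically, the sketch below
uses the HotSpot-specific HotSpotDiagnosticMXBean, so it assumes a HotSpot
based JVM. The class name HeapDumper and the file naming are made up for
illustration. The resulting .hprof file can then be opened in a heap
analyser to look for paths from GC roots to the user class loader.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

class HeapDumper {

    /** Writes a dump of the live (reachable) objects into the given directory. */
    static Path dumpLiveObjects(Path directory) {
        // dumpHeap refuses to overwrite, so use a fresh file name each time.
        Path target = directory.resolve("leak-check-" + System.nanoTime() + ".hprof");
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            bean.dumpHeap(target.toString(), true); // true = only reachable objects
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    /** Convenience wrapper that dumps into a new temporary directory. */
    static Path dumpToTempDir() {
        try {
            return dumpLiveObjects(Files.createTempDirectory("heap-dumps"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dump = dumpToTempDir();
        System.out.println("heap dump written to " + dump);
    }
}
```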

Best,
Yingjie

On Tue, Dec 17, 2019 at 9:02 PM Biao Liu <mm...@gmail.com> wrote:

> Hi Yingjie,
>
> Thanks for tracking down this impressive bug and starting this discussion.
>
> I'm afraid there is no silver bullet for isolation from third-party
> libraries. However, I agree that resource checking utilities might help.
> It seems that you and Till have already raised some feasible ideas.
> Resource leaks seem to be quite common. It would be great if someone
> could share some experience. I will keep an eye on this discussion.
>
> Thanks,
> Biao /'bɪ.aʊ/

Re: Potential side-effect of connector code to JM/TM

Posted by Biao Liu <mm...@gmail.com>.
Hi Yingjie,

Thanks for tracking down this impressive bug and starting this discussion.

I'm afraid there is no silver bullet for isolation from third-party
libraries. However, I agree that resource checking utilities might help.
It seems that you and Till have already raised some feasible ideas.
Resource leaks seem to be quite common. It would be great if someone
could share some experience. I will keep an eye on this discussion.

Thanks,
Biao /'bɪ.aʊ/



On Tue, 17 Dec 2019 at 20:27, Till Rohrmann <tr...@apache.org> wrote:

> Hi Yingjie,
>
> thanks for reporting this issue and starting this discussion. When we are
> dealing with third-party libraries, I believe there is always the risk that
> one overlooks closing resources. Ideally, we make this as hard as possible
> from Flink's perspective, but realistically it is hard to avoid completely.
> Hence, I believe that it would be beneficial to have some tooling (e.g.
> stress tests) that could help to surface these kinds of problems. Maybe one
> could automate it so that a dev only needs to provide a user jar, which is
> then executed several times while the cluster is checked for anomalies.
>
> Cheers,
> Till

Re: Potential side-effect of connector code to JM/TM

Posted by Till Rohrmann <tr...@apache.org>.
Hi Yingjie,

thanks for reporting this issue and starting this discussion. When we are
dealing with third-party libraries, I believe there is always the risk that
one overlooks closing resources. Ideally, we make this as hard as possible
from Flink's perspective, but realistically it is hard to avoid completely.
Hence, I believe that it would be beneficial to have some tooling (e.g.
stress tests) that could help to surface these kinds of problems. Maybe one
could automate it so that a dev only needs to provide a user jar, which is
then executed several times while the cluster is checked for anomalies.
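
A very rough sketch of what such an automated check might look like: run a
job repeatedly and sample the metaspace usage after each run, flagging the
case where it grows every time. Everything here is an assumption for
illustration: the class MetaspaceTrendCheck is hypothetical, the Runnable
stands in for submitting the user jar, and the sampling happens in the
local JVM, whereas a real tool would have to read these values from the
JM/TM processes (for example over JMX). The pool name "Metaspace" is what
HotSpot uses; other JVMs may name it differently.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.Arrays;

class MetaspaceTrendCheck {

    /** Returns the used metaspace in bytes, or -1 if no such pool is exposed. */
    static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().equals("Metaspace")) {
                return pool.getUsage().getUsed();
            }
        }
        return -1;
    }

    /** Runs the job the given number of times and samples metaspace after each run. */
    static long[] sampleAfterEachRun(Runnable job, int runs) {
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            job.run();
            System.gc(); // give class unloading a chance before sampling
            samples[i] = metaspaceUsedBytes();
        }
        return samples;
    }

    /** True if every sample is strictly larger than the previous one. */
    static boolean growsOnEveryRun(long[] samples) {
        if (samples.length < 2) {
            return false;
        }
        for (int i = 1; i < samples.length; i++) {
            if (samples[i] <= samples[i - 1]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Placeholder job; a real check would execute the user jar here.
        long[] samples = sampleAfterEachRun(() -> { }, 5);
        System.out.println("metaspace samples: " + Arrays.toString(samples));
        System.out.println("suspicious growth: " + growsOnEveryRun(samples));
    }
}
```

Strictly monotonic growth is a deliberately simple criterion; a practical
check would probably need a warm-up phase and some tolerance for noise.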

Cheers,
Till

On Tue, Dec 17, 2019 at 8:43 AM Yingjie Cao <ke...@gmail.com> wrote:

> Hi community,
>
>   After running the TPC-DS test suite for several days on a session
> cluster, we found a resource leak in OrcInputFormat, which is reported in
> FLINK-15239. The problem comes from a third-party dependency that creates
> an internal thread (pool) and never releases it. As a result, the user
> class loader referenced by these threads, along with all classes it has
> loaded, can never be garbage collected. For the JM (AM), whose metaspace
> size is currently not limited, this leads to continual metaspace growth.
> For the TM, whose metaspace size is limited, it eventually results in a
> metaspace OOM. I am not sure whether other connectors/input formats have
> a similar problem.
>   In general, it is hard for Flink to restrict the behavior of third-party
> dependencies, especially the dependencies of connectors. However, it would
> be better if we could provide some mechanism, such as stronger isolation or
> test facilities, to find potential problems. For example, we could run jobs
> on a cluster and automatically check whether the user class loader can be
> garbage collected, whether there is a thread leak, whether any shutdown
> hooks have been registered, and so on.
>   What do you think? Or should we even treat it as a problem?
>
> Best,
> Yingjie
>