Posted to dev@impala.apache.org by Quanlong Huang <hu...@126.com> on 2018/08/01 13:02:04 UTC

Re: Re: Impala Incident Sharing

Hi Todd and Philip,


The idea of a separate ClassLoader is cool, but I would vote for executing Java UDFs in a separate JVM. We don't need to create a JVM for each UDF; they can share the same JVM with a configurable hard limit. If the JVM for UDFs runs out of heap, we can destroy it and fail the affected queries. (Not sure if this is doable.) The drawback is that some innocent queries using Java UDFs will fail too, but the cluster will stay healthy.
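
Something like the following, just to illustrate the supervisor side (the class and the worker entry point are made-up names, not existing Impala code):

import java.io.IOException;

// One shared worker JVM for all Java UDFs, with a hard heap cap. If it OOMs
// it exits immediately instead of limping along; we then fail the queries
// that were using it and respawn a fresh worker.
public class UdfJvmSupervisor {
  private final long heapLimitMb;       // the configurable hard limit
  private final String workerClasspath;
  private Process worker;

  public UdfJvmSupervisor(long heapLimitMb, String workerClasspath) {
    this.heapLimitMb = heapLimitMb;
    this.workerClasspath = workerClasspath;
  }

  public void start() throws IOException {
    worker = new ProcessBuilder(
        "java",
        "-Xmx" + heapLimitMb + "m",
        "-XX:+ExitOnOutOfMemoryError",  // die on heap exhaustion
        "-cp", workerClasspath,
        "com.example.UdfWorkerMain")    // hypothetical worker entry point
        .inheritIO()
        .start();
  }

  // Callers check this before dispatching a UDF call; if the worker died,
  // fail the in-flight queries and call start() again.
  public boolean isHealthy() {
    return worker != null && worker.isAlive();
  }
}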


The patch adding JVM pause logs and exposing JMX metrics is really helpful. I'd like to join the code review.


Thanks,
Quanlong

At 2018-08-01 05:47:25, "Philip Zeyliger" <ph...@cloudera.com.INVALID> wrote:
>Hi Quanlong,
>
>Thank you for the incident note! You might be interested in
>https://gerrit.cloudera.org/#/c/10998/ which is adding some instrumentation
>to make it easier to notice with monitoring tools that we're running out of
>memory or having large GC pauses.
>
>-- Philip
>
>On Tue, Jul 31, 2018 at 8:21 AM Todd Lipcon <to...@cloudera.com.invalid>
>wrote:
>
>> Hi Quanlong,
>>
>> Thanks for the writeup. I wonder if we could address two of the issues with
>> a single approach: using a separate ClassLoader instance when we load and
>> instantiate custom UDFs. For one, using a custom ClassLoader means that we
>> could probably ensure that the user's jar is checked first even if there is
>> a conflicting class on the Impala classpath. For two, if we close the
>> custom classloader and ensure that we don't retain any references to it or
>> to classes created by it (i.e., the UDF classes), then we should be able to
>> collect the leaked memory on the next GC. In the particular case you pointed
>> out, the leak is in static fields of the UDF, and those static fields would
>> be released when the classloader itself is collected.
>>
>> On the downside, making this change could hurt performance if
>> any UDFs are relying on their static initializer blocks to do expensive
>> setup. Maybe a compromise would be to make this behavior optional or to
>> cache the ClassLoader via a WeakReference across instantiations of a UDF so
>> that, if there isn't memory pressure, the UDF doesn't need to be
>> re-class-loaded repeatedly.
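>>
>> A rough sketch of the caching part (untested, and the class name is made
>> up, not existing Impala code):
>>
>> import java.lang.ref.WeakReference;
>> import java.net.URL;
>> import java.net.URLClassLoader;
>>
>> // Load the UDF jar in its own closeable loader so the UDF's static fields
>> // become collectible once the loader is dropped; cache the loader weakly
>> // so repeated instantiations reuse it unless the GC reclaims it.
>> class UdfClassLoaderCache {
>>   private final URL jarUrl;
>>   private WeakReference<URLClassLoader> cached = new WeakReference<>(null);
>>
>>   UdfClassLoaderCache(URL jarUrl) { this.jarUrl = jarUrl; }
>>
>>   synchronized Class<?> loadUdfClass(String className) throws Exception {
>>     URLClassLoader loader = cached.get();
>>     if (loader == null) {
>>       // A real implementation would need a child-first loader here so the
>>       // user's jar shadows Impala's classpath while Hadoop/Hive interfaces
>>       // still resolve from the host.
>>       loader = new URLClassLoader(new URL[] {jarUrl});
>>       cached = new WeakReference<>(loader);
>>     }
>>     return Class.forName(className, true, loader);
>>   }
>> }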
>>
>> As for tracking memory explicitly, the way that HBase does it is via
>> somewhat careful accounting inline with all of the code that allocates
>> memory. I don't think that approach is feasible in UDFs because the UDF
>> code is provided by end users, and they aren't likely to want to carefully
>> track their heap usage. I think the only way we could effectively "sandbox"
>> UDFs would be to actually execute them in a separate JVM with a limited
>> heap size, but that would be somewhat complex to implement without a big
>> performance drop.
>>
>> -Todd
>>
>> On Tue, Jul 31, 2018 at 7:53 AM, Quanlong Huang <hu...@126.com>
>> wrote:
>>
>> > Hi all,
>> >
>> >
>> > To draw your attention to a patch, I'd like to share an incident with
>> > you.
>> >
>> >
>> > Two months ago we encountered an incident in one of our Impala clusters
>> > running version 2.5-cdh5.7.3. Queries were piling up and most of them
>> > were in CREATED state. It looked like the infamous scenario where the
>> > catalogd is loading metadata of huge tables and finally OOMs, so metadata
>> > loading requests from the impalads get stuck. However, the catalogd
>> > looked fine this time: its logs showed it was not loading any metadata.
>> > Then I looked into the statestored and found many timeout logs like
>> > "Unable to send topic update message to subscriber impalad@xxx:22000,
>> > received error: RPC timed out". Picking one of the problematic impalads
>> > and digging into its log, I found most of the errors were OOMs. The logs
>> > look like:
>> >
>> >
>> > E0524 13:46:46.200798  4405 impala-server.cc:1344] There was an error
>> > processing the impalad catalog update. Requesting a full topic update to
>> > recover: OutOfMemoryError: Java heap space
>> >         at com.cloudera.impala.hive.executor.UdfExecutor.evaluate(UdfExecutor.java:402)
>> >         at com.cloudera.impala.hive.executor.UdfExecutor.evaluate(UdfExecutor.java:329)
>> > Caused by: java.lang.reflect.InvocationTargetException
>> >         at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
>> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> > E0524 13:49:06.433274  4405 impala-server.cc:1344] There was an error
>> > processing the impalad catalog update. Requesting a full topic update to
>> > recover: OutOfMemoryError: Java heap space
>> >         at java.lang.reflect.Method.invoke(Method.java:498)
>> >         at com.cloudera.impala.hive.executor.UdfExecutor.evaluate(UdfExecutor.java:394)
>> >         ... 1 more
>> > Caused by: java.lang.OutOfMemoryError: Java heap space
>> > E0524 13:56:44.460489  4405 impala-server.cc:1344] There was an error
>> > processing the impalad catalog update. Requesting a full topic update to
>> > recover: OutOfMemoryError: Java heap space
>> > E0524 14:03:45.630472  4405 impala-server.cc:1344] There was an error
>> > processing the impalad catalog update. Requesting a full topic update to
>> > recover: OutOfMemoryError: Java heap space
>> >
>> >
>> > At that time I could only restart the whole cluster, since most of the
>> > impalads were showing OOM errors. It was quite painful, but after that
>> > the cluster recovered.
>> >
>> >
>> > After digging into the query history and replaying the queries in a test
>> > cluster, I realized it was caused by a UDF we had added one day before.
>> > Our users want to parse JSON strings, but there are no builtin functions
>> > for this (IMPALA-376). So we simply added the Hive UDF get_json_object
>> > with:
>> >
>> >
>> > create function get_json_object
>> >     location 'hdfs://xxx/hive-exec-1.1.0-cdh5.7.3.jar'
>> >     SYMBOL='org.apache.hadoop.hive.ql.udf.UDFJson';
>> >
>> >
>> > The query itself was fine. But there's a bug in hive-1.1.0-cdh5.7.3 that
>> > causes a memory leak when more than one UDFJson object is running in a
>> > JVM (HIVE-16196). That's why the impalads got OOM: memory in the JVM is
>> > not tracked. The bug in UDFJson is fixed in later Hive versions (
>> > https://github.com/cloudera/hive/commit/8397f933ea3ee5e2087b4def6b58ed5d77026108#diff-e0b4cff52301169fede5c9550e968b67
>> > ). However, using newer Hive jars (e.g.
>> > hive-exec-1.1.0-cdh5.16.0-SNAPSHOT-core.jar) in the CREATE FUNCTION
>> > statement does not work, since the class name is still the same and the
>> > cdh5.7.3 Hive jars are already on the CLASSPATH, so Impala will still
>> > load the old implementation of UDFJson. To work around this, I copied the
>> > implementation of UDFJson under another class name and used my own jar
>> > file instead.
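>> >
>> > The renamed class looks roughly like this (package and class names here
>> > are made up for illustration; the body is the patched UDFJson copied
>> > verbatim):
>> >
>> > package com.example.udf;  // any package not already on the CLASSPATH
>> >
>> > import org.apache.hadoop.hive.ql.exec.UDF;
>> >
>> > // Same logic as the patched UDFJson, under a new fully-qualified name so
>> > // the stale class in the cdh5.7.3 jars on Impala's CLASSPATH cannot
>> > // shadow it.
>> > public class GetJsonObjectFixed extends UDF {
>> >   public String evaluate(String jsonString, String pathString) {
>> >     // ... body copied verbatim from the fixed UDFJson (HIVE-16196) ...
>> >     return null;  // placeholder so this sketch compiles
>> >   }
>> > }
>> >
>> > The function is then created with the same statement as before, but with
>> > SYMBOL pointing at the new class in our own jar.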
>> >
>> >
>> > This is not the final fix, since memory used in Java UDFs is still not
>> > tracked, which remains unsafe. Looking into the C++ implementation of
>> > get_json_object (https://github.com/nazgul33/impala-get-json-object-udf)
>> > mentioned in an early comment on IMPALA-376, it does not track memory
>> > either. So I wrote another implementation, and it's ready for code
>> > review: https://gerrit.cloudera.org/c/10950/ (it has passed the
>> > pre-review test: https://jenkins.impala.io/job/pre-review-test/189/ ).
>> > Yes, this is the purpose of this email. Thank you for reading!
>> >
>> >
>> > Further thoughts:
>> > * We might need to mention in the docs that adding Java UDFs may not use
>> > the given jar file if the Impala CLASSPATH already contains a jar file
>> > with the same class.
>> > * We should avoid using Hive builtin UDFs and any other Java UDFs, since
>> > their memory is not tracked.
>> > * How can we track memory used in the JVM? HBase, a pure Java project, is
>> > able to track its MemStore and BlockCache sizes. Can we learn from it?
>> > See the sketch below.
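>> >
>> > A minimal sketch of that style of accounting (HeapAccounting is a
>> > hypothetical class, not HBase code): every code path that allocates
>> > tracked memory bumps a shared counter and checks it against a configured
>> > limit. It only works when all allocating code cooperates.
>> >
>> > import java.util.concurrent.atomic.AtomicLong;
>> >
>> > class HeapAccounting {
>> >   private final AtomicLong used = new AtomicLong();
>> >   private final long limitBytes;
>> >
>> >   HeapAccounting(long limitBytes) { this.limitBytes = limitBytes; }
>> >
>> >   // Reserve before allocating; roll back and refuse if over budget.
>> >   boolean tryReserve(long bytes) {
>> >     if (used.addAndGet(bytes) > limitBytes) {
>> >       used.addAndGet(-bytes);
>> >       return false;
>> >     }
>> >     return true;
>> >   }
>> >
>> >   void release(long bytes) { used.addAndGet(-bytes); }
>> > }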
>> >
>> >
>> > Thanks,
>> > Quanlong
>> > --
>> > Quanlong Huang
>> > Software Developer, Hulu
>>
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>

Re: Re: Impala Incident Sharing

Posted by Todd Lipcon <to...@cloudera.com.INVALID>.
On Wed, Aug 1, 2018 at 6:02 AM, Quanlong Huang <hu...@126.com>
wrote:

> Hi Todd and Philip,
>
>
> The idea of a separate ClassLoader is cool, but I would vote for executing
> Java UDFs in a separate JVM. We don't need to create a JVM for each UDF;
> they can share the same JVM with a configurable hard limit. If the JVM for
> UDFs runs out of heap, we can destroy it and fail the affected queries.
> (Not sure if this is doable.) The drawback is that some innocent queries
> using Java UDFs will fail too, but the cluster will stay healthy.
>

I think this would be great, but we need to make sure not to regress
performance significantly. For example, maybe we'd need to expose row
batches as a shared memory region and then map them as a DirectByteBuffer
on the Java side. I think this would probably be a lot of work, though, and
maybe Java UDF usage is uncommon enough that it may not be worth all of the
coding?

-Todd


-- 
Todd Lipcon
Software Engineer, Cloudera