Posted to dev@thrift.apache.org by "Joshi, Rekha" <Re...@intuit.com> on 2012/08/29 13:00:25 UTC
Re: HCatalog Thrift Error
Thanks for confirming, Agateaa.
Since the HCat server behaves normally and you observed the issue in your log just once, it is less of a concern for me at the moment.
I am also not sure whether this behavior is related to CMS or to the environment. At some point I might try to replicate your setup, and I will update you if I hit this too.
However, I am cc-ing the Thrift dev mailing list as well, since there are some known libthrift/TBinaryProtocol issues in line with yours:
https://issues.apache.org/jira/browse/THRIFT-1643
Thanks
Rekha
From: agateaaa <ag...@gmail.com>
Reply-To: <hc...@incubator.apache.org>
Date: Tue, 28 Aug 2012 07:50:00 -0700
To: <hc...@incubator.apache.org>
Subject: Re: HCatalog Thrift Error
Hi Rekha,
Yes, the HCatalog server was up and is still running. I can query tables via Pig scripts and also run Hive queries.
Before I applied the patch for THRIFT-1468, I had seen my server crash frequently under similar circumstances (OutOfMemoryError). Since the patch I haven't seen any crashes (just that error, once).
I took a Java heap dump just after I saw the error and did not see any increase in heap size. I read in the GC tuning docs that if full GC is taking too long (98% of the time), the JVM may throw an OutOfMemoryError, but I am not really sure whether that applies here, since I am using CMS.
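One way to test the 98% idea is to look at how much wall-clock time the collectors have actually consumed. A minimal sketch using only java.lang.management (the class name GcTimeShare is hypothetical). The ratio is cumulative since JVM start, so sampling it twice and comparing the readings gives a better view of a recent full-GC storm. Note also that HotSpot's GC-overhead check normally reports its own message ("GC overhead limit exceeded") rather than the "Java heap space" seen in the stack traces in this thread.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeShare {
    /**
     * Fraction of JVM uptime spent in collections so far, summed over all
     * collectors that report a time (getCollectionTime() returns -1 if unsupported).
     */
    static double gcTimeFraction() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        // Clamp to 1.0 in case concurrent collectors' times overlap.
        return uptime > 0 ? Math.min(1.0, (double) gcMillis / uptime) : 0.0;
    }

    public static void main(String[] args) {
        // Per-collector totals (with CMS these are typically ParNew and ConcurrentMarkSweep).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.printf("GC share of uptime: %.2f%%%n", 100 * gcTimeFraction());
    }
}
```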
I can check whether I get the same error as in THRIFT-1205.
Isn't HIVE-2715 the same as the fix for THRIFT-1468 (at least in terms of its resolution)?
Thanks
A
On Tue, Aug 28, 2012 at 2:33 AM, Joshi, Rekha <Re...@intuit.com> wrote:
Hi Agateaa,
Impressive bug description.
Can you confirm that the HCat server was up (in spite of the thread dump/GC) and that, for all practical purposes, commands were executing normally for a fairly long time after the GC issues appeared in the log?
Unless there is a built-in self-healing effect :-) (a timeout after which the error is automatically invalidated, the system is reset, or space is reclaimed), I would expect the error to have a direct impact on the system, not to be noticed only by checking the log.
I do not have the same patched environment as yours, but would you care to remove the THRIFT-1468 patch and then check whether your system's behavior matches:
https://issues.apache.org/jira/browse/THRIFT-1205
https://issues.apache.org/jira/browse/THRIFT-1468
https://issues.apache.org/jira/browse/HIVE-2715
Or, especially since you did not send arbitrary data, can you confirm that you get the usual error if you do send arbitrary data?
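The check suggested here amounts to writing non-Thrift bytes straight to the server's socket, as a telnet session would. A hedged sketch in plain Java: the class name RawProbe and the default port 9083 (the usual Hive metastore port) are assumptions, so pass the host and port of a test server you control.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class RawProbe {
    // Arbitrary non-Thrift text, similar to what a telnet or HTTP client would send.
    static final byte[] PAYLOAD = "GET / HTTP/1.0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);

    /** Writes the raw payload with no Thrift framing or message header. */
    static void send(OutputStream out, byte[] payload) throws IOException {
        out.write(payload);
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical defaults; override with: java RawProbe <host> <port>
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9083;
        try (Socket socket = new Socket(host, port)) {
            send(socket.getOutputStream(), PAYLOAD);
        }
    }
}
```

Afterwards, the server's error log would show whether the same OutOfMemoryError trace appears.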
Thanks
Rekha
From: agateaaa <ag...@gmail.com>
Reply-To: <hc...@incubator.apache.org>
Date: Mon, 27 Aug 2012 10:38:01 -0700
To: <hc...@incubator.apache.org>
Subject: Re: HCatalog Thrift Error
Correction:
I have a fairly small server (a VM with 1 GB RAM and 1 CPU) running HCatalog 0.4 and Hive 0.9 (patched for HIVE-3008) with Thrift 0.7 (patched for THRIFT-1468).
On Mon, Aug 27, 2012 at 10:27 AM, agateaaa <ag...@gmail.com> wrote:
Hi,
I got this error over the weekend in the hcat.err log file.
At approximately the same time, I noticed a full GC happening in the GC logs.
Exception in thread "pool-1-thread-200" java.lang.OutOfMemoryError: Java heap space
at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:353)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:215)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:81)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Exception in thread "pool-1-thread-201" java.lang.OutOfMemoryError: Java heap space
at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:353)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:215)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:81)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Exception in thread "pool-1-thread-202" java.lang.OutOfMemoryError: Java heap space
at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:353)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:215)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:81)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Exception in thread "pool-1-thread-203" java.lang.OutOfMemoryError: Java heap space
at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:353)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:215)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:81)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
I noticed that the HCatalog server had not shut down, and I don't see any other abnormality in the logs.
Searching led me to these two Thrift issues:
https://issues.apache.org/jira/browse/THRIFT-601
https://issues.apache.org/jira/browse/THRIFT-1205
The only difference is that in my case the HCatalog server did not crash, and I wasn't trying to send any arbitrary data to the Thrift server's port via telnet.
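The traces above show readMessageBegin handing off to readStringBody. A simplified, stdlib-only model (hypothetical class and method names, not libthrift's actual code) of how a non-strict binary reader can turn the first four bytes of arbitrary text into an allocation size:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class MessageHeaderModel {
    /**
     * Models the first decision in a binary readMessageBegin: read a big-endian
     * i32; a negative value means a versioned header. A non-negative value is
     * either rejected (strict read) or taken as the method-name length that the
     * string-body read would allocate. Returns that length, or -1 otherwise.
     */
    static int impliedStringLength(byte[] firstBytes, boolean strictRead) {
        int size = ByteBuffer.wrap(firstBytes, 0, 4).getInt(); // ByteBuffer is big-endian by default
        if (size < 0) {
            return -1; // versioned header: the name length is read later, separately
        }
        if (strictRead) {
            return -1; // a strict reader rejects unversioned headers instead of allocating
        }
        return size; // non-strict: this many bytes would be allocated for the name
    }

    public static void main(String[] args) {
        byte[] telnetText = "GET / HTTP/1.0\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(impliedStringLength(telnetText, false)); // 1195725856
        System.out.println(impliedStringLength(telnetText, true));  // -1
    }
}
```

"GET " decodes to 0x47455420, roughly 1.1 GB. On a 1 GB heap, a single allocation of that size cannot succeed, which would be consistent with a "Java heap space" error even if the heap was not otherwise growing.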
I have a fairly small server (a VM with 1 GB RAM and 1 CPU) running HCatalog 0.4 and Hive 0.9 (patched for HIVE-3008) with Thrift 0.7 (patched for THRIFT-1438).
Has anyone seen this before?
Thanks
- A
Re: HCatalog Thrift Error
Posted by Travis Crawford <tr...@gmail.com>.
Hey Thrift & HCat gurus,
We've also noticed this OOM issue when processing corrupt Thrift
messages. We're attempting to work around this issue as follows (see
https://github.com/kevinweil/elephant-bird/pull/239/files#L5R45):
@Override
public void deserialize(TBase base, byte[] bytes) throws TException {
  // protocol: assumed to be the TBinaryProtocol instance that super.deserialize
  // reads from (declared elsewhere in the subclass).
  // Set an upper bound on the bytes available so that the protocol does not try
  // to allocate and read large amounts of data in case of corrupt input.
  protocol.setReadLength(bytes.length);
  super.deserialize(base, bytes);
}
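The override relies on the protocol charging each declared length against the budget set by setReadLength before it allocates. A stdlib-only model of that guard (the class BoundedReader is hypothetical and only mirrors the idea, not libthrift's code) shows why capping the budget at bytes.length turns a corrupt length prefix into an ordinary exception instead of an OutOfMemoryError:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BoundedReader {
    private final ByteBuffer in;
    private int readLength; // remaining byte budget, like the value passed to setReadLength

    public BoundedReader(byte[] bytes) {
        this.in = ByteBuffer.wrap(bytes);
        this.readLength = bytes.length; // the input can never hold more than this
    }

    /** Charges a declared length against the budget before anything is allocated. */
    private void checkReadLength(int length) {
        if (length < 0) {
            throw new IllegalArgumentException("Negative length: " + length);
        }
        readLength -= length;
        if (readLength < 0) {
            throw new IllegalArgumentException("Message length exceeded: " + length);
        }
    }

    /** Reads a big-endian i32 length prefix followed by that many UTF-8 bytes. */
    public String readString() {
        checkReadLength(4);
        int size = in.getInt();
        checkReadLength(size);       // fails fast on a corrupt prefix
        byte[] buf = new byte[size]; // only reached when size fits in the remaining input
        in.get(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(new BoundedReader(new byte[] {0, 0, 0, 2, 'h', 'i'}).readString()); // hi
        try {
            new BoundedReader("GET / HTTP/1.0\r\n".getBytes(StandardCharsets.US_ASCII)).readString();
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Message length exceeded: 1195725856
        }
    }
}
```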
Would it make sense to call setReadLength directly in TDeserializer.deserialize?
https://github.com/apache/thrift/blob/trunk/lib/java/src/org/apache/thrift/TDeserializer.java#L60
--travis