Posted to dev@phoenix.apache.org by James Taylor <ja...@apache.org> on 2016/05/04 17:15:09 UTC

Re: Jenkins build failures?

Our Jenkins builds have improved, but we're still seeing some issues:
- timeouts with the new org.apache.phoenix.hive.HivePhoenixStoreIT test
- consistent failures of the 4.x-HBase-1.1 build. I suspect that Jenkins build
is out-of-date, as we haven't had a 4.x-HBase-1.1 branch for quite a while;
there are likely changes that were made to the other Jenkins build scripts
but not to this one
- flapping of the
org.apache.phoenix.end2end.index.ReadOnlyIndexFailureIT.testWriteFailureReadOnlyIndex
test in 0.98 and 1.0
- no email sent for the 0.98 build (as far as I can tell)

If folks have time to look into these, that'd be much appreciated.

    James



On Sat, Apr 30, 2016 at 11:55 AM, James Taylor <ja...@apache.org>
wrote:

> The defaults when tests are running are much lower than the standard
> Phoenix defaults (see QueryServicesTestImpl and
> BaseTest.setUpConfigForMiniCluster()). It's unclear to me why the
> HashJoinIT and SortMergeJoinIT tests (I think these are the culprits) do
> not seem to adhere to these (or maybe override them?). They fail for me on
> my Mac, but they do pass on a Linux box. Would be awesome if someone could
> investigate and submit a patch to fix these.
>
> Thanks,
> James
>
> On Sat, Apr 30, 2016 at 11:47 AM, Nick Dimiduk <nd...@gmail.com> wrote:
>
>> The default thread pool sizes for HDFS, HBase, ZK, and the Phoenix client
>> are all contributing to this huge thread count.
>>
>> A good starting point would be to take a jstack of the IT process and
>> count threads, grouped by similar names. Reconfigure to reduce all those
>> groups to something like 10 each, and see if the test still runs reliably
>> on local hardware.
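
A rough in-JVM equivalent of that jstack-and-count step, for anyone who wants
to automate it (a hypothetical helper, not something from this thread; the
digit-collapsing regex is just one way of bucketing similar names):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.util.Map;
    import java.util.TreeMap;

    // Tally live threads by name, with numeric suffixes collapsed, so that
    // "htable-pool1-t1", "htable-pool1-t2", ... land in a single bucket.
    public class ThreadCensus {
        public static void main(String[] args) {
            Map<String, Integer> counts = new TreeMap<String, Integer>();
            ThreadInfo[] threads =
                ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
            for (ThreadInfo t : threads) {
                String group = t.getThreadName().replaceAll("[0-9]+", "#");
                Integer n = counts.get(group);
                counts.put(group, n == null ? 1 : n + 1);
            }
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                System.out.println(e.getValue() + "\t" + e.getKey());
            }
        }
    }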
>>
>> On Friday, April 29, 2016, Sergey Soldatov <se...@gmail.com>
>> wrote:
>>
>> > By the way, we need to do something with those OOMs and "unable to
>> > create new native thread" failures in the ITs. It's quite strange to see
>> > these kinds of failures in a 10-line test, especially when queries against
>> > a table with fewer than 10 rows generate over 2500 threads. Does anybody
>> > know whether it's a ZK-related issue?
>> >
>> > On Fri, Apr 29, 2016 at 7:51 AM, James Taylor <jamestaylor@apache.org> wrote:
>> > > A patch would be much appreciated, Sergey.
>> > >
>> > > On Fri, Apr 29, 2016 at 3:26 AM, Sergey Soldatov <sergeysoldatov@gmail.com> wrote:
>> > >
>> > >> As for the flume module - flume-ng comes with commons-io 2.1, while
>> > >> hadoop & hbase require org.apache.commons.io.Charsets, which was
>> > >> introduced in 2.3. An easy fix is to move the dependency on flume-ng
>> > >> after the dependencies on hbase/hadoop.
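
A sketch of what that reordering could look like in the module's pom.xml
(coordinates are illustrative and version tags omitted). When two versions of
a transitive dependency sit at the same depth, Maven breaks the tie in favor
of the dependency declared first, so listing hbase/hadoop before flume-ng
lets their newer commons-io win:

    <dependencies>
      <!-- hbase/hadoop first, so their commons-io (2.3+, which has
           org.apache.commons.io.Charsets) wins the conflict -->
      <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
      </dependency>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
      </dependency>
      <!-- flume-ng (pulls in commons-io 2.1) now declared after them -->
      <dependency>
        <groupId>org.apache.flume</groupId>
        <artifactId>flume-ng-core</artifactId>
      </dependency>
    </dependencies>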
>> > >>
>> > >> The last thing, about ConcurrentHashMap - it definitely means that the
>> > >> code was compiled with 1.8, since keySet() on 1.7 returns a plain Set
>> > >> while on 1.8 it returns a KeySetView.
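
A tiny probe that reproduces the mismatch (a hypothetical class; the
descriptors in the comments are what javac emits against each JDK's class
library):

    import java.util.concurrent.ConcurrentHashMap;

    public class KeySetProbe {
        public static void main(String[] args) {
            ConcurrentHashMap<String, String> map =
                new ConcurrentHashMap<String, String>();
            map.put("k", "v");
            // Compiled on JDK 8, this call is emitted with the descriptor
            //   keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
            // because JDK 8 covariantly narrowed the return type. A JDK 7
            // runtime only has keySet()Ljava/util/Set; so running this class
            // there fails with exactly the NoSuchMethodError quoted below.
            System.out.println(map.keySet());
        }
    }

Note that -source/-target 1.7 alone does not avoid this; the compile has to
happen against JDK 7 class libraries (a JDK 7 javac, or -bootclasspath
pointing at a JDK 7 rt.jar).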
>> > >>
>> > >>
>> > >>
>> > >> On Thu, Apr 28, 2016 at 4:08 PM, Josh Elser <josh.elser@gmail.com> wrote:
>> > >> > *tl;dr*
>> > >> >
>> > >> > * I'm removing ubuntu-us1 from all pools
>> > >> > * Phoenix-Flume ITs look busted
>> > >> > * UpsertValuesIT looks busted
>> > >> > * Something is weirdly wrong with Phoenix-4.x-HBase-1.1 in its entirety.
>> > >> >
>> > >> > Details below...
>> > >> >
>> > >> > It looks like we have a bunch of different reasons for the failures.
>> > >> > Starting with Phoenix-master:
>> > >> >
>> > >> > >>>
>> > >> > org.apache.phoenix.schema.NewerTableAlreadyExistsException: ERROR 1013 (42M04): Table already exists. tableName=T
>> > >> >         at org.apache.phoenix.end2end.UpsertValuesIT.testBatchedUpsert(UpsertValuesIT.java:476)
>> > >> > <<<
>> > >> >
>> > >> > I've seen this coming out of a few different tests (I think I've also
>> > >> > run into it on my own, but that's another thing).
>> > >> >
>> > >> > Some of them look like the Jenkins build host is just over-taxed:
>> > >> >
>> > >> > >>>
>> > >> > Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000007e7600000, 331350016, 0) failed; error='Cannot allocate memory' (errno=12)
>> > >> > #
>> > >> > # There is insufficient memory for the Java Runtime Environment to continue.
>> > >> > # Native memory allocation (malloc) failed to allocate 331350016 bytes for committing reserved memory.
>> > >> > # An error report file with more information is saved as:
>> > >> > # /home/jenkins/jenkins-slave/workspace/Phoenix-master/phoenix-core/hs_err_pid26454.log
>> > >> > Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000007ea600000, 273678336, 0) failed; error='Cannot allocate memory' (errno=12)
>> > >> > #
>> > >> > <<<
>> > >> >
>> > >> > and
>> > >> >
>> > >> > >>>
>> > >> > -------------------------------------------------------
>> > >> >  T E S T S
>> > >> > -------------------------------------------------------
>> > >> > Build step 'Invoke top-level Maven targets' marked build as failure
>> > >> > <<<
>> > >> >
>> > >> > Both of these issues are limited to the host "ubuntu-us1". Let me just
>> > >> > remove it from the pool (on Phoenix-master) and see if that helps at all.
>> > >> >
>> > >> > I also see sporadic failures in some of the Flume tests:
>> > >> >
>> > >> > >>>
>> > >> > Running org.apache.phoenix.flume.PhoenixSinkIT
>> > >> > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.004 sec <<< FAILURE! - in org.apache.phoenix.flume.PhoenixSinkIT
>> > >> > org.apache.phoenix.flume.PhoenixSinkIT  Time elapsed: 0.004 sec  <<< ERROR!
>> > >> > java.lang.RuntimeException: java.io.IOException: Failed to save in any storage directories while saving namespace.
>> > >> > Caused by: java.io.IOException: Failed to save in any storage directories while saving namespace.
>> > >> >
>> > >> > Running org.apache.phoenix.flume.RegexEventSerializerIT
>> > >> > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.005 sec <<< FAILURE! - in org.apache.phoenix.flume.RegexEventSerializerIT
>> > >> > org.apache.phoenix.flume.RegexEventSerializerIT  Time elapsed: 0.004 sec  <<< ERROR!
>> > >> > java.lang.RuntimeException: java.io.IOException: Failed to save in any storage directories while saving namespace.
>> > >> > Caused by: java.io.IOException: Failed to save in any storage directories while saving namespace.
>> > >> > <<<
>> > >> >
>> > >> > I'm not sure what the error message means at a glance.
>> > >> >
>> > >> > For Phoenix-HBase-1.1:
>> > >> >
>> > >> > >>>
>> > >> > org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.NoSuchMethodError: java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>> > >> >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2156)
>> > >> >         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
>> > >> >         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>> > >> >         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
>> > >> >         at java.lang.Thread.run(Thread.java:745)
>> > >> > Caused by: java.lang.NoSuchMethodError: java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>> > >> >         at org.apache.hadoop.hbase.master.ServerManager.findServerWithSameHostnamePortWithLock(ServerManager.java:432)
>> > >> >         at org.apache.hadoop.hbase.master.ServerManager.checkAndRecordNewServer(ServerManager.java:346)
>> > >> >         at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:264)
>> > >> >         at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:318)
>> > >> >         at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
>> > >> >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2117)
>> > >> >         ... 4 more
>> > >> > 2016-04-28 22:54:35,497 WARN  [RS:0;hemera:41302] org.apache.hadoop.hbase.regionserver.HRegionServer(2279): error telling master we are up
>> > >> > com.google.protobuf.ServiceException: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException): org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.NoSuchMethodError: java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>> > >> >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2156)
>> > >> >         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
>> > >> >         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>> > >> >         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
>> > >> >         at java.lang.Thread.run(Thread.java:745)
>> > >> > Caused by: java.lang.NoSuchMethodError: java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>> > >> >         at org.apache.hadoop.hbase.master.ServerManager.findServerWithSameHostnamePortWithLock(ServerManager.java:432)
>> > >> >         at org.apache.hadoop.hbase.master.ServerManager.checkAndRecordNewServer(ServerManager.java:346)
>> > >> >         at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:264)
>> > >> >         at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:318)
>> > >> >         at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
>> > >> >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2117)
>> > >> >         ... 4 more
>> > >> >
>> > >> >         at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
>> > >> >         at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:318)
>> > >> >         at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8982)
>> > >> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2269)
>> > >> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:893)
>> > >> >         at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156)
>> > >> >         at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108)
>> > >> >         at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140)
>> > >> >         at java.security.AccessController.doPrivileged(Native Method)
>> > >> >         at javax.security.auth.Subject.doAs(Subject.java:356)
>> > >> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
>> > >> >         at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:307)
>> > >> >         at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138)
>> > >> >         at java.lang.Thread.run(Thread.java:745)
>> > >> > Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException): org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.NoSuchMethodError: java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>> > >> >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2156)
>> > >> >         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
>> > >> >         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>> > >> >         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
>> > >> >         at java.lang.Thread.run(Thread.java:745)
>> > >> > Caused by: java.lang.NoSuchMethodError: java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>> > >> >         at org.apache.hadoop.hbase.master.ServerManager.findServerWithSameHostnamePortWithLock(ServerManager.java:432)
>> > >> >         at org.apache.hadoop.hbase.master.ServerManager.checkAndRecordNewServer(ServerManager.java:346)
>> > >> >         at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:264)
>> > >> >         at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:318)
>> > >> >         at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
>> > >> >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2117)
>> > >> >         ... 4 more
>> > >> >
>> > >> >         at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1235)
>> > >> >         at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:217)
>> > >> >         ... 13 more
>> > >> > <<<
>> > >> >
>> > >> > We have hit-or-miss failures on this error message, which keeps
>> > >> > hbase:namespace from being assigned (as the RS's can never report in to
>> > >> > the hmaster). This is happening across a couple of the nodes
>> > >> > (ubuntu-[3,4,6]). I had tried to look into this one over the weekend
>> > >> > (and was led to a JDK8-built jar running on JDK7), but if I look at
>> > >> > META-INF/MANIFEST.mf in the hbase-server-1.1.3.jar from central, I see
>> > >> > it was built with 1.7.0_80 (which I think means the JDK8 thought is a
>> > >> > red herring). I'm really confused by this one, actually.
>> > >> > Something must be amiss here.
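
For reference, a small sketch of both checks - what the jar's manifest claims
versus what the bytecode actually targets (a hypothetical helper; Build-Jdk
is the attribute maven-archiver writes, and the class-file major version,
51 for Java 7 and 52 for Java 8, is the authoritative answer either way):

    import java.io.DataInputStream;
    import java.io.InputStream;
    import java.util.jar.JarFile;

    public class JarJdkCheck {
        public static void main(String[] args) throws Exception {
            JarFile jar = new JarFile(args[0]);
            // What the build machine claims it used.
            System.out.println("Build-Jdk: "
                + jar.getManifest().getMainAttributes().getValue("Build-Jdk"));
            // What a class file actually is: after the 0xCAFEBABE magic come
            // two-byte minor and major versions.
            InputStream in = jar.getInputStream(
                jar.getEntry("org/apache/hadoop/hbase/master/ServerManager.class"));
            DataInputStream data = new DataInputStream(in);
            data.readInt();               // skip magic
            data.readUnsignedShort();     // minor version
            System.out.println("class major version: " + data.readUnsignedShort());
            jar.close();
        }
    }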
>> > >> >
>> > >> > For Phoenix-HBase-1.0:
>> > >> >
>> > >> > We see the same Phoenix-Flume failures, UpsertValuesIT failure, and
>> > >> > timeouts on ubuntu-us1. There is one crash on H10, but that might just
>> > >> > be bad luck.
>> > >> >
>> > >> > For Phoenix-HBase-0.98:
>> > >> >
>> > >> > Same UpsertValuesIT failure and failures on ubuntu-us1.
>> > >> >
>> > >> >
>> > >> > James Taylor wrote:
>> > >> >>
>> > >> >> Anyone know why our Jenkins builds keep failing? Is it environmental
>> > >> >> and is there anything we can do about it?
>> > >> >>
>> > >> >> Thanks,
>> > >> >> James
>> > >> >>
>> > >> >
>> > >>
>> >
>>
>
>

Re: Jenkins build failures?

Posted by Josh Elser <el...@apache.org>.
+1 Great digging, Sergey!

Sergey Soldatov wrote:
> James,
> Sure. I will file a JIRA and check about a non-zero thread pool size (I'm not
> sure that would help, since the pool is initialized in getDefaultExecutor and
> always used if no other pool is provided in the HTable constructor).
> Thanks,
> Sergey
>
> On Mon, May 23, 2016 at 8:11 PM, James Taylor<ja...@apache.org>
> wrote:
>
>> Thanks, Sergey. Sounds like you're on to it. We could try configuring
>> those tests with a non-zero thread pool size so they don't use a
>> SynchronousQueue. Want to file a JIRA with this info so we don't lose
>> track of it?
>>
>>      James
>>
>> On Tue, May 17, 2016 at 11:21 PM, Sergey Soldatov <sergeysoldatov@gmail.com> wrote:
>>
>>> Getting back to the failures with OOM/"unable to create new native thread":
>>> those files each have around 100 tests that run on top of Phoenix. In total
>>> they generate over 2500 scans (system.catalog, sequences, and regular scans
>>> over the table). The problem is that on the HBase side all scans go through
>>> the ThreadPoolExecutor created in HTable, which uses a SynchronousQueue as
>>> its work queue. As the javadoc for ThreadPoolExecutor says:
>>>
>>> *Direct handoffs. A good default choice for a work queue is a
>>> SynchronousQueue that hands off tasks to threads without otherwise holding
>>> them. Here, an attempt to queue a task will fail if no threads are
>>> immediately available to run it, so a new thread will be constructed. This
>>> policy avoids lockups when handling sets of requests that might have
>>> internal dependencies. Direct handoffs generally require unbounded
>>> maximumPoolSizes to avoid rejection of new submitted tasks. This in turn
>>> admits the possibility of unbounded thread growth when commands continue
>>> to
>>> arrive on average faster than they can be processed.*
>>>
>>>
>>> And we actually hit exactly that last case. But there is still a question:
>>> since all those tests pass and the scans complete during execution (I
>>> checked that), it's not clear why all those threads are still alive. If
>>> someone has a suggestion as to why that could happen, it would be
>>> interesting to hear; otherwise I will dig deeper a bit later. It may also
>>> be worth changing the queue in HBase to something less aggressive in terms
>>> of thread creation.
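
Sergey's scenario is easy to reproduce in isolation. A minimal sketch of the
growth behavior that javadoc paragraph describes - the numbers mirror the
~2500 scans, but nothing here is Phoenix- or HBase-specific:

    import java.util.concurrent.SynchronousQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class HandoffGrowth {
        public static void main(String[] args) {
            // Direct handoff with an unbounded maximum: the shape Sergey
            // describes for HTable's default executor.
            ThreadPoolExecutor pool = new ThreadPoolExecutor(0, Integer.MAX_VALUE,
                    60, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());
            for (int i = 0; i < 2500; i++) {
                pool.execute(new Runnable() {
                    public void run() {
                        try { Thread.sleep(5000); } catch (InterruptedException e) { }
                    }
                });
            }
            // No submit ever found an idle thread, so every submit made one.
            System.out.println("threads created: " + pool.getPoolSize()); // ~2500
            pool.shutdown();
        }
    }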
>>>
>>> Thanks,
>>> Sergey
>>>
>>>
>>> On Thu, May 5, 2016 at 8:24 AM, James Taylor<ja...@apache.org>
>>> wrote:
>>>
>>>> Looks like all Jenkins builds are failing, but it seems environmental? Do
>>>> we need to exclude some particular kind of host(s)?
>>>>
>>>> On Wed, May 4, 2016 at 5:25 PM, James Taylor<ja...@apache.org>
>>>> wrote:
>>>>
>>>>> Thanks, Sergey!
>>>>>
>>>>> On Wed, May 4, 2016 at 5:22 PM, Sergey Soldatov <sergeysoldatov@gmail.com> wrote:
>>>>>
>>>>>> James,
>>>>>> Ah, I didn't notice that timeouts are not shown in the final report as
>>>>>> failures. It seems that the build is using JDK 1.7 and the tests run OOM
>>>>>> on PermGen space. Fixed in PHOENIX-2879.
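
Not necessarily what PHOENIX-2879 changed, but the usual knob for PermGen
OOMs in forked test JVMs is the argLine of the surefire/failsafe fork; a
sketch (flag values are illustrative, and MaxPermSize only exists on JDK 7
and earlier):

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-failsafe-plugin</artifactId>
      <configuration>
        <!-- class metadata lives in PermGen on JDK 7, which defaults too
             small for a fork running hundreds of ITs -->
        <argLine>-Xmx2g -XX:MaxPermSize=256m</argLine>
      </configuration>
    </plugin>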
>>>>>>
>>>>>> Thanks,
>>>>>> Sergey
>>>>>>
>>>>>> On Wed, May 4, 2016 at 1:48 PM, James Taylor <jamestaylor@apache.org> wrote:
>>>>>>> Sergey, on master branch (which is HBase 1.2):
>>>>>>> https://builds.apache.org/job/Phoenix-master/1214/console
>>>>>>>
>>>>>>> On Wed, May 4, 2016 at 1:31 PM, Sergey Soldatov <sergeysoldatov@gmail.com> wrote:
>>>>>>>> James,
>>>>>>>> Regarding HivePhoenixStoreIT - are you talking about the
>>>>>>>> Phoenix-4.x-HBase-1.0 job? The last build passed it successfully.
>>>>>>>>
>>>>>>>> [...]

Re: Jenkins build failures?

Posted by Sergey Soldatov <se...@gmail.com>.
James,
Sure. I will file a JIRA and check about a non-zero thread pool size (I'm not
sure that would help, since the pool is initialized in getDefaultExecutor and
always used if no other pool is provided in the HTable constructor).
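
If a bounded pool does turn out to help, one way to bypass getDefaultExecutor
is to hand HBase a pool explicitly, via the 1.x
Connection.getTable(TableName, ExecutorService) overload. A minimal sketch
(table name and pool sizes are illustrative):

    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Table;

    public class BoundedPoolTable {
        public static void main(String[] args) throws Exception {
            // At most 10 threads; excess work waits in the queue instead of
            // forcing a new thread per task the way SynchronousQueue does.
            ThreadPoolExecutor pool = new ThreadPoolExecutor(10, 10,
                    60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
            pool.allowCoreThreadTimeOut(true);

            Configuration conf = HBaseConfiguration.create();
            Connection conn = ConnectionFactory.createConnection(conf);
            Table table = conn.getTable(TableName.valueOf("T"), pool);
            // ... scans issued through this Table share the bounded pool ...
            table.close();
            conn.close();
            pool.shutdown();
        }
    }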
Thanks,
Sergey

On Mon, May 23, 2016 at 8:11 PM, James Taylor <ja...@apache.org>
wrote:

> Thanks, Sergey. Sounds like you're on to it. We could try configuring
> those tests with a non-zero thread pool size so they don't use a
> SynchronousQueue. Want to file a JIRA with this info so we don't lose
> track of it?
>
>     James
>
> On Tue, May 17, 2016 at 11:21 PM, Sergey Soldatov <sergeysoldatov@gmail.com> wrote:
>
>> [...]
>> > >> >> >>> > >> >         at
>> > >> java.security.AccessController.doPrivileged(Native
>> > >> >> >>> Method)
>> > >> >> >>> > >> >         at
>> > >> javax.security.auth.Subject.doAs(Subject.java:356)
>> > >> >> >>> > >> >         at
>> > >> >> >>> > >> >
>> > >> >> >>> > >>
>> > >> >> >>> >
>> > >> >> >>>
>> > >> >> >>>
>> > >>
>> >
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
>> > >> >> >>> > >> >         at
>> > >> >> >>> > >> >
>> > >> >> >>> > >>
>> > >> >> >>> >
>> > >> >> >>>
>> > >> >> >>>
>> > >>
>> >
>> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:307)
>> > >> >> >>> > >> >         at
>> > >> >> >>> > >> >
>> > >> >> >>> > >>
>> > >> >> >>> >
>> > >> >> >>>
>> > >> >> >>>
>> > >>
>> >
>> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138)
>> > >> >> >>> > >> >         at java.lang.Thread.run(Thread.java:745)
>> > >> >> >>> > >> > Caused by:
>> > >> >> >>> > >> >
>> > >> >> >>> > >>
>> > >> >> >>> >
>> > >> >> >>>
>> > >> >> >>>
>> > >>
>> >
>> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException):
>> > >> >> >>> > >> > org.apache.hadoop.hbase.DoNotRetryIOException:
>> > >> >> >>> > >> java.lang.NoSuchMethodError:
>> > >> >> >>> > >> >
>> > >> >> >>> > >>
>> > >> >> >>> >
>> > >> >> >>>
>> > >> >> >>>
>> > >>
>> >
>> java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>> > >> >> >>> > >> >         at
>> > >> >> >>> > >>
>> > >> org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2156)
>> > >> >> >>> > >> >         at
>> > >> >> >>> > >>
>> > >> org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
>> > >> >> >>> > >> >         at
>> > >> >> >>> > >> >
>> > >> >> >>> > >>
>> > >> >> >>> >
>> > >> >> >>>
>> > >> >> >>>
>> > >>
>> >
>> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>> > >> >> >>> > >> >         at
>> > >> >> >>> > >> >
>> > >> >> >>> > >> >
>> > >> org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
>> > >> >> >>> > >> >         at java.lang.Thread.run(Thread.java:745)
>> > >> >> >>> > >> > Caused by: java.lang.NoSuchMethodError:
>> > >> >> >>> > >> >
>> > >> >> >>> > >>
>> > >> >> >>> >
>> > >> >> >>>
>> > >> >> >>>
>> > >>
>> >
>> java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>> > >> >> >>> > >> >         at
>> > >> >> >>> > >> >
>> > >> >> >>> > >>
>> > >> >> >>> >
>> > >> >> >>>
>> > >> >> >>>
>> > >>
>> >
>> org.apache.hadoop.hbase.master.ServerManager.findServerWithSameHostnamePortWithLock(ServerManager.java:432)
>> > >> >> >>> > >> >         at
>> > >> >> >>> > >> >
>> > >> >> >>> > >>
>> > >> >> >>> >
>> > >> >> >>>
>> > >> >> >>>
>> > >>
>> >
>> org.apache.hadoop.hbase.master.ServerManager.checkAndRecordNewServer(ServerManager.java:346)
>> > >> >> >>> > >> >         at
>> > >> >> >>> > >> >
>> > >> >> >>> > >>
>> > >> >> >>> >
>> > >> >> >>>
>> > >> >> >>>
>> > >>
>> >
>> org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:264)
>> > >> >> >>> > >> >         at
>> > >> >> >>> > >> >
>> > >> >> >>> > >>
>> > >> >> >>> >
>> > >> >> >>>
>> > >> >> >>>
>> > >>
>> >
>> org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:318)
>> > >> >> >>> > >> >         at
>> > >> >> >>> > >> >
>> > >> >> >>> > >>
>> > >> >> >>> >
>> > >> >> >>>
>> > >> >> >>>
>> > >>
>> >
>> org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
>> > >> >> >>> > >> >         at
>> > >> >> >>> > >>
>> > >> org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2117)
>> > >> >> >>> > >> >         ... 4 more
>> > >> >> >>> > >> >
>> > >> >> >>> > >> >         at
>> > >> >> >>> > >> >
>> > >> >> >>> >
>> > >> >> >>> >
>> > >>
>> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1235)
>> > >> >> >>> > >> >         at
>> > >> >> >>> > >> >
>> > >> >> >>> > >>
>> > >> >> >>> >
>> > >> >> >>>
>> > >> >> >>>
>> > >>
>> >
>> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:217)
>> > >> >> >>> > >> >         ... 13 more
>> > >> >> >>> > >> > <<<
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > We have hit-or-miss on this error message which keeps
>> > >> >> >>> hbase:namespace
>> > >> >> >>> > >> from
>> > >> >> >>> > >> > being assigned (as the RS's can never report into the
>> > >> hmaster).
>> > >> >> >>> This
>> > >> >> >>> > is
>> > >> >> >>> > >> > happening across a couple of the nodes
>> (ubuntu-[3,4,6]). I
>> > >> had
>> > >> >> >>> tried
>> > >> >> >>> > to
>> > >> >> >>> > >> look
>> > >> >> >>> > >> > into this one over the weekend (and was lead to a JDK8
>> > built
>> > >> >> >>> > >> > jar,
>> > >> >> >>> > >> running on
>> > >> >> >>> > >> > JDK7), but if I look at META-INF/MANIFEST.mf in the
>> > >> >> >>> > >> hbase-server-1.1.3.jar
>> > >> >> >>> > >> > from central, I see it was built with 1.7.0_80 (which I
>> > >> think
>> > >> >> >>> > >> > means
>> > >> >> >>> > the
>> > >> >> >>> > >> JDK8
>> > >> >> >>> > >> > thought is a red-herring). I'm really confused by this
>> > one,
>> > >> >> >>> actually.
>> > >> >> >>> > >> > Something must be amiss here.
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > For Phoenix-HBase-1.0:
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > We see the same Phoenix-Flume failures, UpsertValuesIT
>> > >> failure,
>> > >> >> >>> > >> > and
>> > >> >> >>> > >> timeouts
>> > >> >> >>> > >> > on ubuntu-us1. There is one crash on H10, but that
>> might
>> > >> just
>> > >> >> >>> > >> > be
>> > >> >> >>> bad
>> > >> >> >>> > >> luck.
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > For Phoenix-HBase-0.98:
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > Same UpsertValuesIT failure and failures on ubuntu-us1.
>> > >> >> >>> > >> >
>> > >> >> >>> > >> >
>> > >> >> >>> > >> > James Taylor wrote:
>> > >> >> >>> > >> >>
>> > >> >> >>> > >> >> Anyone know why our Jenkins builds keep failing? Is it
>> > >> >> >>> environmental
>> > >> >> >>> > and
>> > >> >> >>> > >> >> is
>> > >> >> >>> > >> >> there anything we can do about it?
>> > >> >> >>> > >> >>
>> > >> >> >>> > >> >> Thanks,
>> > >> >> >>> > >> >> James
>> > >> >> >>> > >> >>
>> > >> >> >>> > >> >
>> > >> >> >>> > >>
>> > >> >> >>> >
>> > >> >> >>>
>> > >> >> >>
>> > >> >> >>
>> > >> >
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
>

Re: Jenkins build failures?

Posted by James Taylor <ja...@apache.org>.
Thanks, Sergey. Sounds like you're on to it. We could try configuring those
tests with a non-zero thread pool size so they don't use SynchronousQueue.
Want to file a JIRA with this info so we don't lose track of it?

    James
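
A minimal sketch of the two pool shapes under discussion (the constructor
arguments here are illustrative, not the actual HTable defaults):

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolConfigSketch {
    // Direct handoff, roughly the shape HTable uses today: when no worker is
    // idle, every execute() constructs a brand-new thread (the javadoc's
    // "unbounded thread growth" case).
    static ThreadPoolExecutor directHandoff() {
        return new ThreadPoolExecutor(1, Integer.MAX_VALUE,
                60L, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());
    }

    // The suggestion above: a non-zero fixed pool over a bounded queue, so a
    // burst of scans waits in the queue instead of spawning new threads.
    static ThreadPoolExecutor boundedPool() {
        return new ThreadPoolExecutor(10, 10,
                60L, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(1024));
    }
}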

On Tue, May 17, 2016 at 11:21 PM, Sergey Soldatov <se...@gmail.com>
wrote:

> Getting back to the failures with OOM/unable to create a native thread.
> Those files have around 100 tests inside each that are running on top of
> Phoenix. In total they generate over 2500 scans (system.catalog,
> sequences, and regular scans over tables). The problem is that on the
> HBase side all scans go through the ThreadPoolExecutor created in HTable,
> which uses a SynchronousQueue as its work queue. From the javadoc for
> ThreadPoolExecutor:
>
> *Direct handoffs. A good default choice for a work queue is a
> SynchronousQueue that hands off tasks to threads without otherwise holding
> them. Here, an attempt to queue a task will fail if no threads are
> immediately available to run it, so a new thread will be constructed. This
> policy avoids lockups when handling sets of requests that might have
> internal dependencies. Direct handoffs generally require unbounded
> maximumPoolSizes to avoid rejection of new submitted tasks. This in turn
> admits the possibility of unbounded thread growth when commands continue to
> arrive on average faster than they can be processed.*
>
> And we hit exactly that last case. But there is still a question:
> since all those tests pass and the scans complete during execution (I
> checked that), it's not clear why all those threads are still alive. If
> someone has a suggestion as to why that could happen, it would be
> interesting to hear; otherwise I will dig deeper a bit later. It may also
> be worth changing the queue in HBase to something less aggressive in
> terms of thread creation.
>
> Thanks,
> Sergey
>
>
> On Thu, May 5, 2016 at 8:24 AM, James Taylor <ja...@apache.org>
> wrote:
>
> > Looks like all Jenkins builds are failing, but it seems environmental? Do
> > we need to exclude some particular kind of host(s)?
> >
> > On Wed, May 4, 2016 at 5:25 PM, James Taylor <ja...@apache.org>
> > wrote:
> >
> > > Thanks, Sergey!
> > >
> > > On Wed, May 4, 2016 at 5:22 PM, Sergey Soldatov <
> > sergeysoldatov@gmail.com>
> > > wrote:
> > >
> > >> James,
> > >> Ah, didn't notice that timeouts are not shown in the final report as
> > >> failures. It seems that the build is using JDK 1.7 and test run OOM
> > >> with PermGen space. Fixed in PHOENIX-2879
> > >>
> > >> Thanks,
> > >> Sergey
> > >>
> > >> On Wed, May 4, 2016 at 1:48 PM, James Taylor <ja...@apache.org>
> > >> wrote:
> > >> > Sergey, on master branch (which is HBase 1.2):
> > >> > https://builds.apache.org/job/Phoenix-master/1214/console
> > >> >
> > >> > On Wed, May 4, 2016 at 1:31 PM, Sergey Soldatov <
> > >> sergeysoldatov@gmail.com>
> > >> > wrote:
> > >> >>
> > >> >> James,
> > >> >> Regarding HivePhoenixStoreIT. Are you talking about
> > >> >> Phoenix-4.x-HBase-1.0  job? Last build passed it successfully.
> > >> >>
> > >> >>
> > >> >> On Wed, May 4, 2016 at 10:15 AM, James Taylor <
> > jamestaylor@apache.org>
> > >> >> wrote:
> > >> >> > Our Jenkins builds have improved, but we're seeing some issues:
> > >> >> > - timeouts with the new org.apache.phoenix.hive.HivePhoenixStoreIT test.
> > >> >> > - consistent failure with 4.x-HBase-1.1 build. I suspect that Jenkins build
> > >> >> > is out-of-date, as we haven't had a 4.x-HBase-1.1 branch for quite a while.
> > >> >> > There's likely some changes that were made to the other Jenkins build
> > >> >> > scripts that weren't made to this one
> > >> >> > - flapping of the
> > >> >> > org.apache.phoenix.end2end.index.ReadOnlyIndexFailureIT.testWriteFailureReadOnlyIndex
> > >> >> > test in 0.98 and 1.0
> > >> >> > - no email sent for 0.98 build (as far as I can tell)
> > >> >> >
> > >> >> > If folks have time to look into these, that'd be much appreciated.
> > >> >> >
> > >> >> >     James
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> > On Sat, Apr 30, 2016 at 11:55 AM, James Taylor <
> > >> jamestaylor@apache.org>
> > >> >> > wrote:
> > >> >> >
> > >> >> >> The defaults when tests are running are much lower than the standard
> > >> >> >> Phoenix defaults (see QueryServicesTestImpl and
> > >> >> >> BaseTest.setUpConfigForMiniCluster()). It's unclear to me why the
> > >> >> >> HashJoinIT and SortMergeJoinIT tests (I think these are the culprits) do
> > >> >> >> not seem to adhere to these (or maybe override them?). They fail for me on
> > >> >> >> my Mac, but they do pass on a Linux box. Would be awesome if someone could
> > >> >> >> investigate and submit a patch to fix these.
> > >> >> >>
> > >> >> >> Thanks,
> > >> >> >> James
> > >> >> >>
> > >> >> >> On Sat, Apr 30, 2016 at 11:47 AM, Nick Dimiduk <
> > ndimiduk@gmail.com>
> > >> >> >> wrote:
> > >> >> >>
> > >> >> >>> The default thread pool sizes for HDFS, HBase, ZK, and the Phoenix
> > >> >> >>> client are all contributing to this huge thread count.
> > >> >> >>>
> > >> >> >>> A good starting point would be to take a jstack of the IT process and
> > >> >> >>> count, group by threads with similar name. Reconfigure to reduce all
> > >> >> >>> those groups to something like 10 each, see if the test still runs
> > >> >> >>> reliably on local hardware.
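
For reference, the grouping described above can also be done in-process; a
rough sketch, where the name-normalizing regex is just a guess at common
thread-name suffixes:

import java.util.Map;
import java.util.TreeMap;

public class ThreadCensus {
    public static void main(String[] args) {
        // Count live threads, grouping names with trailing counters stripped
        // so e.g. "RpcServer.reader=1" and "RpcServer.reader=2" share a bucket.
        Map<String, Integer> groups = new TreeMap<String, Integer>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            String group = t.getName().replaceAll("[-=.#]?\\d+$", "");
            Integer n = groups.get(group);
            groups.put(group, n == null ? 1 : n + 1);
        }
        for (Map.Entry<String, Integer> e : groups.entrySet()) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}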
> > >> >> >>>
> > >> >> >>> On Friday, April 29, 2016, Sergey Soldatov <
> > >> sergeysoldatov@gmail.com>
> > >> >> >>> wrote:
> > >> >> >>>
> > >> >> >>> > By the way, we need to do something with those OOMs and "unable to
> > >> >> >>> > create new native thread" in ITs. It's quite strange to see such
> > >> >> >>> > failures in a 10-line test, especially when queries for a table
> > >> >> >>> > with less than 10 rows generate over 2500 threads. Does anybody know
> > >> >> >>> > whether it's a zk-related issue?
> > >> >> >>> >
> > >> >> >>> > On Fri, Apr 29, 2016 at 7:51 AM, James Taylor
> > >> >> >>> > <jamestaylor@apache.org
> > >> >> >>> > <javascript:;>> wrote:
> > >> >> >>> > > A patch would be much appreciated, Sergey.
> > >> >> >>> > >
> > >> >> >>> > > On Fri, Apr 29, 2016 at 3:26 AM, Sergey Soldatov <
> > >> >> >>> > sergeysoldatov@gmail.com <javascript:;>>
> > >> >> >>> > > wrote:
> > >> >> >>> > >
> > >> >> >>> > >> As for the flume module - flume-ng comes with commons-io 2.1, while
> > >> >> >>> > >> hadoop & hbase require org.apache.commons.io.Charsets, which was
> > >> >> >>> > >> introduced in 2.3. The easy fix is to move the dependency on flume-ng
> > >> >> >>> > >> after the dependencies on hbase/hadoop.
> > >> >> >>> > >>
> > >> >> >>> > >> The last thing about ConcurrentHashMap - it definitely means that
> > >> >> >>> > >> the code was compiled with 1.8, since 1.7 returns a simple Set while
> > >> >> >>> > >> 1.8 returns KeySetView
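
That covariant return shows up directly in the compiled call site; a minimal
illustration (class name hypothetical):

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class KeySetDescriptorSketch {
    public static void main(String[] args) {
        ConcurrentHashMap<String, String> map =
                new ConcurrentHashMap<String, String>();
        map.put("a", "1");
        // Compiled by javac 8, this call is emitted against JDK 8's covariant
        // override and records the descriptor
        //   keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
        // JDK 7's ConcurrentHashMap.keySet() returns a plain Set, so running
        // the same class file on a 7 JVM throws exactly this NoSuchMethodError.
        Set<String> keys = map.keySet();
        System.out.println(keys);
    }
}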
> > >> >> >>> > >>
> > >> >> >>> > >>
> > >> >> >>> > >>
> > >> >> >>> > >> On Thu, Apr 28, 2016 at 4:08 PM, Josh Elser <
> > >> josh.elser@gmail.com
> > >> >> >>> > <javascript:;>> wrote:
> > >> >> >>> > >> > *tl;dr*
> > >> >> >>> > >> >
> > >> >> >>> > >> > * I'm removing ubuntu-us1 from all pools
> > >> >> >>> > >> > * Phoenix-Flume ITs look busted
> > >> >> >>> > >> > * UpsertValuesIT looks busted
> > >> >> >>> > >> > * Something is weirdly wrong with Phoenix-4.x-HBase-1.1 in its entirety.
> > >> >> >>> > >> >
> > >> >> >>> > >> > Details below...
> > >> >> >>> > >> >
> > >> >> >>> > >> > It looks like we have a bunch of different reasons for the failures.
> > >> >> >>> > >> > Starting with Phoenix-master:
> > >> >> >>> > >> >
> > >> >> >>> > >> >>>>
> > >> >> >>> > >> > org.apache.phoenix.schema.NewerTableAlreadyExistsException: ERROR 1013
> > >> >> >>> > >> > (42M04): Table already exists. tableName=T
> > >> >> >>> > >> >         at org.apache.phoenix.end2end.UpsertValuesIT.testBatchedUpsert(UpsertValuesIT.java:476)
> > >> >> >>> > >> > <<<
> > >> >> >>> > >> >
> > >> >> >>> > >> > I've seen this coming out of a few different tests (I think I've also
> > >> >> >>> > >> > run into it on my own, but that's another thing)
> > >> >> >>> > >> >
> > >> >> >>> > >> > Some of them look like the Jenkins build host is just
> > >> >> >>> > >> > over-taxed:
> > >> >> >>> > >> >
> > >> >> >>> > >> >>>>
> > >> >> >>> > >> > Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000007e7600000, 331350016, 0) failed; error='Cannot allocate memory' (errno=12)
> > >> >> >>> > >> > #
> > >> >> >>> > >> > # There is insufficient memory for the Java Runtime Environment to continue.
> > >> >> >>> > >> > # Native memory allocation (malloc) failed to allocate 331350016 bytes for committing reserved memory.
> > >> >> >>> > >> > # An error report file with more information is saved as:
> > >> >> >>> > >> > # /home/jenkins/jenkins-slave/workspace/Phoenix-master/phoenix-core/hs_err_pid26454.log
> > >> >> >>> > >> > Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000007ea600000, 273678336, 0) failed; error='Cannot allocate memory' (errno=12)
> > >> >> >>> > >> > #
> > >> >> >>> > >> > <<<
> > >> >> >>> > >> >
> > >> >> >>> > >> > and
> > >> >> >>> > >> >
> > >> >> >>> > >> >>>>
> > >> >> >>> > >> > -------------------------------------------------------
> > >> >> >>> > >> >  T E S T S
> > >> >> >>> > >> > -------------------------------------------------------
> > >> >> >>> > >> > Build step 'Invoke top-level Maven targets' marked build as failure
> > >> >> >>> > >> > <<<
> > >> >> >>> > >> >
> > >> >> >>> > >> > Both of these issues are limited to the host "ubuntu-us1". Let me just
> > >> >> >>> > >> > remove him from the pool (on Phoenix-master) and see if that helps at all.
> > >> >> >>> > >> >
> > >> >> >>> > >> > I also see some sporadic failures of some Flume tests
> > >> >> >>> > >> >
> > >> >> >>> > >> >>>>
> > >> >> >>> > >> > Running org.apache.phoenix.flume.PhoenixSinkIT
> > >> >> >>> > >> > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.004 sec
> > >> >> >>> > >> > <<< FAILURE! - in org.apache.phoenix.flume.PhoenixSinkIT
> > >> >> >>> > >> > org.apache.phoenix.flume.PhoenixSinkIT  Time elapsed: 0.004 sec  <<< ERROR!
> > >> >> >>> > >> > java.lang.RuntimeException: java.io.IOException: Failed to save in any
> > >> >> >>> > >> > storage directories while saving namespace.
> > >> >> >>> > >> > Caused by: java.io.IOException: Failed to save in any storage
> > >> >> >>> > >> > directories while saving namespace.
> > >> >> >>> > >> >
> > >> >> >>> > >> > Running org.apache.phoenix.flume.RegexEventSerializerIT
> > >> >> >>> > >> > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.005 sec
> > >> >> >>> > >> > <<< FAILURE! - in org.apache.phoenix.flume.RegexEventSerializerIT
> > >> >> >>> > >> > org.apache.phoenix.flume.RegexEventSerializerIT  Time elapsed: 0.004 sec
> > >> >> >>> > >> > <<< ERROR!
> > >> >> >>> > >> > java.lang.RuntimeException: java.io.IOException: Failed to save in any
> > >> >> >>> > >> > storage directories while saving namespace.
> > >> >> >>> > >> > Caused by: java.io.IOException: Failed to save in any storage
> > >> >> >>> > >> > directories while saving namespace.
> > >> >> >>> > >> > <<<
> > >> >> >>> > >> >
> > >> >> >>> > >> > I'm not sure what the error message means at a glance.
> > >> >> >>> > >> >
> > >> >> >>> > >> > For Phoenix-HBase-1.1:
> > >> >> >>> > >> >
> > >> >> >>> > >> >>>>
> > >> >> >>> > >> > org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.NoSuchMethodError:
> > >> >> >>> > >> > java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
> > >> >> >>> > >> > [... rest of the NoSuchMethodError stack trace snipped; identical to
> > >> >> >>> > >> > the trace quoted in full earlier in the thread ...]
> > >> >> >>> > >> > <<<
> > >> >> >>> > >> >
> > >> >> >>> > >> > We have hit-or-miss on this error message which keeps hbase:namespace
> > >> >> >>> > >> > from being assigned (as the RS's can never report into the hmaster).
> > >> >> >>> > >> > This is happening across a couple of the nodes (ubuntu-[3,4,6]). I had
> > >> >> >>> > >> > tried to look into this one over the weekend (and was led to a JDK8
> > >> >> >>> > >> > built jar, running on JDK7), but if I look at META-INF/MANIFEST.mf in
> > >> >> >>> > >> > the hbase-server-1.1.3.jar from central, I see it was built with
> > >> >> >>> > >> > 1.7.0_80 (which I think means the JDK8 thought is a red-herring). I'm
> > >> >> >>> > >> > really confused by this one, actually. Something must be amiss here.
> > >> >> >>> > >> >
> > >> >> >>> > >> > For Phoenix-HBase-1.0:
> > >> >> >>> > >> >
> > >> >> >>> > >> > We see the same Phoenix-Flume failures, UpsertValuesIT failure, and
> > >> >> >>> > >> > timeouts on ubuntu-us1. There is one crash on H10, but that might just
> > >> >> >>> > >> > be bad luck.
> > >> >> >>> > >> >
> > >> >> >>> > >> > For Phoenix-HBase-0.98:
> > >> >> >>> > >> >
> > >> >> >>> > >> > Same UpsertValuesIT failure and failures on ubuntu-us1.
> > >> >> >>> > >> >
> > >> >> >>> > >> >
> > >> >> >>> > >> > James Taylor wrote:
> > >> >> >>> > >> >>
> > >> >> >>> > >> >> Anyone know why our Jenkins builds keep failing? Is it environmental
> > >> >> >>> > >> >> and is there anything we can do about it?
> > >> >> >>> > >> >>
> > >> >> >>> > >> >> Thanks,
> > >> >> >>> > >> >> James
> > >> >> >>> > >> >>
> > >> >> >>> > >> >
> > >> >> >>> > >>
> > >> >> >>> >
> > >> >> >>>
> > >> >> >>
> > >> >> >>
> > >> >
> > >> >
> > >>
> > >
> > >
> >
>

Re: Jenkins build failures?

Posted by Sergey Soldatov <se...@gmail.com>.
Getting back to the failures with OOM/unable to create a native thread.
Those files have around 100 tests inside each that are running on top of
Phoenix. In total they generate over 2500 scans (system.catalog,
sequences, and regular scans over tables). The problem is that on the
HBase side all scans go through the ThreadPoolExecutor created in HTable,
which uses a SynchronousQueue as its work queue. From the javadoc for
ThreadPoolExecutor:

*Direct handoffs. A good default choice for a work queue is a
SynchronousQueue that hands off tasks to threads without otherwise holding
them. Here, an attempt to queue a task will fail if no threads are
immediately available to run it, so a new thread will be constructed. This
policy avoids lockups when handling sets of requests that might have
internal dependencies. Direct handoffs generally require unbounded
maximumPoolSizes to avoid rejection of new submitted tasks. This in turn
admits the possibility of unbounded thread growth when commands continue to
arrive on average faster than they can be processed.*

And we hit exactly that last case. But there is still a question:
since all those tests pass and the scans complete during execution (I
checked that), it's not clear why all those threads are still alive. If
someone has a suggestion as to why that could happen, it would be
interesting to hear; otherwise I will dig deeper a bit later. It may also
be worth changing the queue in HBase to something less aggressive in
terms of thread creation.
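
A small standalone demo of that direct-handoff growth (pool sizes and sleep
times are arbitrary, chosen only to mimic many concurrent scans):

import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class DirectHandoffDemo {
    public static void main(String[] args) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, Integer.MAX_VALUE,
                60L, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());
        // Each task holds a worker briefly, the way a scan holds a thread.
        // No idle worker is ever waiting on the handoff, so the pool answers
        // almost every submission by constructing a new thread.
        for (int i = 0; i < 2500; i++) {
            pool.execute(new Runnable() {
                public void run() {
                    try {
                        Thread.sleep(1000);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }
        // Note the 60s keep-alive above: workers linger for a minute after
        // their task completes, which may be one reason the threads are still
        // visible after the scans have finished.
        System.out.println("threads created: " + pool.getLargestPoolSize());
        pool.shutdown();
    }
}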

Thanks,
Sergey


On Thu, May 5, 2016 at 8:24 AM, James Taylor <ja...@apache.org> wrote:

> Looks like all Jenkins builds are failing, but it seems environmental? Do
> we need to exclude some particular kind of host(s)?
>
> On Wed, May 4, 2016 at 5:25 PM, James Taylor <ja...@apache.org>
> wrote:
>
> > Thanks, Sergey!
> >
> > On Wed, May 4, 2016 at 5:22 PM, Sergey Soldatov <
> sergeysoldatov@gmail.com>
> > wrote:
> >
> >> James,
> >> Ah, didn't notice that timeouts are not shown in the final report as
> >> failures. It seems that the build is using JDK 1.7 and test run OOM
> >> with PermGen space. Fixed in PHOENIX-2879
> >>
> >> Thanks,
> >> Sergey
> >>
> >> On Wed, May 4, 2016 at 1:48 PM, James Taylor <ja...@apache.org>
> >> wrote:
> >> > Sergey, on master branch (which is HBase 1.2):
> >> > https://builds.apache.org/job/Phoenix-master/1214/console
> >> >
> >> > On Wed, May 4, 2016 at 1:31 PM, Sergey Soldatov <
> >> sergeysoldatov@gmail.com>
> >> > wrote:
> >> >>
> >> >> James,
> >> >> Regarding HivePhoenixStoreIT. Are you talking about
> >> >> Phoenix-4.x-HBase-1.0  job? Last build passed it successfully.
> >> >>
> >> >>
> >> >> On Wed, May 4, 2016 at 10:15 AM, James Taylor <
> jamestaylor@apache.org>
> >> >> wrote:
> >> >> > Our Jenkins builds have improved, but we're seeing some issues:
> >> >> > - timeouts with the new org.apache.phoenix.hive.HivePhoenixStoreIT test.
> >> >> > - consistent failure with 4.x-HBase-1.1 build. I suspect that Jenkins build
> >> >> > is out-of-date, as we haven't had a 4.x-HBase-1.1 branch for quite a while.
> >> >> > There's likely some changes that were made to the other Jenkins build
> >> >> > scripts that weren't made to this one
> >> >> > - flapping of the
> >> >> > org.apache.phoenix.end2end.index.ReadOnlyIndexFailureIT.testWriteFailureReadOnlyIndex
> >> >> > test in 0.98 and 1.0
> >> >> > - no email sent for 0.98 build (as far as I can tell)
> >> >> >
> >> >> > If folks have time to look into these, that'd be much appreciated.
> >> >> >
> >> >> >     James
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Sat, Apr 30, 2016 at 11:55 AM, James Taylor <
> >> jamestaylor@apache.org>
> >> >> > wrote:
> >> >> >
> >> >> >> The defaults when tests are running are much lower than the standard
> >> >> >> Phoenix defaults (see QueryServicesTestImpl and
> >> >> >> BaseTest.setUpConfigForMiniCluster()). It's unclear to me why the
> >> >> >> HashJoinIT and SortMergeJoinIT tests (I think these are the culprits) do
> >> >> >> not seem to adhere to these (or maybe override them?). They fail for me on
> >> >> >> my Mac, but they do pass on a Linux box. Would be awesome if someone could
> >> >> >> investigate and submit a patch to fix these.
> >> >> >>
> >> >> >> Thanks,
> >> >> >> James
> >> >> >>
> >> >> >> On Sat, Apr 30, 2016 at 11:47 AM, Nick Dimiduk <
> ndimiduk@gmail.com>
> >> >> >> wrote:
> >> >> >>
> >> >> >>> The default thread pool sizes for HDFS, HBase, ZK, and the Phoenix
> >> >> >>> client are all contributing to this huge thread count.
> >> >> >>>
> >> >> >>> A good starting point would be to take a jstack of the IT process and
> >> >> >>> count, group by threads with similar name. Reconfigure to reduce all
> >> >> >>> those groups to something like 10 each, see if the test still runs
> >> >> >>> reliably on local hardware.
> >> >> >>>
> >> >> >>> On Friday, April 29, 2016, Sergey Soldatov <
> >> sergeysoldatov@gmail.com>
> >> >> >>> wrote:
> >> >> >>>
> >> >> >>> > By the way, we need to do something with those OOMs and "unable to
> >> >> >>> > create new native thread" in ITs. It's quite strange to see such
> >> >> >>> > failures in a 10-line test, especially when queries for a table
> >> >> >>> > with less than 10 rows generate over 2500 threads. Does anybody know
> >> >> >>> > whether it's a zk-related issue?
> >> >> >>> >
> >> >> >>> > On Fri, Apr 29, 2016 at 7:51 AM, James Taylor
> >> >> >>> > <jamestaylor@apache.org
> >> >> >>> > <javascript:;>> wrote:
> >> >> >>> > > A patch would be much appreciated, Sergey.
> >> >> >>> > >
> >> >> >>> > > On Fri, Apr 29, 2016 at 3:26 AM, Sergey Soldatov <
> >> >> >>> > sergeysoldatov@gmail.com <javascript:;>>
> >> >> >>> > > wrote:
> >> >> >>> > >
> >> >> >>> > >> As for the flume module - flume-ng comes with commons-io 2.1, while
> >> >> >>> > >> hadoop & hbase require org.apache.commons.io.Charsets, which was
> >> >> >>> > >> introduced in 2.3. The easy fix is to move the dependency on flume-ng
> >> >> >>> > >> after the dependencies on hbase/hadoop.
> >> >> >>> > >>
> >> >> >>> > >> The last thing about ConcurrentHashMap - it definitely means that
> >> >> >>> > >> the code was compiled with 1.8, since 1.7 returns a simple Set while
> >> >> >>> > >> 1.8 returns KeySetView
> >> >> >>> > >>
> >> >> >>> > >>
> >> >> >>> > >>
> >> >> >>> > >> On Thu, Apr 28, 2016 at 4:08 PM, Josh Elser <
> >> josh.elser@gmail.com
> >> >> >>> > <javascript:;>> wrote:
> >> >> >>> > >> > *tl;dr*
> >> >> >>> > >> >
> >> >> >>> > >> > * I'm removing ubuntu-us1 from all pools
> >> >> >>> > >> > * Phoenix-Flume ITs look busted
> >> >> >>> > >> > * UpsertValuesIT looks busted
> >> >> >>> > >> > * Something is weirdly wrong with Phoenix-4.x-HBase-1.1 in its entirety.
> >> >> >>> > >> >
> >> >> >>> > >> > Details below...
> >> >> >>> > >> >
> >> >> >>> > >> > It looks like we have a bunch of different reasons for the
> >> >> >>> failures.
> >> >> >>> > >> > Starting with Phoenix-master:
> >> >> >>> > >> >
> >> >> >>> > >> >>>>
> >> >> >>> > >> > org.apache.phoenix.schema.NewerTableAlreadyExistsException: ERROR 1013
> >> >> >>> > >> > (42M04): Table already exists. tableName=T
> >> >> >>> > >> >         at org.apache.phoenix.end2end.UpsertValuesIT.testBatchedUpsert(UpsertValuesIT.java:476)
> >> >> >>> > >> > <<<
> >> >> >>> > >> >
> >> >> >>> > >> > I've seen this coming out of a few different tests (I think I've also
> >> >> >>> > >> > run into it on my own, but that's another thing)
> >> >> >>> > >> >
> >> >> >>> > >> > Some of them look like the Jenkins build host is just
> >> >> >>> > >> > over-taxed:
> >> >> >>> > >> >
> >> >> >>> > >> >>>>
> >> >> >>> > >> > Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000007e7600000, 331350016, 0) failed; error='Cannot allocate memory' (errno=12)
> >> >> >>> > >> > #
> >> >> >>> > >> > # There is insufficient memory for the Java Runtime Environment to continue.
> >> >> >>> > >> > # Native memory allocation (malloc) failed to allocate 331350016 bytes for committing reserved memory.
> >> >> >>> > >> > # An error report file with more information is saved as:
> >> >> >>> > >> > # /home/jenkins/jenkins-slave/workspace/Phoenix-master/phoenix-core/hs_err_pid26454.log
> >> >> >>> > >> > Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000007ea600000, 273678336, 0) failed; error='Cannot allocate memory' (errno=12)
> >> >> >>> > >> > #
> >> >> >>> > >> > <<<
> >> >> >>> > >> >
> >> >> >>> > >> > and
> >> >> >>> > >> >
> >> >> >>> > >> >>>>
> >> >> >>> > >> > -------------------------------------------------------
> >> >> >>> > >> >  T E S T S
> >> >> >>> > >> > -------------------------------------------------------
> >> >> >>> > >> > Build step 'Invoke top-level Maven targets' marked build as failure
> >> >> >>> > >> > <<<
> >> >> >>> > >> >
> >> >> >>> > >> > Both of these issues are limited to the host "ubuntu-us1". Let me just
> >> >> >>> > >> > remove him from the pool (on Phoenix-master) and see if that helps at all.
> >> >> >>> > >> >
> >> >> >>> > >> > I also see some sporadic failures of some Flume tests
> >> >> >>> > >> >
> >> >> >>> > >> >>>>
> >> >> >>> > >> > Running org.apache.phoenix.flume.PhoenixSinkIT
> >> >> >>> > >> > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.004 sec
> >> >> >>> > >> > <<< FAILURE! - in org.apache.phoenix.flume.PhoenixSinkIT
> >> >> >>> > >> > org.apache.phoenix.flume.PhoenixSinkIT  Time elapsed: 0.004 sec  <<< ERROR!
> >> >> >>> > >> > java.lang.RuntimeException: java.io.IOException: Failed to save in any
> >> >> >>> > >> > storage directories while saving namespace.
> >> >> >>> > >> > Caused by: java.io.IOException: Failed to save in any storage
> >> >> >>> > >> > directories while saving namespace.
> >> >> >>> > >> >
> >> >> >>> > >> > Running org.apache.phoenix.flume.RegexEventSerializerIT
> >> >> >>> > >> > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.005 sec
> >> >> >>> > >> > <<< FAILURE! - in org.apache.phoenix.flume.RegexEventSerializerIT
> >> >> >>> > >> > org.apache.phoenix.flume.RegexEventSerializerIT  Time elapsed: 0.004 sec
> >> >> >>> > >> > <<< ERROR!
> >> >> >>> > >> > java.lang.RuntimeException: java.io.IOException: Failed to save in any
> >> >> >>> > >> > storage directories while saving namespace.
> >> >> >>> > >> > Caused by: java.io.IOException: Failed to save in any storage
> >> >> >>> > >> > directories while saving namespace.
> >> >> >>> > >> > <<<
> >> >> >>> > >> >
> >> >> >>> > >> > I'm not sure what the error message means at a glance.
> >> >> >>> > >> >
> >> >> >>> > >> > For Phoenix-HBase-1.1:
> >> >> >>> > >> >
> >> >> >>> > >> >>>>
> >> >> >>> > >> > org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.NoSuchMethodError:
> >> >> >>> > >> > java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
> >> >> >>> > >> > [... rest of the NoSuchMethodError stack trace snipped; identical to
> >> >> >>> > >> > the trace quoted in full earlier in the thread ...]
> >> >> >>> > >> > <<<
> >> >> >>> > >> >
> >> >> >>> > >> > We have hit-or-miss on this error message which keeps
> >> >> >>> hbase:namespace
> >> >> >>> > >> from
> >> >> >>> > >> > being assigned (as the RS's can never report into the
> >> hmaster).
> >> >> >>> This
> >> >> >>> > is
> >> >> >>> > >> > happening across a couple of the nodes (ubuntu-[3,4,6]). I
> >> had
> >> >> >>> tried
> >> >> >>> > to
> >> >> >>> > >> look
> >> >> >>> > >> > into this one over the weekend (and was lead to a JDK8
> built
> >> >> >>> > >> > jar,
> >> >> >>> > >> running on
> >> >> >>> > >> > JDK7), but if I look at META-INF/MANIFEST.mf in the
> >> >> >>> > >> hbase-server-1.1.3.jar
> >> >> >>> > >> > from central, I see it was built with 1.7.0_80 (which I
> >> think
> >> >> >>> > >> > means
> >> >> >>> > the
> >> >> >>> > >> JDK8
> >> >> >>> > >> > thought is a red-herring). I'm really confused by this
> one,
> >> >> >>> actually.
> >> >> >>> > >> > Something must be amiss here.
> >> >> >>> > >> >
> >> >> >>> > >> > For Phoenix-HBase-1.0:
> >> >> >>> > >> >
> >> >> >>> > >> > We see the same Phoenix-Flume failures, UpsertValuesIT
> >> failure,
> >> >> >>> > >> > and
> >> >> >>> > >> timeouts
> >> >> >>> > >> > on ubuntu-us1. There is one crash on H10, but that might
> >> just
> >> >> >>> > >> > be
> >> >> >>> bad
> >> >> >>> > >> luck.
> >> >> >>> > >> >
> >> >> >>> > >> > For Phoenix-HBase-0.98:
> >> >> >>> > >> >
> >> >> >>> > >> > Same UpsertValuesIT failure and failures on ubuntu-us1.
> >> >> >>> > >> >
> >> >> >>> > >> >
> >> >> >>> > >> > James Taylor wrote:
> >> >> >>> > >> >>
> >> >> >>> > >> >> Anyone know why our Jenkins builds keep failing? Is it
> >> >> >>> environmental
> >> >> >>> > and
> >> >> >>> > >> >> is
> >> >> >>> > >> >> there anything we can do about it?
> >> >> >>> > >> >>
> >> >> >>> > >> >> Thanks,
> >> >> >>> > >> >> James
> >> >> >>> > >> >>
> >> >> >>> > >> >
> >> >> >>> > >>
> >> >> >>> >
> >> >> >>>
> >> >> >>
> >> >> >>
> >> >
> >> >
> >>
> >
> >
>
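
That keySet() descriptor in the NoSuchMethodError is the tell: the
return type java.util.concurrent.ConcurrentHashMap$KeySetView only
exists in the JDK 8 class libraries, and javac records the resolved
return type in the call site's descriptor. A minimal repro of the
failure mode (hypothetical code, not from HBase or Phoenix): compile
the class below with a JDK 8 javac (no -bootclasspath), then run it on
a JDK 7 JVM.

    import java.util.concurrent.ConcurrentHashMap;

    public class KeySetViewRepro {
        public static void main(String[] args) {
            ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
            map.put("k", "v");
            // A JDK 8 javac emits this call as
            //   keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
            // JDK 7's ConcurrentHashMap.keySet() returns a plain Set, so a
            // JDK 7 JVM fails resolution here with java.lang.NoSuchMethodError.
            for (String key : map.keySet()) {
                System.out.println(key);
            }
        }
    }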

Re: Jenkins build failures?

Posted by James Taylor <ja...@apache.org>.
Looks like all Jenkins builds are failing, but it seems environmental? Do
we need to exclude some particular kind of host(s)?
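
If it is just one or two bad machines, the cheapest fix might be to
tighten the label expression under "Restrict where this project can be
run" in each job's config, along these lines (the "ubuntu" label is an
assumption about how the ASF nodes are labeled; ubuntu-us1 is the host
Josh already pulled from the pools):

    ubuntu && !ubuntu-us1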

On Wed, May 4, 2016 at 5:25 PM, James Taylor <ja...@apache.org> wrote:

> Thanks, Sergey!
>
> On Wed, May 4, 2016 at 5:22 PM, Sergey Soldatov <se...@gmail.com>
> wrote:
>
>> James,
>> Ah, I didn't notice that timeouts are not shown in the final report
>> as failures. It seems the build is using JDK 1.7 and the tests run
>> out of PermGen space. Fixed in PHOENIX-2879.
>>
>> Thanks,
>> Sergey

Re: Jenkins build failures?

Posted by James Taylor <ja...@apache.org>.
Thanks, Sergey!

On Wed, May 4, 2016 at 5:22 PM, Sergey Soldatov <se...@gmail.com>
wrote:

> James,
> Ah, I didn't notice that timeouts are not shown in the final report
> as failures. It seems the build is using JDK 1.7 and the tests run
> out of PermGen space. Fixed in PHOENIX-2879.
>
> Thanks,
> Sergey

Re: Jenkins build failures?

Posted by Sergey Soldatov <se...@gmail.com>.
James,
Ah, I didn't notice that timeouts are not shown in the final report as
failures. It seems the build is using JDK 1.7 and the tests run out of
PermGen space. Fixed in PHOENIX-2879.
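
For anyone hitting the same PermGen OOM locally on JDK 1.7, the usual
knob is a bigger MaxPermSize for the forked test JVMs in the
surefire/failsafe argLine. This is only a sketch of that kind of
change, not necessarily the exact patch in PHOENIX-2879:

    <!-- pom.xml (sketch): give forked test JVMs more PermGen on JDK 7 -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-failsafe-plugin</artifactId>
      <configuration>
        <argLine>-XX:MaxPermSize=256m</argLine>
      </configuration>
    </plugin>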

Thanks,
Sergey

On Wed, May 4, 2016 at 1:48 PM, James Taylor <ja...@apache.org> wrote:
> Sergey, on master branch (which is HBase 1.2):
> https://builds.apache.org/job/Phoenix-master/1214/console
>
> On Wed, May 4, 2016 at 1:31 PM, Sergey Soldatov <se...@gmail.com>
> wrote:
>>
>> James,
>> Regarding HivePhoenixStoreIT. Are you talking about
>> Phoenix-4.x-HBase-1.0  job? Last build passed it successfully.
>> >>>
>> >>> org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8982)
>> >>> > >> >         at
>> >>> > >> >
>> >>> > >>
>> >>> >
>> >>>
>> >>> org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2269)
>> >>> > >> >         at
>> >>> > >> >
>> >>> > >>
>> >>> >
>> >>>
>> >>> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:893)
>> >>> > >> >         at
>> >>> > >> >
>> >>> > >>
>> >>> >
>> >>>
>> >>> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156)
>> >>> > >> >         at
>> >>> > >> >
>> >>> > >>
>> >>> >
>> >>>
>> >>> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108)
>> >>> > >> >         at
>> >>> > >> >
>> >>> > >>
>> >>> >
>> >>>
>> >>> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140)
>> >>> > >> >         at java.security.AccessController.doPrivileged(Native
>> >>> Method)
>> >>> > >> >         at javax.security.auth.Subject.doAs(Subject.java:356)
>> >>> > >> >         at
>> >>> > >> >
>> >>> > >>
>> >>> >
>> >>>
>> >>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
>> >>> > >> >         at
>> >>> > >> >
>> >>> > >>
>> >>> >
>> >>>
>> >>> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:307)
>> >>> > >> >         at
>> >>> > >> >
>> >>> > >>
>> >>> >
>> >>>
>> >>> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138)
>> >>> > >> >         at java.lang.Thread.run(Thread.java:745)
>> >>> > >> > Caused by:
>> >>> > >> >
>> >>> > >>
>> >>> >
>> >>>
>> >>> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException):
>> >>> > >> > org.apache.hadoop.hbase.DoNotRetryIOException:
>> >>> > >> java.lang.NoSuchMethodError:
>> >>> > >> >
>> >>> > >>
>> >>> >
>> >>>
>> >>> java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>> >>> > >> >         at
>> >>> > >> org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2156)
>> >>> > >> >         at
>> >>> > >> org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
>> >>> > >> >         at
>> >>> > >> >
>> >>> > >>
>> >>> >
>> >>>
>> >>> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>> >>> > >> >         at
>> >>> > >> >
>> >>> > >> > org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
>> >>> > >> >         at java.lang.Thread.run(Thread.java:745)
>> >>> > >> > Caused by: java.lang.NoSuchMethodError:
>> >>> > >> >
>> >>> > >>
>> >>> >
>> >>>
>> >>> java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>> >>> > >> >         at
>> >>> > >> >
>> >>> > >>
>> >>> >
>> >>>
>> >>> org.apache.hadoop.hbase.master.ServerManager.findServerWithSameHostnamePortWithLock(ServerManager.java:432)
>> >>> > >> >         at
>> >>> > >> >
>> >>> > >>
>> >>> >
>> >>>
>> >>> org.apache.hadoop.hbase.master.ServerManager.checkAndRecordNewServer(ServerManager.java:346)
>> >>> > >> >         at
>> >>> > >> >
>> >>> > >>
>> >>> >
>> >>>
>> >>> org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:264)
>> >>> > >> >         at
>> >>> > >> >
>> >>> > >>
>> >>> >
>> >>>
>> >>> org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:318)
>> >>> > >> >         at
>> >>> > >> >
>> >>> > >>
>> >>> >
>> >>>
>> >>> org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
>> >>> > >> >         at
>> >>> > >> org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2117)
>> >>> > >> >         ... 4 more
>> >>> > >> >
>> >>> > >> >         at
>> >>> > >> >
>> >>> >
>> >>> > org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1235)
>> >>> > >> >         at
>> >>> > >> >
>> >>> > >>
>> >>> >
>> >>>
>> >>> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:217)
>> >>> > >> >         ... 13 more
>> >>> > >> > <<<
>> >>> > >> >
>> >>> > >> > We have hit-or-miss on this error message which keeps
>> >>> hbase:namespace
>> >>> > >> from
>> >>> > >> > being assigned (as the RS's can never report into the hmaster).
>> >>> This
>> >>> > is
>> >>> > >> > happening across a couple of the nodes (ubuntu-[3,4,6]). I had
>> >>> tried
>> >>> > to
>> >>> > >> look
>> >>> > >> > into this one over the weekend (and was lead to a JDK8 built
>> >>> > >> > jar,
>> >>> > >> running on
>> >>> > >> > JDK7), but if I look at META-INF/MANIFEST.mf in the
>> >>> > >> hbase-server-1.1.3.jar
>> >>> > >> > from central, I see it was built with 1.7.0_80 (which I think
>> >>> > >> > means
>> >>> > the
>> >>> > >> JDK8
>> >>> > >> > thought is a red-herring). I'm really confused by this one,
>> >>> actually.
>> >>> > >> > Something must be amiss here.
>> >>> > >> >
>> >>> > >> > For Phoenix-HBase-1.0:
>> >>> > >> >
>> >>> > >> > We see the same Phoenix-Flume failures, UpsertValuesIT failure,
>> >>> > >> > and
>> >>> > >> timeouts
>> >>> > >> > on ubuntu-us1. There is one crash on H10, but that might just
>> >>> > >> > be
>> >>> bad
>> >>> > >> luck.
>> >>> > >> >
>> >>> > >> > For Phoenix-HBase-0.98:
>> >>> > >> >
>> >>> > >> > Same UpsertValuesIT failure and failures on ubuntu-us1.
>> >>> > >> >
>> >>> > >> >
>> >>> > >> > James Taylor wrote:
>> >>> > >> >>
>> >>> > >> >> Anyone know why our Jenkins builds keep failing? Is it
>> >>> environmental
>> >>> > and
>> >>> > >> >> is
>> >>> > >> >> there anything we can do about it?
>> >>> > >> >>
>> >>> > >> >> Thanks,
>> >>> > >> >> James
>> >>> > >> >>
>> >>> > >> >
>> >>> > >>
>> >>> >
>> >>>
>> >>
>> >>
>
>

Re: Jenkins build failures?

Posted by James Taylor <ja...@apache.org>.
Sergey, it's on the master branch (which is HBase 1.2):
https://builds.apache.org/job/Phoenix-master/1214/console

On Wed, May 4, 2016 at 1:31 PM, Sergey Soldatov <se...@gmail.com>
wrote:

> James,
> Regarding HivePhoenixStoreIT: are you talking about the
> Phoenix-4.x-HBase-1.0 job? The last build passed it successfully.

Re: Jenkins build failures?

Posted by Sergey Soldatov <se...@gmail.com>.
James,
Regarding HivePhoenixStoreIT: are you talking about the
Phoenix-4.x-HBase-1.0 job? The last build passed it successfully.


On Wed, May 4, 2016 at 10:15 AM, James Taylor <ja...@apache.org> wrote:
> Our Jenkins builds have improved, but we're seeing some issues:
> - timeouts with the new org.apache.phoenix.hive.HivePhoenixStoreIT test.
> - consistent failure with 4.x-HBase-1.1 build. I suspect that Jenkins build
> is out-of-date, as we haven't had a 4.x-HBase-1.1 branch for quite a while.
> There's likely some changes that were made to the other Jenkins build
> scripts that weren't made to this one
> - flapping of
> the org.apache.phoenix.end2end.index.ReadOnlyIndexFailureIT.testWriteFailureReadOnlyIndex
> test in 0.98 and 1.0
> - no email sent for 0.98 build (as far as I can tell)
>
> If folks have time to look into these, that'd be much appreciated.
>
>     James
>
>
>
> On Sat, Apr 30, 2016 at 11:55 AM, James Taylor <ja...@apache.org>
> wrote:
>
>> The defaults when tests are running are much lower than the standard
>> Phoenix defaults (see QueryServicesTestImpl and
>> BaseTest.setUpConfigForMiniCluster()). It's unclear to me why the
>> HashJoinIT and SortMergeJoinIT tests (I think these are the culprits) do
>> not seem to adhere to these (or maybe override them?). They fail for me on
>> my Mac, but they do pass on a Linux box. Would be awesome if someone could
>> investigate and submit a patch to fix these.
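>>
>> A sketch of the kind of override I mean (these are the stock
>> HBase/HDFS/Phoenix property names, not necessarily the ones our test
>> config already sets):
>>
>>     import org.apache.hadoop.conf.Configuration;
>>     import org.apache.hadoop.hbase.HBaseConfiguration;
>>
>>     public class MiniClusterConfSketch {
>>         // Sketch only: shrink the largest thread-pool contributors so a
>>         // mini-cluster test stays under the OS thread and memory limits.
>>         public static Configuration lowThreadConf() {
>>             Configuration conf = HBaseConfiguration.create();
>>             conf.setInt("hbase.regionserver.handler.count", 5); // RS RPC handlers
>>             conf.setInt("hbase.master.handler.count", 5);       // master RPC handlers
>>             conf.setInt("dfs.datanode.handler.count", 3);       // HDFS datanode handlers
>>             conf.setInt("phoenix.query.threadPoolSize", 10);    // Phoenix client executor
>>             return conf;
>>         }
>>     }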
>>
>> Thanks,
>> James
>>
>> On Sat, Apr 30, 2016 at 11:47 AM, Nick Dimiduk <nd...@gmail.com> wrote:
>>
>>> The default thread pool sizes for HDFS, HBase, ZK, and the Phoenix client
>>> are all contributing to this huge thread count.
>>>
>>> A good starting point would be to take a jstack of the IT process and
>>> count, group by threads with similar name. Reconfigure to reduce all those
>>> groups to something like 10 each, see if the test still runs reliably on
>>> local hardware.
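>>>
>>> An untested sketch of that counting step (strip digits so numbered
>>> threads from the same pool group together; <it_pid> is whatever jps
>>> reports for the failsafe JVM):
>>>
>>>     jstack <it_pid> | grep '^"' \
>>>       | sed -e 's/^"\([^"]*\)".*/\1/' -e 's/[0-9]\+//g' \
>>>       | sort | uniq -c | sort -rn | head -20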
>>>
>>> On Friday, April 29, 2016, Sergey Soldatov <se...@gmail.com>
>>> wrote:
>>>
>>> > but the way, we need to do something with those OOMs and "unable to
>>> > create new native thread" in ITs. It's quite strange to see in 10
>>> > lines test such kind of failures. Especially when queries for table
>>> > with less than 10 rows generate over 2500 threads. Does anybody know
>>> > whether it's zk related issue?
>>> >
>>> > On Fri, Apr 29, 2016 at 7:51 AM, James Taylor <jamestaylor@apache.org> wrote:
>>> > > A patch would be much appreciated, Sergey.
>>> > >
>>> > > On Fri, Apr 29, 2016 at 3:26 AM, Sergey Soldatov <
>>> > sergeysoldatov@gmail.com <javascript:;>>
>>> > > wrote:
>>> > >
>>> > >> As for flume module - flume-ng is coming with commons-io 2.1 while
>>> > >> hadoop & hbase require org.apache.commons.io.Charsets which was
>>> > >> introduced in 2.3. Easy way is to move dependency on flume-ng after
>>> > >> the dependencies on hbase/hadoop.
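>>> > >>
>>> > >> Roughly this shape in the pom (just a sketch, artifactIds
>>> > >> abbreviated): Maven mediation takes the first-declared version at
>>> > >> equal depth, so the commons-io (>= 2.3) pulled in by hbase/hadoop
>>> > >> wins over flume-ng's 2.1:
>>> > >>
>>> > >>     <dependencies>
>>> > >>       <dependency>
>>> > >>         <groupId>org.apache.hbase</groupId>
>>> > >>         <artifactId>hbase-client</artifactId>
>>> > >>       </dependency>
>>> > >>       <!-- flume-ng declared after hbase, so its commons-io loses -->
>>> > >>       <dependency>
>>> > >>         <groupId>org.apache.flume</groupId>
>>> > >>         <artifactId>flume-ng-core</artifactId>
>>> > >>       </dependency>
>>> > >>     </dependencies>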
>>> > >>
>>> > >> The last thing about ConcurrentHashMap - it definitely means that the
>>> > >> code was compiled with 1.8 since 1.7 returns a simple Set while 1.8
>>> > >> returns KeySetView
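>>> > >>
>>> > >> A tiny repro of the linkage problem, if anyone wants to see it (my
>>> > >> sketch: compile on JDK 8 without -bootclasspath, then run on JDK 7):
>>> > >>
>>> > >>     import java.util.Map;
>>> > >>     import java.util.concurrent.ConcurrentHashMap;
>>> > >>
>>> > >>     public class KeySetLinkage {
>>> > >>         public static void main(String[] args) {
>>> > >>             ConcurrentHashMap<String, Integer> chm = new ConcurrentHashMap<>();
>>> > >>             // javac on 1.8 links this call site to the covariant
>>> > >>             // keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>>> > >>             // which doesn't exist on a 1.7 runtime -> NoSuchMethodError
>>> > >>             System.out.println(chm.keySet());
>>> > >>             // Calling through the Map interface is safe on both JDKs,
>>> > >>             // since it links to Map.keySet()Ljava/util/Set;
>>> > >>             Map<String, Integer> m = chm;
>>> > >>             System.out.println(m.keySet());
>>> > >>         }
>>> > >>     }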
>>> > >>
>>> > >>
>>> > >>
>>> > >> On Thu, Apr 28, 2016 at 4:08 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>> > >> > *tl;dr*
>>> > >> >
>>> > >> > * I'm removing ubuntu-us1 from all pools
>>> > >> > * Phoenix-Flume ITs look busted
>>> > >> > * UpsertValuesIT looks busted
>>> > >> > * Something is weirdly wrong with Phoenix-4.x-HBase-1.1 in its
>>> > entirety.
>>> > >> >
>>> > >> > Details below...
>>> > >> >
>>> > >> > It looks like we have a bunch of different reasons for the
>>> failures.
>>> > >> > Starting with Phoenix-master:
>>> > >> >
>>> > >> >>>>
>>> > >> > org.apache.phoenix.schema.NewerTableAlreadyExistsException: ERROR 1013
>>> > >> > (42M04): Table already exists. tableName=T
>>> > >> >         at org.apache.phoenix.end2end.UpsertValuesIT.testBatchedUpsert(UpsertValuesIT.java:476)
>>> > >> > <<<
>>> > >> >
>>> > >> > I've seen this coming out of a few different tests (I think I've also
>>> > >> > run into it on my own, but that's another thing).
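>>> > >> >
>>> > >> > A cheap guard against this class of collision -- a sketch, not our
>>> > >> > actual test utils -- would be to never share a literal name like "T"
>>> > >> > between tests that share the mini cluster:
>>> > >> >
>>> > >> >     import java.util.concurrent.atomic.AtomicInteger;
>>> > >> >
>>> > >> >     public final class TestTableNames {
>>> > >> >         private static final AtomicInteger SEQ = new AtomicInteger();
>>> > >> >         // hypothetical helper: a distinct table name per test
>>> > >> >         public static String unique(String prefix) {
>>> > >> >             return prefix + "_" + SEQ.incrementAndGet();
>>> > >> >         }
>>> > >> >     }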
>>> > >> >
>>> > >> > Some of them look like the Jenkins build host is just over-taxed:
>>> > >> >
>>> > >> >>>>
>>> > >> > Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000007e7600000, 331350016, 0) failed; error='Cannot allocate memory' (errno=12)
>>> > >> > #
>>> > >> > # There is insufficient memory for the Java Runtime Environment to continue.
>>> > >> > # Native memory allocation (malloc) failed to allocate 331350016 bytes for committing reserved memory.
>>> > >> > # An error report file with more information is saved as:
>>> > >> > # /home/jenkins/jenkins-slave/workspace/Phoenix-master/phoenix-core/hs_err_pid26454.log
>>> > >> > Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000007ea600000, 273678336, 0) failed; error='Cannot allocate memory' (errno=12)
>>> > >> > #
>>> > >> > <<<
>>> > >> >
>>> > >> > and
>>> > >> >
>>> > >> >>>>
>>> > >> > -------------------------------------------------------
>>> > >> >  T E S T S
>>> > >> > -------------------------------------------------------
>>> > >> > Build step 'Invoke top-level Maven targets' marked build as failure
>>> > >> > <<<
>>> > >> >
>>> > >> > Both of these issues are limited to the host "ubuntu-us1". Let me just
>>> > >> > remove it from the pool (on Phoenix-master) and see if that helps at all.
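>>> > >> >
>>> > >> > If the memory errors start showing up on other hosts too, the other
>>> > >> > lever would be capping the forked test JVM in the surefire config --
>>> > >> > a sketch, and the numbers are guesses:
>>> > >> >
>>> > >> >     <plugin>
>>> > >> >       <groupId>org.apache.maven.plugins</groupId>
>>> > >> >       <artifactId>maven-surefire-plugin</artifactId>
>>> > >> >       <configuration>
>>> > >> >         <forkCount>1</forkCount>
>>> > >> >         <reuseForks>true</reuseForks>
>>> > >> >         <argLine>-Xmx1g -XX:MaxPermSize=256m</argLine>
>>> > >> >       </configuration>
>>> > >> >     </plugin>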
>>> > >> >
>>> > >> > I also see some sporadic failures of some Flume tests
>>> > >> >
>>> > >> >>>>
>>> > >> > Running org.apache.phoenix.flume.PhoenixSinkIT
>>> > >> > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.004 sec
>>> > >> > <<< FAILURE! - in org.apache.phoenix.flume.PhoenixSinkIT
>>> > >> > org.apache.phoenix.flume.PhoenixSinkIT  Time elapsed: 0.004 sec  <<< ERROR!
>>> > >> > java.lang.RuntimeException: java.io.IOException: Failed to save in any
>>> > >> > storage directories while saving namespace.
>>> > >> > Caused by: java.io.IOException: Failed to save in any storage directories
>>> > >> > while saving namespace.
>>> > >> >
>>> > >> > Running org.apache.phoenix.flume.RegexEventSerializerIT
>>> > >> > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.005 sec
>>> > >> > <<< FAILURE! - in org.apache.phoenix.flume.RegexEventSerializerIT
>>> > >> > org.apache.phoenix.flume.RegexEventSerializerIT  Time elapsed: 0.004 sec
>>> > >> > <<< ERROR!
>>> > >> > java.lang.RuntimeException: java.io.IOException: Failed to save in any
>>> > >> > storage directories while saving namespace.
>>> > >> > Caused by: java.io.IOException: Failed to save in any storage directories
>>> > >> > while saving namespace.
>>> > >> > <<<
>>> > >> >
>>> > >> > I'm not sure what the error message means at a glance.
>>> > >> >
>>> > >> > For Phoenix-HBase-1.1:
>>> > >> >
>>> > >> >>>>
>>> > >> > org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.NoSuchMethodError:
>>> > >> > java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>>> > >> >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2156)
>>> > >> >         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
>>> > >> >         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>>> > >> >         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
>>> > >> >         at java.lang.Thread.run(Thread.java:745)
>>> > >> > Caused by: java.lang.NoSuchMethodError:
>>> > >> > java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>>> > >> >         at org.apache.hadoop.hbase.master.ServerManager.findServerWithSameHostnamePortWithLock(ServerManager.java:432)
>>> > >> >         at org.apache.hadoop.hbase.master.ServerManager.checkAndRecordNewServer(ServerManager.java:346)
>>> > >> >         at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:264)
>>> > >> >         at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:318)
>>> > >> >         at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
>>> > >> >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2117)
>>> > >> >         ... 4 more
>>> > >> > 2016-04-28 22:54:35,497 WARN  [RS:0;hemera:41302]
>>> > >> > org.apache.hadoop.hbase.regionserver.HRegionServer(2279): error telling
>>> > >> > master we are up
>>> > >> > com.google.protobuf.ServiceException:
>>> > >> > org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException):
>>> > >> > org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.NoSuchMethodError:
>>> > >> > java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>>> > >> >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2156)
>>> > >> >         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
>>> > >> >         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>>> > >> >         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
>>> > >> >         at java.lang.Thread.run(Thread.java:745)
>>> > >> > Caused by: java.lang.NoSuchMethodError:
>>> > >> > java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>>> > >> >         at org.apache.hadoop.hbase.master.ServerManager.findServerWithSameHostnamePortWithLock(ServerManager.java:432)
>>> > >> >         at org.apache.hadoop.hbase.master.ServerManager.checkAndRecordNewServer(ServerManager.java:346)
>>> > >> >         at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:264)
>>> > >> >         at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:318)
>>> > >> >         at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
>>> > >> >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2117)
>>> > >> >         ... 4 more
>>> > >> >
>>> > >> >         at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
>>> > >> >         at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:318)
>>> > >> >         at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8982)
>>> > >> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2269)
>>> > >> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:893)
>>> > >> >         at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156)
>>> > >> >         at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108)
>>> > >> >         at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140)
>>> > >> >         at java.security.AccessController.doPrivileged(Native Method)
>>> > >> >         at javax.security.auth.Subject.doAs(Subject.java:356)
>>> > >> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
>>> > >> >         at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:307)
>>> > >> >         at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138)
>>> > >> >         at java.lang.Thread.run(Thread.java:745)
>>> > >> > Caused by:
>>> > >> > org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException):
>>> > >> > org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.NoSuchMethodError:
>>> > >> > java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>>> > >> >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2156)
>>> > >> >         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
>>> > >> >         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>>> > >> >         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
>>> > >> >         at java.lang.Thread.run(Thread.java:745)
>>> > >> > Caused by: java.lang.NoSuchMethodError:
>>> > >> > java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>>> > >> >         at org.apache.hadoop.hbase.master.ServerManager.findServerWithSameHostnamePortWithLock(ServerManager.java:432)
>>> > >> >         at org.apache.hadoop.hbase.master.ServerManager.checkAndRecordNewServer(ServerManager.java:346)
>>> > >> >         at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:264)
>>> > >> >         at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:318)
>>> > >> >         at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
>>> > >> >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2117)
>>> > >> >         ... 4 more
>>> > >> >
>>> > >> >         at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1235)
>>> > >> >         at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:217)
>>> > >> >         ... 13 more
>>> > >> > <<<
>>> > >> >
>>> > >> > We hit this error message only intermittently, and it keeps
>>> > >> > hbase:namespace from being assigned (as the RSs can never report in
>>> > >> > to the HMaster). This is happening across a couple of the nodes
>>> > >> > (ubuntu-[3,4,6]). I had tried to look into this one over the weekend
>>> > >> > (and was led to a JDK8-built jar running on JDK7), but if I look at
>>> > >> > META-INF/MANIFEST.MF in the hbase-server-1.1.3.jar from central, I
>>> > >> > see it was built with 1.7.0_80 (which I think means the JDK8 thought
>>> > >> > is a red herring). I'm really confused by this one, actually.
>>> > >> > Something must be amiss here.
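>>> > >> >
>>> > >> > For anyone who wants to double-check me: the manifest check, plus the
>>> > >> > class-file version, which is the real tell (major 51 = Java 7, 52 =
>>> > >> > Java 8), look roughly like this:
>>> > >> >
>>> > >> >     unzip -p hbase-server-1.1.3.jar META-INF/MANIFEST.MF | grep -i Build-Jdk
>>> > >> >     javap -verbose -classpath hbase-server-1.1.3.jar \
>>> > >> >         org.apache.hadoop.hbase.master.ServerManager | grep 'major version'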
>>> > >> >
>>> > >> > For Phoenix-HBase-1.0:
>>> > >> >
>>> > >> > We see the same Phoenix-Flume failures, UpsertValuesIT failure, and
>>> > >> > timeouts on ubuntu-us1. There is one crash on H10, but that might just
>>> > >> > be bad luck.
>>> > >> >
>>> > >> > For Phoenix-HBase-0.98:
>>> > >> >
>>> > >> > Same UpsertValuesIT failure and failures on ubuntu-us1.
>>> > >> >
>>> > >> >
>>> > >> > James Taylor wrote:
>>> > >> >>
>>> > >> >> Anyone know why our Jenkins builds keep failing? Is it environmental
>>> > >> >> and is there anything we can do about it?
>>> > >> >>
>>> > >> >> Thanks,
>>> > >> >> James
>>> > >> >>
>>> > >> >
>>> > >>
>>> >
>>>
>>
>>