Posted to user@accumulo.apache.org by "Terry P." <te...@gmail.com> on 2013/08/02 23:41:38 UTC

How to pre-split a table for UUID rowkeys

Greetings folks,
I have a somewhat atypical Accumulo use case: Accumulo serves as the backend
data store for a search index, providing fault tolerance should the index
get corrupted.  At full volume, fewer than 1 billion documents will be
stored in Accumulo.

The search index is used to "find" the data a user is interested in, and
it then retrieves the document from Accumulo using the RowKey obtained
from the index.  The RowKey is a java.util.UUID string with the '-' dashes
stripped out.

I have a 3-node cluster and, as a quick test, ingested 5 million 1K
documents into it, yet they all went to a single TabletServer.  I was kind
of surprised: I knew this would be the case for a row key built from a
monotonically increasing number, but I thought that with a UUID rowkey the
entries would be spread across the TabletServers at least somewhat, even
without pre-splitting the table.

Clearly my understanding of how Accumulo spreads data out is lacking.
Can anyone shed more light on it?  And could you recommend a table split
strategy for a 3-node cluster such as the one I have described?

Many thanks in advance,
Terry
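No reply to this question appears in this archive, so the following is only a sketch under stated assumptions. A newly created Accumulo table holds a single tablet covering every row, and a tablet is split only once it grows past the table's split threshold (`table.split.threshold`), so until then every write goes to one tablet server no matter how well the row keys are distributed. Pre-splitting sidesteps this. Assuming the row keys come from `UUID.randomUUID()` (version 4, whose leading hex digits are uniformly random) rendered in lowercase with the dashes removed, evenly spaced two-hex-digit prefixes make reasonable split points. The class and method names below are hypothetical; the resulting strings would be wrapped in `org.apache.hadoop.io.Text` and passed to `TableOperations.addSplits(tableName, splits)` (or written to a file for the shell's `addsplits` command), which is left out so the sketch runs without Accumulo on the classpath.

```java
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.UUID;

public class UuidSplits {

    // Dash-stripped form of a UUID, matching the row keys described above:
    // 32 lowercase hex characters (UUID.toString() emits lowercase hex).
    static String rowKey(UUID id) {
        return id.toString().replace("-", "");
    }

    // Returns numTablets - 1 split points that cut the two-hex-digit prefix
    // space ("00".."ff") into numTablets ranges of roughly equal width.
    static SortedSet<String> splitPoints(int numTablets) {
        if (numTablets < 1 || numTablets > 256) {
            throw new IllegalArgumentException("numTablets must be between 1 and 256");
        }
        SortedSet<String> splits = new TreeSet<>();
        for (int i = 1; i < numTablets; i++) {
            // Integer division spreads the 256 prefixes as evenly as possible.
            splits.add(String.format("%02x", (i * 256) / numTablets));
        }
        return splits;
    }

    // Index of the tablet that would hold `row`, treating each split point as an
    // inclusive tablet end row: the count of split points strictly less than `row`.
    static int tabletFor(SortedSet<String> splits, String row) {
        return splits.headSet(row).size();
    }

    public static void main(String[] args) {
        SortedSet<String> splits = splitPoints(3);
        System.out.println("split points: " + splits); // split points: [55, aa]

        // Rough check that random (version 4) UUID keys spread across the tablets.
        int[] counts = new int[splits.size() + 1];
        for (int i = 0; i < 30_000; i++) {
            counts[tabletFor(splits, rowKey(UUID.randomUUID()))]++;
        }
        for (int t = 0; t < counts.length; t++) {
            System.out.println("tablet " + t + ": " + counts[t] + " rows");
        }
    }
}
```

For three tablet servers, `splitPoints(3)` yields one tablet per server; asking for a multiple such as `splitPoints(12)` may give the balancer more room to even out load, though the right count for this workload is an assumption to test, not a recommendation from the thread.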

Re: Communication issue between zookeeper and accumulo

Posted by Ray Pfaff <ra...@apx-labs.com>.
Looking over the bug, it certainly seems to be the problem I'm seeing.  Usually I can terminate the tserver and restart it and not see the problem.  I think sometimes it's taken me 2 restarts to get the server up and running, but I don't recall ever needing 3.  So it may be a race condition.  I had assumed that "something" needed to clean itself up before the tserver would start properly.

From: Keith Turner <ke...@deenlo.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Thursday, August 8, 2013 11:16 AM
To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo

I thought I had fixed a lock issue for 1.4.4, so I looked at the fixes for 1.4.4.  You may be running into ACCUMULO-1277[1].  I just looked at the 1.4.3 code to see how it would behave, and I think it would time out like you are seeing.  If we can confirm this, then it would be worthwhile to post your log messages about waiting and "could not obtain lock" on the ticket so that it's easier to find the issue via Google.

[1] https://issues.apache.org/jira/browse/ACCUMULO-1277


On Thu, Aug 8, 2013 at 10:03 AM, Ray Pfaff <ra...@apx-labs.com> wrote:
I'm trying to see if I can post the entire log somewhere.  In the interim, this is a copy of the error as it appears in the log file.

2013-08-01 10:15:55,980 [tabletserver.TabletServer] INFO : Tablet server starting on 10.1.3.227
2013-08-01 10:15:56,087 [util.FileSystemMonitor] INFO : Filesystem monitor started
2013-08-01 10:15:56,121 [tabletserver.NativeMap] INFO : Loaded native map shared library /opt/accumulo/accumulo-current/lib/native/map/libNativeMap-Linux-tile-64.so
2013-08-01 10:15:57,394 [tabletserver.TabletServer] INFO : port = 9997
2013-08-01 10:15:57,493 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:16:02,504 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:16:07,517 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:16:12,528 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:16:17,539 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:16:22,550 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:16:27,566 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:16:32,582 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:16:37,594 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:16:42,607 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:16:47,617 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:16:52,628 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:16:57,639 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:17:02,650 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:17:07,662 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:17:12,672 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:17:17,690 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:17:22,701 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:17:27,711 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:17:32,724 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:17:37,735 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:17:42,745 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:17:47,763 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:17:52,774 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:17:57,775 [tabletserver.TabletServer] INFO : Too many retries, exiting.
2013-08-01 10:17:57,778 [tabletserver.TabletServer] INFO : Could not obtain tablet server lock, exiting.
java.lang.RuntimeException: Too many retries, exiting.
at org.apache.accumulo.server.tabletserver.TabletServer.announceExistence(TabletServer.java:2681)
at org.apache.accumulo.server.tabletserver.TabletServer.run(TabletServer.java:2703)
at org.apache.accumulo.server.tabletserver.TabletServer.main(TabletServer.java:3168)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.accumulo.start.Main$1.run(Main.java:89)
at java.lang.Thread.run(Thread.java:636)
2013-08-01 10:17:57,786 [tabletserver.TabletServer] ERROR: Uncaught exception in TabletServer.main, exiting
java.lang.RuntimeException: java.lang.RuntimeException: Too many retries, exiting.
at org.apache.accumulo.server.tabletserver.TabletServer.announceExistence(TabletServer.java:2684)
at org.apache.accumulo.server.tabletserver.TabletServer.run(TabletServer.java:2703)
at org.apache.accumulo.server.tabletserver.TabletServer.main(TabletServer.java:3168)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.accumulo.start.Main$1.run(Main.java:89)
at java.lang.Thread.run(Thread.java:636)
Caused by: java.lang.RuntimeException: Too many retries, exiting.
at org.apache.accumulo.server.tabletserver.TabletServer.announceExistence(TabletServer.java:2681)
... 8 more
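Keith's point that the 1.4.3 code would time out can be checked against the timestamps in the log above. The sketch below (class name hypothetical) copies the time of day of each "Waiting for tablet server lock" line and of the "Too many retries, exiting." line, then measures the gaps between them. The figures describe this one excerpt only; the actual retry limit and sleep interval inside `TabletServer.announceExistence` are not shown in the thread and are not assumed here.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

public class LockWaitTimeline {

    // Same layout as the log's time-of-day field, e.g. "10:15:57,493".
    static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    // Times of the "Waiting for tablet server lock" lines in the 2013-08-01 log above.
    static final String[] WAITS = {
        "10:15:57,493", "10:16:02,504", "10:16:07,517", "10:16:12,528",
        "10:16:17,539", "10:16:22,550", "10:16:27,566", "10:16:32,582",
        "10:16:37,594", "10:16:42,607", "10:16:47,617", "10:16:52,628",
        "10:16:57,639", "10:17:02,650", "10:17:07,662", "10:17:12,672",
        "10:17:17,690", "10:17:22,701", "10:17:27,711", "10:17:32,724",
        "10:17:37,735", "10:17:42,745", "10:17:47,763", "10:17:52,774"
    };

    // Time of the "Too many retries, exiting." line.
    static final String GAVE_UP = "10:17:57,775";

    static LocalTime parse(String ts) {
        return LocalTime.parse(ts, TS);
    }

    // Milliseconds from the first wait message to the give-up message.
    static long totalMillis() {
        return Duration.between(parse(WAITS[0]), parse(GAVE_UP)).toMillis();
    }

    // Gaps in milliseconds between consecutive wait messages.
    static long[] intervalsMillis() {
        long[] gaps = new long[WAITS.length - 1];
        for (int i = 1; i < WAITS.length; i++) {
            gaps[i - 1] = Duration.between(parse(WAITS[i - 1]), parse(WAITS[i])).toMillis();
        }
        return gaps;
    }

    public static void main(String[] args) {
        long[] gaps = intervalsMillis();
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long g : gaps) {
            min = Math.min(min, g);
            max = Math.max(max, g);
        }
        System.out.println(WAITS.length + " wait messages, gaps " + min + "-" + max
                + " ms, gave up after " + totalMillis() + " ms");
        // prints: 24 wait messages, gaps 5010-5018 ms, gave up after 120282 ms
    }
}
```

For this excerpt the server polled roughly every 5 seconds and exited about 2 minutes after its first wait message, which is consistent with the timeout Keith describes.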

From: Sean Busbey <bu...@cloudera.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Wednesday, August 7, 2013 7:25 PM
To: Accumulo User List <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo

Can you post the full logs from the tablet servers somewhere and send a link?



On Tue, Aug 6, 2013 at 10:40 AM, Ray Pfaff <ra...@apx-labs.com> wrote:
It's from one of the tablet servers, but the output on one of the zookeeper servers is exactly the same.

From: Sean Busbey <bu...@cloudera.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 1:35 PM
To: Accumulo User List <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo

Is that on the ZK server or the TabletServer? Can we also see the other?


On Tue, Aug 6, 2013 at 10:33 AM, Ray Pfaff <ra...@apx-labs.com> wrote:
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:ssh
ACCEPT     icmp --  anywhere             anywhere            icmp echo-reply
ACCEPT     icmp --  anywhere             anywhere            icmp echo-request
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:nrpe
ACCEPT     udp  --  anywhere             anywhere            udp dpt:domain

Chain FORWARD (policy DROP)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

From: Brendan Heussler <bh...@gmail.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 1:27 PM
To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo

What is the output of iptables --list?



Brendan


On Tue, Aug 6, 2013 at 1:25 PM, Ray Pfaff <ra...@apx-labs.com> wrote:
Not sure what you mean.  I get the error "Fatal ip6_tables not found."  I'm assuming that means disabled?

From: "Ott, Charles H." <CH...@saic.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 1:18 PM
To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Subject: RE: Communication issue between zookeeper and accumulo

And iptables?

From: user-return-2837-CHARLES.H.OTT=saic.com@accumulo.apache.org [mailto:user-return-2837-CHARLES.H.OTT=saic.com@accumulo.apache.org] On Behalf Of Ray Pfaff
Sent: Tuesday, August 06, 2013 12:54 PM
To: user@accumulo.apache.org
Subject: Re: Communication issue between zookeeper and accumulo

Yes, it is disabled, so that's not the problem.

From: Sean Busbey <bu...@cloudera.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 12:48 PM
To: Accumulo User List <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo

Hi Ray!

Can you confirm that IPv6 is disabled?

On Tue, Aug 6, 2013 at 9:19 AM, Ray Pfaff <ra...@apx-labs.com> wrote:
I'm not sure I can provide those due to the contract I'm working on.  I really don't want to divert this conversation from the original question I'm asking (which is a problem even when running one tablet server per machine), but are you saying that setting tserver.port.search = true shouldn't be done?  I found this to be an undocumented way of running more than one tablet server per system.  I'm still not convinced that it leads to stability issues on tablet servers.  As I said, it's undocumented.

From: Eric Newton <er...@gmail.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 11:12 AM
To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo

Interesting.  You could not get similar performance improvements by increasing the size of the JVM, the number of threads, or the number of tablets per server?

If you have details about what configurations you've tried and the performance numbers you found, please open a ticket.  This would indicate that we have some unnecessary bottleneck in the tserver.

-Eric


On Tue, Aug 6, 2013 at 11:00 AM, Ray Pfaff <ra...@apx-labs.com> wrote:
Because we found this to be the optimal number of tablet servers in our testing.  It performs better than one per machine.  I'm not convinced that the stability issues make it worthwhile.
Doesn't affect my problem anyway.  I get this error whether I run one or four tablet servers.  Running four just makes it a bigger issue to get back up after failure.

From: Eric Newton <er...@gmail.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 10:56 AM
To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo

> I'm running 4 tservers per machine dedicated to the tablet servers

Why?

--
Sean


Re: Communication issue between zookeeper and accumulo

Posted by Keith Turner <ke...@deenlo.com>.
I thought I had fixed a lock issue for 1.4.4, so I looked at the fixes for
1.4.4.  You may be running into ACCUMULO-1277[1].  I just looked at the 1.4.3
code to see how it would behave, and I think it would time out like you are
seeing.  If we can confirm this, then it would be worthwhile to post your log
messages about waiting and "could not obtain lock" on the ticket so that it's
easier to find the issue via Google.

[1] https://issues.apache.org/jira/browse/ACCUMULO-1277

Re: Communication issue between zookeeper and accumulo

Posted by Ray Pfaff <ra...@apx-labs.com>.
I can see if I can do that.

From: Sean Busbey <bu...@cloudera.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Wednesday, August 7, 2013 7:25 PM
To: Accumulo User List <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo

Can you post the full logs from the tablet servers somewhere and send a link?


Re: Communication issue between zookeeper and accumulo

Posted by Ray Pfaff <ra...@apx-labs.com>.
I'm trying to see if I can post the entire log somewhere.  In the interim, this is a copy of the error as it appears in the log file.

2013-08-01 10:15:55,980 [tabletserver.TabletServer] INFO : Tablet server starting on 10.1.3.227
2013-08-01 10:15:56,087 [util.FileSystemMonitor] INFO : Filesystem monitor started
2013-08-01 10:15:56,121 [tabletserver.NativeMap] INFO : Loaded native map shared library /opt/accumulo/accumulo-current/lib/native/map/libNativeMap-Linux-tile-64.so
2013-08-01 10:15:57,394 [tabletserver.TabletServer] INFO : port = 9997
2013-08-01 10:15:57,493 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:16:02,504 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:16:07,517 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:16:12,528 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:16:17,539 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:16:22,550 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:16:27,566 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:16:32,582 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:16:37,594 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:16:42,607 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:16:47,617 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:16:52,628 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:16:57,639 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:17:02,650 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:17:07,662 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:17:12,672 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:17:17,690 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:17:22,701 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:17:27,711 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:17:32,724 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:17:37,735 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:17:42,745 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:17:47,763 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:17:52,774 [tabletserver.TabletServer] INFO : Waiting for tablet server lock
2013-08-01 10:17:57,775 [tabletserver.TabletServer] INFO : Too many retries, exiting.
2013-08-01 10:17:57,778 [tabletserver.TabletServer] INFO : Could not obtain tablet server lock, exiting.
java.lang.RuntimeException: Too many retries, exiting.
at org.apache.accumulo.server.tabletserver.TabletServer.announceExistence(TabletServer.java:2681)
at org.apache.accumulo.server.tabletserver.TabletServer.run(TabletServer.java:2703)
at org.apache.accumulo.server.tabletserver.TabletServer.main(TabletServer.java:3168)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.accumulo.start.Main$1.run(Main.java:89)
at java.lang.Thread.run(Thread.java:636)
2013-08-01 10:17:57,786 [tabletserver.TabletServer] ERROR: Uncaught exception in TabletServer.main, exiting
java.lang.RuntimeException: java.lang.RuntimeException: Too many retries, exiting.
at org.apache.accumulo.server.tabletserver.TabletServer.announceExistence(TabletServer.java:2684)
at org.apache.accumulo.server.tabletserver.TabletServer.run(TabletServer.java:2703)
at org.apache.accumulo.server.tabletserver.TabletServer.main(TabletServer.java:3168)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.accumulo.start.Main$1.run(Main.java:89)
at java.lang.Thread.run(Thread.java:636)
Caused by: java.lang.RuntimeException: Too many retries, exiting.
at org.apache.accumulo.server.tabletserver.TabletServer.announceExistence(TabletServer.java:2681)
... 8 more
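The timestamps above show the tablet server retrying about every five seconds for roughly two minutes before giving up. A bounded retry loop of that shape can be sketched as below. This is a hypothetical illustration, not the actual TabletServer.announceExistence code. The class name, the attempt limit and the interval are assumptions, and tryLock stands in for whatever acquires the ZooKeeper lock.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a bounded "wait for lock" loop; NOT Accumulo's actual code.
class LockRetry {
    /** Thrown when the lock is still unavailable after the last allowed attempt. */
    static class TooManyRetries extends RuntimeException {
        TooManyRetries() {
            super("Too many retries, exiting.");
        }
    }

    /**
     * Calls tryLock up to maxAttempts times, sleeping intervalMs after each failed
     * attempt except the last. Returns the 1-based attempt number that succeeded.
     */
    static int acquire(BooleanSupplier tryLock, int maxAttempts, long intervalMs) {
        for (int attempt = 1; ; attempt++) {
            if (tryLock.getAsBoolean()) {
                return attempt;
            }
            if (attempt >= maxAttempts) {
                throw new TooManyRetries();
            }
            System.out.println("Waiting for tablet server lock");
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                throw new IllegalStateException("interrupted while waiting for lock", e);
            }
        }
    }

    public static void main(String[] args) {
        // Simulated lock that stays unavailable for the first 3 attempts.
        int[] calls = {0};
        int attempt = acquire(() -> ++calls[0] > 3, 25, 10);
        System.out.println("acquired on attempt " + attempt);
    }
}
```

With an assumed maxAttempts of 25 and a 5000 ms interval, a lock that never becomes free yields 24 "Waiting" lines about five seconds apart, followed by the exception. That is consistent with the timing in the log.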

From: Sean Busbey <bu...@cloudera.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Wednesday, August 7, 2013 7:25 PM
To: Accumulo User List <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo

Can you post the full logs from the tablet servers somewhere and send a link?



On Tue, Aug 6, 2013 at 10:40 AM, Ray Pfaff <ra...@apx-labs.com> wrote:
It's from one of the tablet servers, but looking at one of the zookeeper servers, it's exactly the same

From: Sean Busbey <bu...@cloudera.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 1:35 PM
To: Accumulo User List <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo

Is that on the ZK server or the TabletServer? Can we also see the other?


On Tue, Aug 6, 2013 at 10:33 AM, Ray Pfaff <ra...@apx-labs.com> wrote:
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:ssh
ACCEPT     icmp --  anywhere             anywhere            icmp echo-reply
ACCEPT     icmp --  anywhere             anywhere            icmp echo-request
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:nrpe
ACCEPT     udp  --  anywhere             anywhere            udp dpt:domain

Chain FORWARD (policy DROP)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
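A rule listing shows what the firewall is configured to do, but it does not show whether a given port is actually reachable. A minimal way to check that from each machine is a TCP connect with a timeout, sketched below using only java.net. The host names are placeholders, and the ports are assumptions: 2181 is ZooKeeper's conventional client port, and 9997 is the tablet server port reported in the log above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Minimal TCP reachability probe; host names in main are hypothetical placeholders.
class PortProbe {
    /** Returns true if a TCP connection to host:port completes within timeoutMs. */
    static boolean reachable(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unresolvable: treat as unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        String[][] targets = {
            {"zk1.example.com", "2181"},      // hypothetical ZooKeeper host
            {"tserver1.example.com", "9997"}  // hypothetical tablet server host
        };
        for (String[] t : targets) {
            int port = Integer.parseInt(t[1]);
            System.out.println(t[0] + ":" + port + " -> "
                + (reachable(t[0], port, 2000) ? "open" : "unreachable"));
        }
    }
}
```

Running it both on a tablet server and on a ZooKeeper host gives the "other side" view that was asked for.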

From: Brendan Heussler <bh...@gmail.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 1:27 PM
To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo

What is the output of iptables --list?



Brendan


On Tue, Aug 6, 2013 at 1:25 PM, Ray Pfaff <ra...@apx-labs.com> wrote:
Not sure what you mean.  I get the error "Fatal ip6_tables not found."  I'm assuming that means disabled?

From: "Ott, Charles H." <CH...@saic.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 1:18 PM
To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Subject: RE: Communication issue between zookeeper and accumulo

And iptables?

From: user-return-2837-CHARLES.H.OTT=saic.com@accumulo.apache.org [mailto:user-return-2837-CHARLES.H.OTT=saic.com@accumulo.apache.org] On Behalf Of Ray Pfaff
Sent: Tuesday, August 06, 2013 12:54 PM
To: user@accumulo.apache.org
Subject: Re: Communication issue between zookeeper and accumulo

Yes, it is disabled, so that's not the problem.

From: Sean Busbey <bu...@cloudera.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 12:48 PM
To: Accumulo User List <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo

Hi Ray!

Can you confirm that IPv6 is disabled?
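Besides OS-level checks, it can help to see what the Java process itself observes. The sketch below uses only standard java.net calls. It reports whether the JVM was started with -Djava.net.preferIPv4Stack=true and lists any IPv6 addresses the JVM can see on local interfaces. Setting that flag on Hadoop-ecosystem daemons is a common workaround, but whether it matters for this cluster is an assumption.

```java
import java.io.UncheckedIOException;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Sketch: report the JVM's IPv4 preference and any IPv6 addresses it can see.
class Ipv6Check {
    /** True if the JVM was started with -Djava.net.preferIPv4Stack=true. */
    static boolean prefersIPv4() {
        return Boolean.parseBoolean(System.getProperty("java.net.preferIPv4Stack", "false"));
    }

    /** Returns "interface: address" for each IPv6 address visible to this JVM. */
    static List<String> ipv6Addresses() {
        List<String> out = new ArrayList<>();
        Enumeration<NetworkInterface> ifaces;
        try {
            ifaces = NetworkInterface.getNetworkInterfaces();
        } catch (SocketException e) {
            throw new UncheckedIOException(e);
        }
        if (ifaces == null) {
            return out; // some JDK versions return null when no interfaces are found
        }
        for (NetworkInterface ni : Collections.list(ifaces)) {
            for (InetAddress addr : Collections.list(ni.getInetAddresses())) {
                if (addr instanceof Inet6Address) {
                    out.add(ni.getName() + ": " + addr.getHostAddress());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("preferIPv4Stack = " + prefersIPv4());
        List<String> v6 = ipv6Addresses();
        System.out.println(v6.isEmpty()
            ? "no IPv6 addresses visible to the JVM"
            : "IPv6 addresses: " + v6);
    }
}
```

A JVM started with preferIPv4Stack=true may not report IPv6 addresses at all. Running the check once without the flag gives a clearer picture of what the host exposes.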

On Tue, Aug 6, 2013 at 9:19 AM, Ray Pfaff <ra...@apx-labs.com> wrote:
I'm not sure if I can provide those due to the contract I'm working on.  I really don't want to divert this conversation from the original question I'm asking (which is a problem even running one tablet server per machine) but are you saying that setting tserver.port.search = true shouldn't be done?  I found this to be an undocumented way of running more than one tablet server per system.  I'm still not convinced that this leads to stability issues on tablet servers.  As I said, it's undocumented.
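Going by its name, tserver.port.search presumably lets a tablet server that finds its configured port already bound move on to another port instead of failing. The log above reports 9997 as the configured port. That fallback is what would allow several tservers per host. The sketch below illustrates the fallback with plain ServerSocket binding. It is a hypothetical illustration, not Accumulo's implementation, and the class name and the maxTries search window are assumptions.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical sketch of "port search": bind the first free port at or above basePort.
// This is NOT Accumulo's implementation; maxTries is an assumed search window.
class PortSearch {
    static ServerSocket bindFrom(int basePort, int maxTries) throws IOException {
        IOException last = null;
        for (int i = 0; i < maxTries && basePort + i <= 65535; i++) {
            try {
                return new ServerSocket(basePort + i); // succeeds only if the port is free
            } catch (IOException e) {
                last = e; // typically a BindException because the port is taken; try the next
            }
        }
        throw new IOException(
            "no free port in [" + basePort + ", " + (basePort + maxTries) + ")", last);
    }

    public static void main(String[] args) throws IOException {
        // Two "tablet servers" on one host: the second one ends up on a different port.
        try (ServerSocket first = bindFrom(9997, 10);
             ServerSocket second = bindFrom(9997, 10)) {
            System.out.println(first.getLocalPort() + " " + second.getLocalPort());
        }
    }
}
```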

From: Eric Newton <er...@gmail.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 11:12 AM
To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo

Interesting.  You could not get similar performance improvements by increasing the size of the JVM, the number of threads, or the number of tablets per server?

If you have details about what configurations you've tried and the performance numbers you found, please open a ticket.  This would indicate that we have some unnecessary bottleneck in the tserver.

-Eric


On Tue, Aug 6, 2013 at 11:00 AM, Ray Pfaff <ra...@apx-labs.com> wrote:
Because we found this to be the optimal number of tablet servers in our testing.  It performs better than one per machine.  I'm not convinced that the stability issues make it worthwhile.
Doesn't affect my problem anyway.  I get this error whether I run one or four tablet servers.  Running four just makes it a bigger issue to get back up after failure.

From: Eric Newton <er...@gmail.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 10:56 AM
To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo


> I'm running 4 tservers per machine dedicated to the tablet servers

Why?





--
Sean

Re: Communication issue between zookeeper and accumulo

Posted by Sean Busbey <bu...@cloudera.com>.
Can you post the full logs from the tablet servers somewhere and send a link?

--
Sean

Re: Communication issue between zookeeper and accumulo

Posted by Ray Pfaff <ra...@apx-labs.com>.
It's not 100, it's 1024.  So if you're asking if it's been raised… yes.

From: Eric Newton <er...@gmail.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Wednesday, August 7, 2013 2:09 PM
To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo

Have you set maxClientCnxns as described in the README?  You will need to restart zookeeper for this to have an effect.

-Eric

Re: Communication issue between zookeeper and accumulo

Posted by Eric Newton <er...@gmail.com>.
Have you set maxClientCnxns as described in the README?  You will need to
restart zookeeper for this to have an effect.

-Eric
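For reference, maxClientCnxns is set in zoo.cfg on each ZooKeeper server. It caps the number of concurrent connections a single client IP address may hold to one ensemble member. A value of 0 removes the limit, and ZooKeeper 3.4 uses 60 when the property is absent. Because every Accumulo process on a host shares that host's IP, several tablet servers per machine count against one combined limit. The sketch below parses zoo.cfg text with java.util.Properties and compares the effective limit against an expected per-host connection count. That count is an estimate you would supply, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch: read maxClientCnxns from zoo.cfg text and compare it with an expected
// number of concurrent connections from one host (an estimate you supply).
class ZkCnxnCheck {
    static final int ZK34_DEFAULT = 60; // ZooKeeper 3.4.x default when the key is unset

    static Properties parse(String zooCfgText) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(zooCfgText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    /** Effective per-IP limit; 0 means unlimited. */
    static int effectiveLimit(Properties zooCfg) {
        String v = zooCfg.getProperty("maxClientCnxns");
        return v == null ? ZK34_DEFAULT : Integer.parseInt(v.trim());
    }

    /** True if `expected` concurrent connections from one IP fit under the limit. */
    static boolean fits(Properties zooCfg, int expected) {
        int limit = effectiveLimit(zooCfg);
        return limit == 0 || expected <= limit;
    }

    public static void main(String[] args) {
        Properties cfg = parse("tickTime=2000\nclientPort=2181\nmaxClientCnxns=1024\n");
        System.out.println("limit = " + effectiveLimit(cfg));
        // 200 is a hypothetical per-host estimate, not a measured value.
        System.out.println("fits 200 connections: " + fits(cfg, 200));
    }
}
```

Remember that, as noted above, ZooKeeper must be restarted before a changed value takes effect.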

Re: Communication issue between zookeeper and accumulo

Posted by Ray Pfaff <ra...@apx-labs.com>.
It's from one of the tablet servers, but looking at one of the zookeeper servers, it's exactly the same

Re: Communication issue between zookeeper and accumulo

Posted by Sean Busbey <bu...@cloudera.com>.
Is that on the ZK server or the TabletServer? Can we also see the other?


On Tue, Aug 6, 2013 at 10:33 AM, Ray Pfaff <ra...@apx-labs.com> wrote:

>  Chain INPUT (policy ACCEPT)
> target     prot opt source               destination
> ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:ssh
> ACCEPT     icmp --  anywhere             anywhere            icmp
> echo-reply
> ACCEPT     icmp --  anywhere             anywhere            icmp
> echo-request
> ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:nrpe
> ACCEPT     udp  --  anywhere             anywhere            udp
> dpt:domain
>
>  Chain FORWARD (policy DROP)
> target     prot opt source               destination
>
>  Chain OUTPUT (policy ACCEPT)
> target     prot opt source               destination
>
>   From: Brendan Heussler <bh...@gmail.com>
> Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
> Date: Tuesday, August 6, 2013 1:27 PM
> To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
>
> Subject: Re: Communication issue between zookeeper and accumulo
>
>   What is the output of iptables --list?
>
>
>
> Brendan
>
>
> On Tue, Aug 6, 2013 at 1:25 PM, Ray Pfaff <ra...@apx-labs.com> wrote:
>
>>  Not sure what you mean.  I get the error "Fatal ip6_tables not found."
>>  I'm assuming that means disabled?
>>
>>   From: <Ott>, "Charles H." <CH...@saic.com>
>> Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
>> Date: Tuesday, August 6, 2013 1:18 PM
>> To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
>> Subject: RE: Communication issue between zookeeper and accumulo
>>
>>   And iptables?
>>
>> From: user-return-2837-CHARLES.H.OTT=saic.com@accumulo.apache.org
>> [mailto:user-return-2837-CHARLES.H.OTT=saic.com@accumulo.apache.org]
>> On Behalf Of Ray Pfaff
>> Sent: Tuesday, August 06, 2013 12:54 PM
>> To: user@accumulo.apache.org
>> Subject: Re: Communication issue between zookeeper and accumulo
>>
>> Yes, it is disabled, so that's not the problem.
>>
>> From: Sean Busbey <bu...@cloudera.com>
>> Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
>> Date: Tuesday, August 6, 2013 12:48 PM
>> To: Accumulo User List <us...@accumulo.apache.org>
>> Subject: Re: Communication issue between zookeeper and accumulo
>>
>> Hi Ray!
>>
>> Can you confirm that IPv6 is disabled?
>>
>> On Tue, Aug 6, 2013 at 9:19 AM, Ray Pfaff <ra...@apx-labs.com> wrote:
>>
>> I'm not sure if I can provide those due to the contract I'm working.  I
>> really don't want to diverge this conversation from the original question
>> I'm asking (which is a problem even running one tablet server per machine)
>> but are you saying that setting tserver.port.search = true shouldn't be
>> done?  I found this to be an undocumented way of running more than one
>> tablet server per system.  I'm still not convinced that this leads to
>> stability issues on tablet servers.  As I said, it's undocumented.
>>
>> From: Eric Newton <er...@gmail.com>
>> Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
>> Date: Tuesday, August 6, 2013 11:12 AM
>> To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
>> Subject: Re: Communication issue between zookeeper and accumulo
>>
>> Interesting.  You could not get similar performance improvements by
>> increasing the size of the JVM, the number of threads, or the number of
>> tablets per server?
>>
>> If you have details about what configurations you've tried and the
>> performance numbers you found, please open a ticket.  This would indicate
>> that we have some unnecessary bottleneck in the tserver.
>>
>> -Eric
>>
>> On Tue, Aug 6, 2013 at 11:00 AM, Ray Pfaff <ra...@apx-labs.com>
>> wrote:
>>
>> Because we found this to be the optimal number of tablet servers in our
>> testing.  It performs better than one per machine.  I'm not convinced that
>> the stability issues make it worthwhile.
>>
>> Doesn't affect my problem anyway.  I get this error whether I run one or
>> four tablet servers.  Running four just makes it a bigger issue to get back
>> up after failure.
>>
>> From: Eric Newton <er...@gmail.com>
>> Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
>> Date: Tuesday, August 6, 2013 10:56 AM
>> To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
>> Subject: Re: Communication issue between zookeeper and accumulo
>>
>>  I'm running 4 tservers per machine dedicated to the tablet servers
>>
>> Why?
>>
>> --
>> Sean
>>
>
>


-- 
Sean

Re: Communication issue between zookeeper and accumulo

Posted by Ray Pfaff <ra...@apx-labs.com>.
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:ssh
ACCEPT     icmp --  anywhere             anywhere            icmp echo-reply
ACCEPT     icmp --  anywhere             anywhere            icmp echo-request
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:nrpe
ACCEPT     udp  --  anywhere             anywhere            udp dpt:domain

Chain FORWARD (policy DROP)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

From: Brendan Heussler <bh...@gmail.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 1:27 PM
To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo

What is the output of iptables --list?



Brendan


On Tue, Aug 6, 2013 at 1:25 PM, Ray Pfaff <ra...@apx-labs.com> wrote:
Not sure what you mean.  I get the error "Fatal ip6_tables not found."  I'm assuming that means disabled?

From: <Ott>, "Charles H." <CH...@saic.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 1:18 PM
To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Subject: RE: Communication issue between zookeeper and accumulo

And iptables?

From: user-return-2837-CHARLES.H.OTT=saic.com@accumulo.apache.org [mailto:user-return-2837-CHARLES.H.OTT=saic.com@accumulo.apache.org] On Behalf Of Ray Pfaff
Sent: Tuesday, August 06, 2013 12:54 PM
To: user@accumulo.apache.org
Subject: Re: Communication issue between zookeeper and accumulo

Yes, it is disabled, so that's not the problem.

From: Sean Busbey <bu...@cloudera.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 12:48 PM
To: Accumulo User List <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo

Hi Ray!

Can you confirm that IPv6 is disabled?

On Tue, Aug 6, 2013 at 9:19 AM, Ray Pfaff <ra...@apx-labs.com> wrote:
I'm not sure if I can provide those due to the contract I'm working.  I really don't want to diverge this conversation from the original question I'm asking (which is a problem even running one tablet server per machine) but are you saying that setting tserver.port.search = true shouldn't be done?  I found this to be an undocumented way of running more than one tablet server per system.  I'm still not convinced that this leads to stability issues on tablet servers.  As I said, it's undocumented.

From: Eric Newton <er...@gmail.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 11:12 AM
To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo

Interesting.  You could not get similar performance improvements by increasing the size of the JVM, the number of threads, or the number of tablets per server?

If you have details about what configurations you've tried and the performance numbers you found, please open a ticket.  This would indicate that we have some unnecessary bottleneck in the tserver.

-Eric


On Tue, Aug 6, 2013 at 11:00 AM, Ray Pfaff <ra...@apx-labs.com> wrote:
Because we found this to be the optimal number of tablet servers in our testing.  It performs better than one per machine.  I'm not convinced that the stability issues make it worthwhile.
Doesn't affect my problem anyway.  I get this error whether I run one or four tablet servers.  Running four just makes it a bigger issue to get back up after failure.

From: Eric Newton <er...@gmail.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 10:56 AM
To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo


I'm running 4 tservers per machine dedicated to the tablet servers

Why?





--
Sean


Re: Communication issue between zookeeper and accumulo

Posted by Brendan Heussler <bh...@gmail.com>.
What is the output of iptables --list?



Brendan


On Tue, Aug 6, 2013 at 1:25 PM, Ray Pfaff <ra...@apx-labs.com> wrote:

>  Not sure what you mean.  I get the error "Fatal ip6_tables not found."
>  I'm assuming that means disabled?
>
>   From: <Ott>, "Charles H." <CH...@saic.com>
> Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
> Date: Tuesday, August 6, 2013 1:18 PM
> To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
> Subject: RE: Communication issue between zookeeper and accumulo
>
>   And iptables?
>
> From: user-return-2837-CHARLES.H.OTT=saic.com@accumulo.apache.org
> [mailto:user-return-2837-CHARLES.H.OTT=saic.com@accumulo.apache.org]
> On Behalf Of Ray Pfaff
> Sent: Tuesday, August 06, 2013 12:54 PM
> To: user@accumulo.apache.org
> Subject: Re: Communication issue between zookeeper and accumulo
>
> Yes, it is disabled, so that's not the problem.
>
> From: Sean Busbey <bu...@cloudera.com>
> Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
> Date: Tuesday, August 6, 2013 12:48 PM
> To: Accumulo User List <us...@accumulo.apache.org>
> Subject: Re: Communication issue between zookeeper and accumulo
>
> Hi Ray!
>
> Can you confirm that IPv6 is disabled?
>
> On Tue, Aug 6, 2013 at 9:19 AM, Ray Pfaff <ra...@apx-labs.com> wrote:
>
> I'm not sure if I can provide those due to the contract I'm working.  I
> really don't want to diverge this conversation from the original question
> I'm asking (which is a problem even running one tablet server per machine)
> but are you saying that setting tserver.port.search = true shouldn't be
> done?  I found this to be an undocumented way of running more than one
> tablet server per system.  I'm still not convinced that this leads to
> stability issues on tablet servers.  As I said, it's undocumented.
>
> From: Eric Newton <er...@gmail.com>
> Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
> Date: Tuesday, August 6, 2013 11:12 AM
> To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
> Subject: Re: Communication issue between zookeeper and accumulo
>
> Interesting.  You could not get similar performance improvements by
> increasing the size of the JVM, the number of threads, or the number of
> tablets per server?
>
> If you have details about what configurations you've tried and the
> performance numbers you found, please open a ticket.  This would indicate
> that we have some unnecessary bottleneck in the tserver.
>
> -Eric
>
> On Tue, Aug 6, 2013 at 11:00 AM, Ray Pfaff <ra...@apx-labs.com> wrote:
>
> Because we found this to be the optimal number of tablet servers in our
> testing.  It performs better than one per machine.  I'm not convinced that
> the stability issues make it worthwhile.
>
> Doesn't affect my problem anyway.  I get this error whether I run one or
> four tablet servers.  Running four just makes it a bigger issue to get back
> up after failure.
>
> From: Eric Newton <er...@gmail.com>
> Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
> Date: Tuesday, August 6, 2013 10:56 AM
> To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
> Subject: Re: Communication issue between zookeeper and accumulo
>
>  I'm running 4 tservers per machine dedicated to the tablet servers
>
> Why?
>
> --
> Sean
>

Re: Communication issue between zookeeper and accumulo

Posted by Ray Pfaff <ra...@apx-labs.com>.
Not sure what you mean.  I get the error "Fatal ip6_tables not found."  I'm assuming that means disabled?
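Side note on the ip6_tables message: it most likely only means the ip6tables kernel module is not loaded, which does not by itself show that IPv6 is disabled. One way to check from the JVM's point of view is to list the IPv6 addresses bound to local interfaces. The sketch below does that with the standard java.net.NetworkInterface API; the class name Ipv6Check is hypothetical, and it should be run with default JVM settings, since networking options such as java.net.preferIPv4Stack may change what the JVM reports.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical diagnostic: list IPv6 addresses bound to local interfaces.
public class Ipv6Check {

    static List<InetAddress> ipv6Addresses() throws SocketException {
        List<InetAddress> found = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return found;                         // older JDKs return null if none
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                if (addr instanceof Inet6Address) {
                    found.add(addr);              // includes ::1 on the loopback
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws SocketException {
        List<InetAddress> v6 = ipv6Addresses();
        // An empty list is consistent with IPv6 being disabled; any entry,
        // even just the loopback ::1, suggests it is still enabled.
        System.out.println(v6.isEmpty() ? "no IPv6 addresses found"
                                        : "IPv6 addresses: " + v6);
    }
}
```

Running it on each tserver and ZooKeeper host gives a quick, JVM-level answer to Sean's question.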

From: <Ott>, "Charles H." <CH...@saic.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 1:18 PM
To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Subject: RE: Communication issue between zookeeper and accumulo

And iptables?

From: user-return-2837-CHARLES.H.OTT=saic.com@accumulo.apache.org [mailto:user-return-2837-CHARLES.H.OTT=saic.com@accumulo.apache.org] On Behalf Of Ray Pfaff
Sent: Tuesday, August 06, 2013 12:54 PM
To: user@accumulo.apache.org
Subject: Re: Communication issue between zookeeper and accumulo

Yes, it is disabled, so that's not the problem.

From: Sean Busbey <bu...@cloudera.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 12:48 PM
To: Accumulo User List <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo

Hi Ray!

Can you confirm that IPv6 is disabled?

On Tue, Aug 6, 2013 at 9:19 AM, Ray Pfaff <ra...@apx-labs.com> wrote:
I'm not sure if I can provide those due to the contract I'm working.  I really don't want to diverge this conversation from the original question I'm asking (which is a problem even running one tablet server per machine) but are you saying that setting tserver.port.search = true shouldn't be done?  I found this to be an undocumented way of running more than one tablet server per system.  I'm still not convinced that this leads to stability issues on tablet servers.  As I said, it's undocumented.

From: Eric Newton <er...@gmail.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 11:12 AM
To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo

Interesting.  You could not get similar performance improvements by increasing the size of the JVM, the number of threads, or the number of tablets per server?

If you have details about what configurations you've tried and the performance numbers you found, please open a ticket.  This would indicate that we have some unnecessary bottleneck in the tserver.

-Eric


On Tue, Aug 6, 2013 at 11:00 AM, Ray Pfaff <ra...@apx-labs.com> wrote:
Because we found this to be the optimal number of tablet servers in our testing.  It performs better than one per machine.  I'm not convinced that the stability issues make it worthwhile.
Doesn't affect my problem anyway.  I get this error whether I run one or four tablet servers.  Running four just makes it a bigger issue to get back up after failure.

From: Eric Newton <er...@gmail.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 10:56 AM
To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo


I'm running 4 tservers per machine dedicated to the tablet servers

Why?





--
Sean

RE: Communication issue between zookeeper and accumulo

Posted by "Ott, Charles H." <CH...@saic.com>.
And iptables?

 

From: user-return-2837-CHARLES.H.OTT=saic.com@accumulo.apache.org
[mailto:user-return-2837-CHARLES.H.OTT=saic.com@accumulo.apache.org] On
Behalf Of Ray Pfaff
Sent: Tuesday, August 06, 2013 12:54 PM
To: user@accumulo.apache.org
Subject: Re: Communication issue between zookeeper and accumulo

 

Yes, it is disabled, so that's not the problem.

 

From: Sean Busbey <bu...@cloudera.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 12:48 PM
To: Accumulo User List <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo

 

Hi Ray! 

 

Can you confirm that IPv6 is disabled?

 

On Tue, Aug 6, 2013 at 9:19 AM, Ray Pfaff <ra...@apx-labs.com>
wrote:

I'm not sure if I can provide those due to the contract I'm working.  I
really don't want to diverge this conversation from the original
question I'm asking (which is a problem even running one tablet server
per machine) but are you saying that setting tserver.port.search = true
shouldn't be done?  I found this to be an undocumented way of running
more than one tablet server per system.  I'm still not convinced that
this leads to stability issues on tablet servers.  As I said, it's
undocumented.

 

From: Eric Newton <er...@gmail.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>

Date: Tuesday, August 6, 2013 11:12 AM 


To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo

 

Interesting.  You could not get similar performance improvements by
increasing the size of the JVM, the number of threads, or the number of
tablets per server? 

 

If you have details about what configurations you've tried and the
performance numbers you found, please open a ticket.  This would
indicate that we have some unnecessary bottleneck in the tserver.

 

-Eric

 

 

On Tue, Aug 6, 2013 at 11:00 AM, Ray Pfaff <ra...@apx-labs.com>
wrote:

Because we found this to be the optimal number of tablet servers in our
testing.  It performs better than one per machine.  I'm not convinced
that the stability issues make it worthwhile.

Doesn't affect my problem anyway.  I get this error whether I run one or
four tablet servers.  Running four just makes it a bigger issue to get
back up after failure.

 

From: Eric Newton <er...@gmail.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 10:56 AM
To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo

 

 

	I'm running 4 tservers per machine dedicated to the tablet servers

 

Why?

 

 





 

-- 

Sean


Re: Communication issue between zookeeper and accumulo

Posted by Ray Pfaff <ra...@apx-labs.com>.
Yes, it is disabled, so that's not the problem.

From: Sean Busbey <bu...@cloudera.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 12:48 PM
To: Accumulo User List <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo

Hi Ray!

Can you confirm that IPv6 is disabled?


On Tue, Aug 6, 2013 at 9:19 AM, Ray Pfaff <ra...@apx-labs.com> wrote:
I'm not sure if I can provide those due to the contract I'm working.  I really don't want to diverge this conversation from the original question I'm asking (which is a problem even running one tablet server per machine) but are you saying that setting tserver.port.search = true shouldn't be done?  I found this to be an undocumented way of running more than one tablet server per system.  I'm still not convinced that this leads to stability issues on tablet servers.  As I said, it's undocumented.

From: Eric Newton <er...@gmail.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 11:12 AM
To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo

Interesting.  You could not get similar performance improvements by increasing the size of the JVM, the number of threads, or the number of tablets per server?

If you have details about what configurations you've tried and the performance numbers you found, please open a ticket.  This would indicate that we have some unnecessary bottleneck in the tserver.

-Eric



On Tue, Aug 6, 2013 at 11:00 AM, Ray Pfaff <ra...@apx-labs.com> wrote:
Because we found this to be the optimal number of tablet servers in our testing.  It performs better than one per machine.  I'm not convinced that the stability issues make it worthwhile.
Doesn't affect my problem anyway.  I get this error whether I run one or four tablet servers.  Running four just makes it a bigger issue to get back up after failure.

From: Eric Newton <er...@gmail.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 10:56 AM
To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo


I'm running 4 tservers per machine dedicated to the tablet servers

Why?





--
Sean

Re: Communication issue between zookeeper and accumulo

Posted by Sean Busbey <bu...@cloudera.com>.
Hi Ray!

Can you confirm that IPv6 is disabled?


On Tue, Aug 6, 2013 at 9:19 AM, Ray Pfaff <ra...@apx-labs.com> wrote:

>  I'm not sure if I can provide those due to the contract I'm working.  I
> really don't want to diverge this conversation from the original question
> I'm asking (which is a problem even running one tablet server per machine)
> but are you saying that setting tserver.port.search = true shouldn't be
> done?  I found this to be an undocumented way of running more than one
> tablet server per system.  I'm still not convinced that this leads to
> stability issues on tablet servers.  As I said, it's undocumented.
>
>   From: Eric Newton <er...@gmail.com>
> Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
> Date: Tuesday, August 6, 2013 11:12 AM
>
> To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
> Subject: Re: Communication issue between zookeeper and accumulo
>
>   Interesting.  You could not get similar performance improvements by
> increasing the size of the JVM, the number of threads, or the number of
> tablets per server?
>
>  If you have details about what configurations you've tried and the
> performance numbers you found, please open a ticket.  This would indicate
> that we have some unnecessary bottleneck in the tserver.
>
>  -Eric
>
>
>
> On Tue, Aug 6, 2013 at 11:00 AM, Ray Pfaff <ra...@apx-labs.com> wrote:
>
>>  Because we found this to be the optimal number of tablet servers in our
>> testing.  It performs better than one per machine.  I'm not convinced that
>> the stability issues make it worthwhile.
>> Doesn't affect my problem anyway.  I get this error whether I run one or
>> four tablet servers.  Running four just makes it a bigger issue to get back
>> up after failure.
>>
>>   From: Eric Newton <er...@gmail.com>
>> Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
>> Date: Tuesday, August 6, 2013 10:56 AM
>> To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
>> Subject: Re: Communication issue between zookeeper and accumulo
>>
>>
>>   I'm running 4 tservers per machine dedicated to the tablet servers
>>>
>>
>>  Why?
>>
>>
>


-- 
Sean

Re: Communication issue between zookeeper and accumulo

Posted by Ray Pfaff <ra...@apx-labs.com>.
I'm not sure if I can provide those due to the contract I'm working.  I really don't want to diverge this conversation from the original question I'm asking (which is a problem even running one tablet server per machine) but are you saying that setting tserver.port.search = true shouldn't be done?  I found this to be an undocumented way of running more than one tablet server per system.  I'm still not convinced that this leads to stability issues on tablet servers.  As I said, it's undocumented.

From: Eric Newton <er...@gmail.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 11:12 AM
To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo

Interesting.  You could not get similar performance improvements by increasing the size of the JVM, the number of threads, or the number of tablets per server?

If you have details about what configurations you've tried and the performance numbers you found, please open a ticket.  This would indicate that we have some unnecessary bottleneck in the tserver.

-Eric



On Tue, Aug 6, 2013 at 11:00 AM, Ray Pfaff <ra...@apx-labs.com> wrote:
Because we found this to be the optimal number of tablet servers in our testing.  It performs better than one per machine.  I'm not convinced that the stability issues make it worthwhile.
Doesn't affect my problem anyway.  I get this error whether I run one or four tablet servers.  Running four just makes it a bigger issue to get back up after failure.

From: Eric Newton <er...@gmail.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 10:56 AM
To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo


I'm running 4 tservers per machine dedicated to the tablet servers

Why?



Re: Communication issue between zookeeper and accumulo

Posted by Eric Newton <er...@gmail.com>.
Interesting.  You could not get similar performance improvements by
increasing the size of the JVM, the number of threads, or the number of
tablets per server?

If you have details about what configurations you've tried and the
performance numbers you found, please open a ticket.  This would indicate
that we have some unnecessary bottleneck in the tserver.

-Eric



On Tue, Aug 6, 2013 at 11:00 AM, Ray Pfaff <ra...@apx-labs.com> wrote:

>  Because we found this to be the optimal number of tablet servers in our
> testing.  It performs better than one per machine.  I'm not convinced that
> the stability issues make it worthwhile.
> Doesn't affect my problem anyway.  I get this error whether I run one or
> four tablet servers.  Running four just makes it a bigger issue to get back
> up after failure.
>
>   From: Eric Newton <er...@gmail.com>
> Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
> Date: Tuesday, August 6, 2013 10:56 AM
> To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
> Subject: Re: Communication issue between zookeeper and accumulo
>
>
>   I'm running 4 tservers per machine dedicated to the tablet servers
>>
>
>  Why?
>
>

Re: Communication issue between zookeeper and accumulo

Posted by Ray Pfaff <ra...@apx-labs.com>.
Because we found this to be the optimal number of tablet servers in our testing.  It performs better than one per machine.  I'm not convinced that the stability issues make it worthwhile.
Doesn't affect my problem anyway.  I get this error whether I run one or four tablet servers.  Running four just makes it a bigger issue to get back up after failure.

From: Eric Newton <er...@gmail.com>
Reply-To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Date: Tuesday, August 6, 2013 10:56 AM
To: "user@accumulo.apache.org" <us...@accumulo.apache.org>
Subject: Re: Communication issue between zookeeper and accumulo


I'm running 4 tservers per machine dedicated to the tablet servers

Why?


Re: Communication issue between zookeeper and accumulo

Posted by Eric Newton <er...@gmail.com>.
> I'm running 4 tservers per machine dedicated to the tablet servers
>

Why?

RE: Communication issue between zookeeper and accumulo

Posted by "Ott, Charles H." <CH...@saic.com>.
Last time I had errors like this, someone mentioned disabling IPv6.  I
disabled IPv6 on all the slaves, the master, and zookeeper.  After that
everything worked fine.

 

When doing JUnit testing with zookeeper as a dependency, I noticed that
zookeeper was throwing errors in the console when my localhost was
mapped to ::1 (the IPv6 loopback address).  The error from zookeeper
was 'address family not supported'.  I was able to get around it by
switching localhost -> 127.0.0.1

 

I would try disabling IPv6 across your cluster and see if that resolves the problem.
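To check whether "localhost" resolves to ::1 rather than 127.0.0.1 on a given host (the situation Charles describes), a small Java program can print each address a name resolves to and its family. This is a sketch; the class and method names are hypothetical. It also prints the standard java.net.preferIPv4Stack system property: setting -Djava.net.preferIPv4Stack=true in the JVM options is a commonly used alternative to disabling IPv6 at the OS level, though nothing in this thread establishes that it fixes this particular tserver lock problem.

```java
import java.net.Inet4Address;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic: which address family does a host name resolve to?
public class AddressFamilyCheck {

    static String family(InetAddress addr) {
        if (addr instanceof Inet4Address) return "IPv4";
        if (addr instanceof Inet6Address) return "IPv6";
        return "unknown";
    }

    public static void main(String[] args) throws UnknownHostException {
        String host = args.length > 0 ? args[0] : "localhost";
        // Print every address the resolver returns; ::1 is printed in its
        // expanded form 0:0:0:0:0:0:0:1.
        for (InetAddress addr : InetAddress.getAllByName(host)) {
            System.out.println(host + " -> " + addr.getHostAddress()
                    + " (" + family(addr) + ")");
        }
        // null unless set; "true" tells the JVM to use IPv4 sockets only.
        System.out.println("java.net.preferIPv4Stack = "
                + System.getProperty("java.net.preferIPv4Stack"));
    }
}
```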

 

 

 

From: user-return-2830-CHARLES.H.OTT=saic.com@accumulo.apache.org
[mailto:user-return-2830-CHARLES.H.OTT=saic.com@accumulo.apache.org] On
Behalf Of Ray Pfaff
Sent: Tuesday, August 06, 2013 10:38 AM
To: user@accumulo.apache.org
Subject: Communication issue between zookeeper and accumulo

 

 

I'm running accumulo 1.4.3 and zookeeper 3.3.5 and I seem to have
occasional communication errors between the tablet servers and
zookeeper.  Sometimes when I restart a tablet server, I get the
following error in my log:

INFO : Waiting for tablet server lock

(repeats numerous times)

INFO:Too many retries, exiting.

 

At this point the tserver process is still running, but it registers as
dead to the master.  I have to manually terminate the tserver and then
restart it.  Usually by the second or third try, I no longer get the
"exiting" error and the server will begin to do work.  I'm running 4
tservers per machine dedicated to the tablet servers, so this makes for
a pretty "manual" method of restarting them.

 

I've looked at the code and the process is executing a ZooLock.tryLock
and failing.  It then sleeps and tries again, ultimately terminating the
try lock method after 60 attempts.  I also note that Jira-954 describes
almost exactly this error, if not the same one.  However, it's listed as
having been fixed in 1.4.3.

 

Is there some step in configuring either zookeeper or the tservers that
I've missed that will get rid of this?


Communication issue between zookeeper and accumulo

Posted by Ray Pfaff <ra...@apx-labs.com>.
I'm running accumulo 1.4.3 and zookeeper 3.3.5 and I seem to have occasional communication errors between the tablet servers and zookeeper.  Sometimes when I restart a tablet server, I get the following error in my log:

INFO : Waiting for tablet server lock

(repeats numerous times)
INFO:Too many retries, exiting.

At this point the tserver process is still running, but it registers as dead to the master.  I have to manually terminate the tserver and then restart it.  Usually by the second or third try, I no longer get the "exiting" error and the server will begin to do work.  I'm running 4 tservers per machine dedicated to the tablet servers, so this makes for a pretty "manual" method of restarting them.

I've looked at the code and the process is executing a ZooLock.tryLock and failing.  It then sleeps and tries again, ultimately terminating the try lock method after 60 attempts.  I also note that Jira-954 describes almost exactly this error, if not the same one.  However, it's listed as having been fixed in 1.4.3.
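The behaviour described here (try the lock, sleep, retry, give up after 60 attempts) follows a common bounded-retry pattern. The sketch below illustrates that pattern only; it is not Accumulo's ZooLock code, and the class name, the BooleanSupplier stand-in for the lock attempt and the sleep interval are all assumptions. Only the 60-attempt limit comes from the message above.

```java
import java.util.function.BooleanSupplier;

// Illustrative bounded retry loop; NOT Accumulo's actual ZooLock implementation.
public class BoundedLockRetry {

    // Calls tryLock until it succeeds or maxAttempts is reached.
    // Returns true if the lock was acquired, false if the caller should give up.
    static boolean acquire(BooleanSupplier tryLock, int maxAttempts, long sleepMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryLock.getAsBoolean()) {
                return true;
            }
            System.out.println("INFO : Waiting for tablet server lock");
            Thread.sleep(sleepMillis);
        }
        System.out.println("INFO : Too many retries, exiting.");
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated lock that becomes available on the third attempt.
        int[] calls = {0};
        boolean got = acquire(() -> ++calls[0] >= 3, 60, 10);
        System.out.println("acquired=" + got + " after " + calls[0] + " attempts");
    }
}
```

In this sketch a false result only tells the caller to stop; deciding to exit the process is left to the caller.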

Is there some step in configuring either zookeeper or the tservers that I've missed that will get rid of this?

Re: How to pre-split a table for UUID rowkeys

Posted by "Terry P." <te...@gmail.com>.
Thanks Eric. I came into work today after kicking off a 100 million test
data load and was pleasantly surprised to find the following distribution:

server1: 33.6 million docs
server2: 32.8 million docs
server3: 33.6 million docs

So it looks like my 5 million record load just didn't get big enough to
need to split (and now I recall that my 5M load was using 500 byte records
rather than 1000 byte as I was later informed is closer to reality).

With a replication factor of 3 and 3 nodes, total consumed space is 248GB,
so compression looks to have saved about 18% for this random data.  Real data will
compress better I'm sure, as this test data is just the RandomBatchWriter
tweaked to use a UUID as the RowKey to better match our app instead of the
monotonically increasing number.  Hopefully later today the full test suite
will be ready so I can ingest real data.
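The 18% figure can be sanity-checked with a few lines of arithmetic. This sketch assumes "GB" means 10^9 bytes and counts only the ~1000-byte values, ignoring row keys, column metadata and timestamps; under those assumptions the saving comes out at about 17%. If the 248GB were binary gigabytes (2^30 bytes), the same calculation gives roughly 11%, and counting key bytes in the raw size would push the estimate up, so the number is approximate either way. The class and method names are hypothetical.

```java
// Back-of-the-envelope check of the space figures above (hypothetical helper).
// Assumes decimal GB and counts only value bytes, not keys or file overhead.
public class CompressionEstimate {

    // Fraction of space saved relative to the uncompressed, replicated size.
    static double savings(long docs, long bytesPerDoc, int replication, double consumedBytes) {
        double rawReplicated = (double) docs * bytesPerDoc * replication;
        return 1.0 - consumedBytes / rawReplicated;
    }

    public static void main(String[] args) {
        // 100M docs * 1000 B * 3 replicas = 300e9 B raw; 248e9 B consumed.
        double s = savings(100_000_000L, 1_000L, 3, 248e9);
        System.out.printf("space saved: %.1f%%%n", s * 100);   // space saved: 17.3%
    }
}
```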

Thanks for the addsplits syntax example -- I like that idea more than
working with a splits file, as it's easier to script and one less
dependency if you will.  I'll presplit with that and re-test and see if the
distribution occurs sooner than I was seeing last week.

Thanks again Eric, the info you and the other folks on this list give out
every week is invaluable.





On Fri, Aug 2, 2013 at 5:35 PM, Eric Newton <er...@gmail.com> wrote:

> Apparently 5M 1K documents isn't enough to split the tablet.  I'm guessing
> that your documents are compressing well, or you are able to fit them all
> in memory.  You could try flushing the table and see if it splits.
>
> shell > flush -t table -w
>
> Or, you could just add splits if you know the UUIDs are uniformly
> distributed:
>
> shell > addsplits -t table 1 2 3 4 5 6 7 8 9 a b c d e f
>
> Or, if you just want accumulo to split at a certain size under the 1G
> default:
>
> shell > config -t table -s table.split.threshold=10M
>
> -Eric

Re: How to pre-split a table for UUID rowkeys

Posted by Eric Newton <er...@gmail.com>.
Apparently 5M 1K documents aren't enough to split the tablet.  I'm guessing
that your documents are compressing well, or you are able to fit them all
in memory.  You could try flushing the table and see if it splits.

shell > flush -t table -w

Or, you could just add splits if you know the UUIDs are uniformly
distributed:

shell > addsplits -t table 1 2 3 4 5 6 7 8 9 a b c d e f
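A minimal Java sketch of generating split points like these, assuming row keys are java.util.UUID strings with the dashes stripped (lowercase hex, as UUID.toString() produces) and that the UUIDs are uniformly random. The UuidSplits class and hexSplits method are hypothetical helpers, not part of Accumulo. With a one-character prefix and 16 tablets it reproduces the 1 through f list above; a two-character prefix allows tablet counts that are not powers of 16, such as a multiple of 3 for a 3-node cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class UuidSplits {

    // Hypothetical helper: returns (numTablets - 1) split points, each a
    // lowercase hex prefix of prefixLen characters, spaced evenly across the
    // keyspace of dash-stripped UUID row keys. Assumes uniformly random UUIDs.
    static List<String> hexSplits(int numTablets, int prefixLen) {
        if (prefixLen < 1 || prefixLen > 4) {
            throw new IllegalArgumentException("prefixLen must be 1..4");
        }
        long space = 1L << (4 * prefixLen); // 16^prefixLen distinct prefixes
        if (numTablets < 1 || numTablets > space) {
            throw new IllegalArgumentException("numTablets must be 1.." + space);
        }
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < numTablets; i++) {
            long boundary = i * space / numTablets; // evenly spaced, rounded down
            splits.add(String.format("%0" + prefixLen + "x", boundary));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Row key format described in the thread: a UUID with '-' stripped.
        String row = UUID.randomUUID().toString().replace("-", "");
        System.out.println("sample row: " + row);

        // 16 tablets: same split points as the addsplits example.
        System.out.println("addsplits -t table " + String.join(" ", hexSplits(16, 1)));

        // 12 tablets: roughly 4 per tablet server on a 3-node cluster,
        // if the balancer spreads them evenly.
        System.out.println("addsplits -t table " + String.join(" ", hexSplits(12, 2)));
    }
}
```

The printed lines can be pasted into the shell as-is. The Java client's TableOperations.addSplits method, which takes a sorted set of Text split points, is another way to apply the same list (not shown here).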

Or, if you just want Accumulo to split at a certain size below the 1G
default:

shell > config -t table -s table.split.threshold=10M
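As a rough, back-of-the-envelope sketch of what the threshold means at full volume, the Java below estimates how many tablets a fully split table would end up with. It assumes about 1 billion documents of roughly 1,000 bytes each (the upper figures from the original question), assumes the 1G and 10M suffixes are binary units, and ignores compression and per-entry key overhead, so treat the result as an order of magnitude only. TabletEstimate and estimateTablets are hypothetical names used for illustration.

```java
public class TabletEstimate {

    // Hypothetical helper: rough tablet count once a table has fully split,
    // i.e. ceil(totalBytes / splitThreshold). Ignores compression and
    // per-entry key overhead, so treat the result as an order of magnitude.
    static long estimateTablets(long docs, long bytesPerDoc, long splitThresholdBytes) {
        long totalBytes = docs * bytesPerDoc;
        return (totalBytes + splitThresholdBytes - 1) / splitThresholdBytes;
    }

    public static void main(String[] args) {
        long docs = 1_000_000_000L;   // "under 1 billion" docs at full volume
        long bytesPerDoc = 1_000L;    // ~1K documents
        long oneG = 1L << 30;         // default table.split.threshold (1G)
        long tenM = 10L << 20;        // table.split.threshold=10M

        System.out.println("1G threshold:  ~" + estimateTablets(docs, bytesPerDoc, oneG) + " tablets");
        System.out.println("10M threshold: ~" + estimateTablets(docs, bytesPerDoc, tenM) + " tablets");
    }
}
```

Under these assumptions it prints about 932 tablets at 1G and about 95,368 at 10M. A small threshold is handy for watching a test load spread across tablet servers quickly, but at full volume it yields roughly 100 times as many tablets as the default.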

-Eric
