Posted to user@accumulo.apache.org by Geoffry Roberts <th...@gmail.com> on 2014/10/06 23:26:38 UTC
Remotely Accumulo
I have been happily working with Accumulo, but today things changed. No errors.
Until now I ran everything server side, which meant the URL was
localhost:2181, and life was good. Today I tried running some of the same
code as a remote client, which means <host name>:2181. Things hang when the
BatchWriter tries to commit anything, and a Scan hangs when it tries to
iterate through a Map.
Let's focus on the scan part:
scan.fetchColumnFamily(new Text("colfY")); // This executes, then hangs.
for (Entry<Key,Value> entry : scan) {
    def row = entry.getKey().getRow();
    def value = entry.getValue();
    println "value=" + value;
}
This is what appears in the console:
17:22:39.802 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping
response for sessionid: 0x148c6f03388005e after 21ms
17:22:49.803 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping
response for sessionid: 0x148c6f03388005e after 21ms
<and on and on>
The only difference between success and a hang is a URL change, and of
course being remote.
I don't believe this is a firewall issue; I shut down the firewall.
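One way to rule the network in or out is to probe, from the remote client box, the two ports a client actually needs: ZooKeeper's 2181 and the tserver's default 9997. A sketch using bash's built-in /dev/tcp (the host names are placeholders, not from this thread):

```shell
# Host names below are placeholders; substitute your ZooKeeper and
# tserver hosts. Uses bash's /dev/tcp so no extra tools are needed.
for hostport in zk1.example.com:2181 tserver1.example.com:9997; do
  host=${hostport%:*}; port=${hostport#*:}
  if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "$hostport reachable"
  else
    echo "$hostport NOT reachable"
  fi
done
```

If both ports answer from the remote box, raw connectivity isn't the problem, and DNS or interface binding becomes the next suspect.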
Am I missing something?
Thanks all.
--
There are ways and there are ways,
Geoffry Roberts
Re: Remotely Accumulo
Posted by Sean Busbey <bu...@cloudera.com>.
Hi Geoffry!
What version of Accumulo are you using?
Can you check your DNS on the cluster?
1) Does 'hostname' return the name you expect from the client? (the client
must be able to see all ZK servers and all tablet servers in the cluster)
2) Do your cluster config files contain the same host names that would be
returned by the above command on each server?
3) Does forward and reverse DNS work for each host for the name referenced
in your config files?
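Sean's three checks can be sketched as a short script to run on each node and on the client (the names it prints should match whatever appears in your masters/slaves files and zoo.cfg; `getent` is assumed available):

```shell
# 1) the name this node calls itself
name=$(hostname -f 2>/dev/null || hostname)
echo "this node reports: $name"
# 2) forward lookup of that name
addr=$(getent hosts "$name" | awk '{print $1; exit}')
if [ -n "$addr" ]; then
  echo "forward: $name -> $addr"
  # 3) reverse lookup of the resulting address
  getent hosts "$addr" | awk '{print "reverse: " $1 " -> " $2}'
else
  echo "forward lookup FAILED for $name"
fi
```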
On Mon, Oct 6, 2014 at 4:26 PM, Geoffry Roberts <th...@gmail.com>
wrote:
> [original message snipped]
--
Sean
Re: Remotely Accumulo
Posted by Geoffry Roberts <th...@gmail.com>.
So start-here.sh does it. Thanks for pointing that out. I was looking all
through the shell commands.
I did try start-all.sh from the master, and it worked for starting the
tserver, but I noticed that on the master it increased the number of
processes labeled "Main" from the usual five to seven.
From accumulo-site.xml, everything memory related:
<property>
  <name>tserver.memory.maps.max</name>
  <value>256M</value>
</property>
<property>
  <name>tserver.memory.maps.native.enabled</name>
  <value>false</value>
</property>
<property>
  <name>tserver.cache.data.size</name>
  <value>50M</value>
</property>
<property>
  <name>tserver.cache.index.size</name>
  <value>100M</value>
</property>
<property>
  <name>tserver.walog.max.size</name>
  <value>512M</value>
</property>
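A quick back-of-the-envelope check of these settings against the 2G heap (-Xmx2G, from the ACCUMULO_TSERVER_OPTS quoted later in this thread); the arithmetic uses only the figures above:

```shell
# Figures copied from the accumulo-site.xml excerpt above; heap from -Xmx2G.
maps=256; data_cache=50; index_cache=100   # MB
heap=2048                                  # MB
reserved=$((maps + data_cache + index_cache))
echo "reserved=${reserved}M of ${heap}M heap, headroom=$((heap - reserved))M"
# -> reserved=406M of 2048M heap, headroom=1642M
```

This squares with Josh's point that 2G should be plenty for these settings, so the OOME is coming from something beyond the configured map and caches.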
On Thu, Oct 9, 2014 at 10:54 AM, Josh Elser <jo...@gmail.com> wrote:
> You can use start-here.sh on the host in question or `start-server.sh
> $hostname tserver`. FWIW, re-invoking start-all.sh should just ignore the
> hosts which already have processes running and just start a tserver on the
> host that died.
>
> 2G should be enough to get a connector and read a table. TBH, 256M should
> be enough for that.
>
> Also, the JVM OOME doesn't include timestamps; there isn't much more to
> glean from that message other than "it died because it ran out of heap".
>
> What does your accumulo-site.xml look like?
>
> Geoffry Roberts wrote:
>
>> I found the message in tserver*.out. tserver*.err is empty.
>>
>> I posted last night and life was good; I sat down this morning and saw that
>> another tserver had crashed overnight, with no activity. In
>> tserver*.out it again says out of heap space.
>>
>> ACCUMULO_TSERVER_OPTS=-Xmx2G -Xms1G. I would have thought it sufficient.
>>
>> The fact that the log entries lack timestamps, but have hash marks, makes
>> me wonder if I am reading things correctly.
>>
>> #
>>
>> # java.lang.OutOfMemoryError: Java heap space
>>
>> # -XX:OnOutOfMemoryError="kill -9 %p"
>>
>> # Executing /bin/sh -c "kill -9 3241"...
>>
>>
>> Is there a way to start a particular tablet server?
>>
>>
>> On Wed, Oct 8, 2014 at 6:55 PM, Eric Newton <eric.newton@gmail.com> wrote:
>>
>> Did you find the message in the tserver*.out, tserver*.err or the
>> monitor page?
>>
>> (Thanks for the follow-up message.)
>>
>> On Wed, Oct 8, 2014 at 6:39 PM, Geoffry Roberts <threadedblue@gmail.com> wrote:
>>
>> Just for the record, I finally got to the bottom of things. One
>> of my tservers was running out of memory. I hadn't noticed. I
>> had my SA allocate a little more--each node now has 6G, up from
>> 2G--and things are working better.
>>
>> On Oct 8, 2014 10:09 AM, "Josh Elser" <josh.elser@gmail.com> wrote:
>>
>> Jstack is a tool which can be used to tell a java process to
>> dump the current stack traces for all of its threads. It's
>> usually included with the JDK. `kill -3 $pid` also does the
>> same. If the output doesn't come back automatically to your
>> shell, check the stdout for the process you gave as an
>> argument.
>>
>> When your client is sitting waiting on data from the
>> tabletserver, you can get the stack traces from the tserver
>> and you should be able to find a thread with scan in the
>> name, along with your client's IP, and we can help debug
>> exactly what the server is doing that is preventing it from
>> returning data to your client.
>>
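Josh's suggestion above can be sketched as a short script. The `Main tserver` pattern reflects how Accumulo 1.5 launches its processes via org.apache.accumulo.start.Main (an assumption; check `jps -m` on your node to confirm):

```shell
# "Main tserver" is how an Accumulo 1.5 tserver appears in `jps -m`;
# verify the pattern on your own node before relying on it.
pick_tserver_pid() {               # parse `jps -m` output for the tserver pid
  awk '/Main tserver/ {print $1; exit}'
}
pid=$(jps -m 2>/dev/null | pick_tserver_pid)
if [ -n "$pid" ]; then
  jstack "$pid" > /tmp/tserver-stacks.txt  # or: kill -3 "$pid" (stacks go to the tserver's stdout)
  grep -in 'scan' /tmp/tserver-stacks.txt | head  # look for the hung scan thread
else
  echo "no tserver process visible to jps on this host"
fi
```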
>> On Oct 8, 2014 9:43 AM, "Geoffry Roberts" <threadedblue@gmail.com> wrote:
>>
>> Thanks Josh. But what do you mean by "jstack'ing"? I'm
>> unfamiliar with that term. A better question would be:
>> how can one troubleshoot such a thing?
>>
>> btw
>> I am the sole user on this cluster.
>>
>> On Tue, Oct 7, 2014 at 4:18 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>
>> Ok, this record:
>>
>> tcp        0      0 0.0.0.0:9997            0.0.0.0:*               LISTEN
>>
>> means that your tserver is listening on the correct port on all
>> interfaces. There shouldn't be issues connecting to the tserver. This is
>> also confirmed by the fact that you authenticated and got a Connector
>> (this does an RPC to the tserver).
>>
>> So, your tserver is up, and your client can communicate with it. The
>> real question is why the scan is hanging. Perhaps try jstack'ing the
>> tserver when your client is blocked waiting for results.
>>
>> On Tue, Oct 7, 2014 at 2:07 PM, Geoffry Roberts <threadedblue@gmail.com> wrote:
>> > "...it's when
>> > you make a Connector, and your client will talk
>> to a tabletserver to
>> > authenticate, that your program should hang. It
>> would be good to
>> > verify that."
>> >
>> >
>> > My program should hang? Would you expand? That
>> is exactly what it is
>> > doing. I am able to get a connector. But when I
>> try to iterate the result
>> > of a scan, that's when it hangs.
>> >
>> >
>> >
>> >
>> > Here's what comes from netstat:
>> >
>> >
>> > $ netstat -na | grep 9997
>> >
>> > tcp        0      0 0.0.0.0:9997            0.0.0.0:*               LISTEN
>> > tcp        0      0 204.9.140.36:35679      204.9.140.36:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:53146      204.9.140.37:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:33896      204.9.140.38:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:53282      204.9.140.37:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:53188      204.9.140.37:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:35609      204.9.140.36:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:33901      204.9.140.38:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:35588      204.9.140.36:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:33877      204.9.140.38:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:33946      204.9.140.38:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:53167      204.9.140.37:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:33949      204.9.140.38:9997       ESTABLISHED
>> > tcp        0      0 204.9.140.36:35546      204.9.140.36:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:33852      204.9.140.38:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:53125      204.9.140.37:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:33922      204.9.140.38:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:33747      204.9.140.38:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:33961      204.9.140.38:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:33793      204.9.140.38:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:35768      204.9.140.36:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:33917      204.9.140.38:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:33814      204.9.140.38:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:35567      204.9.140.36:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:33444      204.9.140.38:9997       FIN_WAIT2
>> > tcp        0      0 204.9.140.36:35701      204.9.140.36:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:33969      204.9.140.38:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:53258      204.9.140.37:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:33831      204.9.140.38:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:53210      204.9.140.37:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:53104      204.9.140.37:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:33789      204.9.140.38:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:33856      204.9.140.38:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:53237      204.9.140.37:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:33835      204.9.140.38:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:35651      204.9.140.36:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:33938      204.9.140.38:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:33041      204.9.140.36:9997       ESTABLISHED
>> > tcp        0      0 204.9.140.36:53285      204.9.140.37:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:53305      204.9.140.37:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:33768      204.9.140.38:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:35630      204.9.140.36:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:33754      204.9.140.38:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:35745      204.9.140.36:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:35724      204.9.140.36:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:9997       204.9.140.36:33041      ESTABLISHED
>> > tcp        0      0 204.9.140.36:53083      204.9.140.37:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:50623      204.9.140.37:9997       ESTABLISHED
>> > tcp        0      0 204.9.140.36:33772      204.9.140.38:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:33732      204.9.140.38:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:33874      204.9.140.38:9997       TIME_WAIT
>> > tcp        0      0 204.9.140.36:33810      204.9.140.38:9997       TIME_WAIT
>> >
>> >
>> > On Tue, Oct 7, 2014 at 11:34 AM, Josh Elser <josh.elser@gmail.com>
>> > wrote:
>> >>
>> >> Can you provide the output from netstat, lsof or
>> /proc/$pid/fd for the
>> >> tserver? Assuming you haven't altered tserver.port.client in
>> >> accumulo-site.xml, we want the line for port 9997.
>> >>
>> >> From my laptop running a tserver on localhost:
>> >>
>> >> $ netstat -na | grep 9997
>> >> tcp4       0      0  127.0.0.1.9997         *.*                    LISTEN
>> >>
>> >> Depending on the tool you use, you can grep out
>> the pid of the tserver
>> >> or just that port itself.
>> >>
>> >> Just so you know, ZK binds to all available interfaces when it starts,
>> >> so it should work seamlessly with localhost or the FQDN for the host.
>> >> As such, it shouldn't matter what you provide to the
>> >> ZooKeeperInstance. That should connect in all cases for you; it's when
>> >> you make a Connector, and your client will talk to a tabletserver to
>> >> authenticate, that your program should hang. It would be good to
>> >> verify that.
>> >>
>> >> On Tue, Oct 7, 2014 at 11:23 AM, Geoffry Roberts <threadedblue@gmail.com>
>> >> wrote:
>> >> > All,
>> >> >
>> >> > Thanks for the responses.
>> >> >
>> >> > Is this a problem for Accumulo?
>> >> > Reverse DNS is yielding my ISP's host name. You know the drill: my IP
>> >> > in reverse followed by their domain name, as opposed to my FQDN, which
>> >> > is what I use in my config files.
>> >> >
>> >> > Running Accumulo 1.5.1
>> >> > I have only one interface.
>> >> > I have the FQDN in both master and slaves files for both Hadoop and
>> >> > Accumulo; in zoo.cfg; and in accumulo-site.xml where the ZooKeepers are
>> >> > referenced.
>> >> > Also, I am passing in all ZK FQDNs when I
>> >> > instantiate ZooKeeperInstance.
>> >> > Forward DNS works
>> >> > Reverse DNS... well (See above).
>> >> >
>> >> >
>> >> >
>> >> > On Mon, Oct 6, 2014 at 10:26 PM, Adam Fuchs <afuchs@apache.org> wrote:
>> >> >>
>> >> >> Accumulo tservers typically listen on a single interface. If you have a
>> >> >> server with multiple interfaces (e.g. loopback and eth0), you might have a
>> >> >> problem in which the tablet servers are not listening on externally
>> >> >> reachable interfaces. Tablet servers will list the interfaces that they are
>> >> >> listening to when they boot, and you can also use tools like lsof to find
>> >> >> them.
>> >> >>
>> >> >> If that is indeed the problem, then you might just need to change your
>> >> >> conf/slaves file to use <hostname> instead of localhost, and then
>> >> >> restart.
>> >> >>
>> >> >> Adam
>> >> >>
>> >> >> On Oct 6, 2014 4:27 PM, "Geoffry Roberts" <threadedblue@gmail.com>
>> >> >> wrote:
>> >> >>> [original message snipped]
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > There are ways and there are ways,
>> >> >
>> >> > Geoffry Roberts
>> >
>> >
>> >
>> >
>> > --
>> > There are ways and there are ways,
>> >
>> > Geoffry Roberts
>>
>>
>>
>>
>> --
>> There are ways and there are ways,
>>
>> Geoffry Roberts
>>
>>
>>
>>
>>
>> --
>> There are ways and there are ways,
>>
>> Geoffry Roberts
>>
>
--
There are ways and there are ways,
Geoffry Roberts
Re: Remotely Accumulo
Posted by Josh Elser <jo...@gmail.com>.
You can use start-here.sh on the host in question or `start-server.sh
$hostname tserver`. FWIW, re-invoking start-all.sh should just ignore the
hosts which already have processes running and just start a tserver on
the host that died.
2G should be enough to get a connector and read a table. TBH, 256M
should be enough for that.
Also, the JVM OOME doesn't include timestamps; there isn't much more
to glean from that message other than "it died because it ran out of heap".
What does your accumulo-site.xml look like?
Geoffry Roberts wrote:
> I found the message in tserver*.out. tserver*.err has 0 in it.
>
> I posted last night, life was good, sat down this morning and saw that
> another tserver had crashed, over night, with no activity. ?? In
> tserver*.out it again says out of heap space.
>
> ACCUMULO_TSERVER_OPTS=-Xmx2G -Xms1G. I would have thought it sufficient.
>
> The fact that the log entries lack timestamps, but have hashmarks makes
> makes me wonder if I am reading things correctly.
>
> #
>
> # java.lang.OutOfMemoryError: Java heap space
>
> # -XX:OnOutOfMemoryError="kill -9 %p"
>
> # Executing /bin/sh -c "kill -9 3241"...
>
>
> Is there a way to start a particular tablet server?
>
>
> On Wed, Oct 8, 2014 at 6:55 PM, Eric Newton <eric.newton@gmail.com
> <ma...@gmail.com>> wrote:
>
> Did you find the message in the tserver*.out, terver*.err or the
> monitor page?
>
> (Thanks for the follow-up message.)
>
> On Wed, Oct 8, 2014 at 6:39 PM, Geoffry Roberts
> <threadedblue@gmail.com <ma...@gmail.com>> wrote:
>
> Just for the record, I finally got to the bottom of things. One
> of my Tservers was running out of memory. I hadn't noticed. I
> had my SA allocate a lttle more--each node now has 6G up from
> 2G--and things are working better.
>
> On Oct 8, 2014 10:09 AM, "Josh Elser" <josh.elser@gmail.com
> <ma...@gmail.com>> wrote:
>
> Jstack is a tool which can be used to tell a java process to
> dump the current stack traces for all of its threads. It's
> usually included with the JDK. `kill -3 $pid` also does the
> same. If the output can't be respected automatically to your
> shell, check the stdout for the process you gave as an
> argument.
>
> When your client is sitting waiting on data from the
> tabletserver, you can get the stack traces from the tserver
> and you should be able to find a thread with scan in the
> name, along with your client's IP, and we can help debug
> exactly what the server is doing that is preventing it from
> returning data to your client.
>
> On Oct 8, 2014 9:43 AM, "Geoffry Roberts"
> <threadedblue@gmail.com <ma...@gmail.com>> wrote:
>
> Thanks Josh. But what do you mean my "jstack'ing"? I'm
> unfamiliar with that term. A better question would be
> how can one troubleshoot such a thing?
>
> btw
> I am the sole user on this cluster.
>
> On Tue, Oct 7, 2014 at 4:18 PM, Josh Elser
> <josh.elser@gmail.com <ma...@gmail.com>> wrote:
>
> Ok, this record:
>
> tcp 0 0 0.0.0.0:9997
> <http://0.0.0.0:9997> 0.0.0.0:*
> LISTEN
>
> Means that your is listening on the correct port on
> all interfaces.
> There shouldn't be issues connecting to the tserver.
> This is also
> confirmed by the fact that you authenticated and got
> a Connector (this
> does an RPC to the tserver).
>
> So, your tserver is up, and your client can
> communicate with it. The
> real question is why is the scan hanging. Perhaps
> jstack'ing the
> tserver when your client is blocked waiting for results.
>
> On Tue, Oct 7, 2014 at 2:07 PM, Geoffry Roberts
> <threadedblue@gmail.com
> <ma...@gmail.com>> wrote:
> > "...it's when
> > you make a Connector, and your client will talk
> to a tabletserver to
> > authenticate, that your program should hang. It
> would be good to
> > verify that."
> >
> >
> > My program should hang? Would you expand? That
> is exactly what it is
> > doing. I am able to get a connector. But when I
> try to iterate the result
> > of a scan, that's when it hangs.
> >
> >
> >
> >
> > Here's what comes from netstat:
> >
> >
> > $ netstat -na | grep 9997
> >
> > tcp 0 0 0.0.0.0:9997
> <http://0.0.0.0:9997> 0.0.0.0:*
> > LISTEN
> >
> > tcp 0 0 204.9.140.36:35679
> <http://204.9.140.36:35679> 204.9.140.36:9997
> <http://204.9.140.36:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:53146
> <http://204.9.140.36:53146> 204.9.140.37:9997
> <http://204.9.140.37:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:33896
> <http://204.9.140.36:33896> 204.9.140.38:9997
> <http://204.9.140.38:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:53282
> <http://204.9.140.36:53282> 204.9.140.37:9997
> <http://204.9.140.37:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:53188
> <http://204.9.140.36:53188> 204.9.140.37:9997
> <http://204.9.140.37:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:35609
> <http://204.9.140.36:35609> 204.9.140.36:9997
> <http://204.9.140.36:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:33901
> <http://204.9.140.36:33901> 204.9.140.38:9997
> <http://204.9.140.38:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:35588
> <http://204.9.140.36:35588> 204.9.140.36:9997
> <http://204.9.140.36:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:33877
> <http://204.9.140.36:33877> 204.9.140.38:9997
> <http://204.9.140.38:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:33946
> <http://204.9.140.36:33946> 204.9.140.38:9997
> <http://204.9.140.38:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:53167
> <http://204.9.140.36:53167> 204.9.140.37:9997
> <http://204.9.140.37:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:33949
> <http://204.9.140.36:33949> 204.9.140.38:9997
> <http://204.9.140.38:9997>
> > ESTABLISHED
> >
> > tcp 0 0 204.9.140.36:35546
> <http://204.9.140.36:35546> 204.9.140.36:9997
> <http://204.9.140.36:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:33852
> <http://204.9.140.36:33852> 204.9.140.38:9997
> <http://204.9.140.38:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:53125
> <http://204.9.140.36:53125> 204.9.140.37:9997
> <http://204.9.140.37:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:33922
> <http://204.9.140.36:33922> 204.9.140.38:9997
> <http://204.9.140.38:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:33747
> <http://204.9.140.36:33747> 204.9.140.38:9997
> <http://204.9.140.38:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:33961
> <http://204.9.140.36:33961> 204.9.140.38:9997
> <http://204.9.140.38:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:33793
> <http://204.9.140.36:33793> 204.9.140.38:9997
> <http://204.9.140.38:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:35768
> <http://204.9.140.36:35768> 204.9.140.36:9997
> <http://204.9.140.36:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:33917
> <http://204.9.140.36:33917> 204.9.140.38:9997
> <http://204.9.140.38:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:33814
> <http://204.9.140.36:33814> 204.9.140.38:9997
> <http://204.9.140.38:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:35567
> <http://204.9.140.36:35567> 204.9.140.36:9997
> <http://204.9.140.36:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:33444
> <http://204.9.140.36:33444> 204.9.140.38:9997
> <http://204.9.140.38:9997>
> > FIN_WAIT2
> >
> > tcp 0 0 204.9.140.36:35701
> <http://204.9.140.36:35701> 204.9.140.36:9997
> <http://204.9.140.36:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:33969
> <http://204.9.140.36:33969> 204.9.140.38:9997
> <http://204.9.140.38:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:53258
> <http://204.9.140.36:53258> 204.9.140.37:9997
> <http://204.9.140.37:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:33831
> <http://204.9.140.36:33831> 204.9.140.38:9997
> <http://204.9.140.38:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:53210
> <http://204.9.140.36:53210> 204.9.140.37:9997
> <http://204.9.140.37:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:53104
> <http://204.9.140.36:53104> 204.9.140.37:9997
> <http://204.9.140.37:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:33789
> <http://204.9.140.36:33789> 204.9.140.38:9997
> <http://204.9.140.38:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:33856
> <http://204.9.140.36:33856> 204.9.140.38:9997
> <http://204.9.140.38:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:53237
> <http://204.9.140.36:53237> 204.9.140.37:9997
> <http://204.9.140.37:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:33835
> <http://204.9.140.36:33835> 204.9.140.38:9997
> <http://204.9.140.38:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:35651
> <http://204.9.140.36:35651> 204.9.140.36:9997
> <http://204.9.140.36:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:33938
> <http://204.9.140.36:33938> 204.9.140.38:9997
> <http://204.9.140.38:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:33041
> <http://204.9.140.36:33041> 204.9.140.36:9997
> <http://204.9.140.36:9997>
> > ESTABLISHED
> >
> > tcp 0 0 204.9.140.36:53285
> <http://204.9.140.36:53285> 204.9.140.37:9997
> <http://204.9.140.37:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:53305
> <http://204.9.140.36:53305> 204.9.140.37:9997
> <http://204.9.140.37:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:33768
> <http://204.9.140.36:33768> 204.9.140.38:9997
> <http://204.9.140.38:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:35630
> <http://204.9.140.36:35630> 204.9.140.36:9997
> <http://204.9.140.36:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:33754
> <http://204.9.140.36:33754> 204.9.140.38:9997
> <http://204.9.140.38:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:35745
> <http://204.9.140.36:35745> 204.9.140.36:9997
> <http://204.9.140.36:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:35724
> <http://204.9.140.36:35724> 204.9.140.36:9997
> <http://204.9.140.36:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:9997
> <http://204.9.140.36:9997> 204.9.140.36:33041
> <http://204.9.140.36:33041>
> > ESTABLISHED
> >
> > tcp 0 0 204.9.140.36:53083
> <http://204.9.140.36:53083> 204.9.140.37:9997
> <http://204.9.140.37:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:50623
> <http://204.9.140.36:50623> 204.9.140.37:9997
> <http://204.9.140.37:9997>
> > ESTABLISHED
> >
> > tcp 0 0 204.9.140.36:33772
> <http://204.9.140.36:33772> 204.9.140.38:9997
> <http://204.9.140.38:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:33732
> <http://204.9.140.36:33732> 204.9.140.38:9997
> <http://204.9.140.38:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:33874
> <http://204.9.140.36:33874> 204.9.140.38:9997
> <http://204.9.140.38:9997>
> > TIME_WAIT
> >
> > tcp 0 0 204.9.140.36:33810
> <http://204.9.140.36:33810> 204.9.140.38:9997
> <http://204.9.140.38:9997>
> > TIME_WAIT
> >
> >
> > On Tue, Oct 7, 2014 at 11:34 AM, Josh Elser
> <josh.elser@gmail.com <ma...@gmail.com>>
> wrote:
> >>
> >> Can you provide the output from netstat, lsof or
> /proc/$pid/fd for the
> >> tserver? Assuming you haven't altered
> tserv.port.client in
> >> accumulo-site.xml, we want the line for port 9997.
> >>
> >> From my laptop running a tserver on localhost:
> >>
> >> $ netstat -na | grep 9997
> >> tcp4 0 0 127.0.0.1.9997 *.*
> LISTEN
> >>
> >> Depending on the tool you use, you can grep out
> the pid of the tserver
> >> or just that port itself.
> >>
> >> Just so you know, ZK binds to all available
> interfaces when it starts,
> >> so it should work seamlessly with localhost or
> the FQDN for the host.
> >> As such, it shouldn't matter what you provide to the
> >> ZooKeeperInstance. That should connect in all
> cases for you, it's when
> >> you make a Connector, and your client will talk
> to a tabletserver to
> >> authenticate, that your program should hang. It
> would be good to
> >> verify that.
> >>
> >> On Tue, Oct 7, 2014 at 11:23 AM, Geoffry Roberts
> <threadedblue@gmail.com <ma...@gmail.com>>
> >> wrote:
> >> > All,
> >> >
> >> > Thanks for the responses.
> >> >
> >> > Is this a problem for Accumulo?
> >> > Reverse DNS is yielding my ISP's host name. You know the drill: my
> >> > IP in reverse followed by their domain name, as opposed to my FQDN,
> >> > which is what I use in my config files.
> >> >
> >> > Running Accumulo 1.5.1
> >> > I have only one interface.
> >> > I have the FQDN in both master and slaves files for both Hadoop
> >> > and Accumulo; in zoo.cfg; and in accumulo-site.xml where the
> >> > Zookeepers are referenced.
> >> > Also, I am passing in all Zk FQDN when I instantiate
> >> > ZookeeperInstance.
> >> > Forward DNS works
> >> > Reverse DNS... well (See above).
> >> >
> >> >
> >> >
> >> > On Mon, Oct 6, 2014 at 10:26 PM, Adam Fuchs <afuchs@apache.org>
> >> > wrote:
> >> >>
> >> >> Accumulo tservers typically listen on a single interface. If you
> >> >> have a server with multiple interfaces (e.g. loopback and eth0),
> >> >> you might have a problem in which the tablet servers are not
> >> >> listening on externally reachable interfaces. Tablet servers will
> >> >> list the interfaces that they are listening to when they boot, and
> >> >> you can also use tools like lsof to find them.
> >> >>
> >> >> If that is indeed the problem, then you might just need to change
> >> >> your conf/slaves file to use <hostname> instead of localhost, and
> >> >> then restart.
> >> >>
> >> >> Adam
> >> >>
> >> >> On Oct 6, 2014 4:27 PM, "Geoffry Roberts" <threadedblue@gmail.com>
> >> >> wrote:
> >> >>>
> >> >>>
> >> >>> I have been happily working with Acc, but today things changed.
> >> >>> No errors.
> >> >>>
> >> >>> Until now I ran everything server side, which meant the URL was
> >> >>> localhost:2181, and life was good. Today I tried running some of
> >> >>> the same code as a remote client, which means <host name>:2181.
> >> >>> Things hang when BatchWriter tries to commit anything, and Scan
> >> >>> hangs when it tries to iterate through a Map.
> >> >>>
> >> >>> Let's focus on the scan part:
> >> >>>
> >> >>> scan.fetchColumnFamily(new Text("colfY")); // This executes, then hangs.
> >> >>> for (Entry<Key,Value> entry : scan) {
> >> >>>     def row = entry.getKey().getRow();
> >> >>>     def value = entry.getValue();
> >> >>>     println "value=" + value;
> >> >>> }
> >> >>>
> >> >>> This is what appears in the console:
> >> >>>
> >> >>> 17:22:39.802 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got
> >> >>> ping response for sessionid: 0x148c6f03388005e after 21ms
> >> >>>
> >> >>> 17:22:49.803 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got
> >> >>> ping response for sessionid: 0x148c6f03388005e after 21ms
> >> >>>
> >> >>> <and on and on>
> >> >>>
> >> >>>
> >> >>>
> >> >>> The only difference between success and a hang is a URL change,
> >> >>> and of course being remote.
> >> >>>
> >> >>> I don't believe this is a firewall issue. I shut down the
> >> >>> firewall.
> >> >>>
> >> >>> Am I missing something?
> >> >>>
> >> >>> Thanks all.
> >> >>>
> >> >>> --
> >> >>> There are ways and there are ways,
> >> >>>
> >> >>> Geoffry Roberts
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > There are ways and there are ways,
> >> >
> >> > Geoffry Roberts
> >
> >
> >
> >
> > --
> > There are ways and there are ways,
> >
> > Geoffry Roberts
>
>
>
>
> --
> There are ways and there are ways,
>
> Geoffry Roberts
>
>
>
>
>
> --
> There are ways and there are ways,
>
> Geoffry Roberts
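An aside on reading listings like the one above: the per-state connection counts for the tserver port can be tallied from a saved capture. A minimal sketch; the file name netstat.txt is illustrative, and 9997 is the default tserv.port.client mentioned in the thread:

```shell
# Tally connection states for the tserver port from a saved capture,
# e.g. netstat -na > netstat.txt on the server.
# For tcp records, column 4 is the local address, column 5 the foreign
# address, and column 6 the state.
awk '$1 == "tcp" && ($4 ~ /:9997$/ || $5 ~ /:9997$/) {states[$6]++}
     END {for (s in states) print states[s], s}' netstat.txt | sort -rn
```

A pile of TIME_WAIT entries like the ones above is normal churn; what matters is whether a LISTEN record exists and whether client connections reach ESTABLISHED.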
Re: Remotely Accumulo
Posted by Geoffry Roberts <th...@gmail.com>.
I found the message in tserver*.out. tserver*.err has 0 in it.
I posted last night and life was good; I sat down this morning and saw that
another tserver had crashed overnight, with no activity. In tserver*.out it
again says out of heap space.
ACCUMULO_TSERVER_OPTS="-Xmx2G -Xms1G". I would have thought that sufficient.
The fact that the log entries lack timestamps but have hash marks makes me
wonder if I am reading things correctly.
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
# Executing /bin/sh -c "kill -9 3241"...
Is there a way to start a particular tablet server?
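For reference, a heap setting like the one above lives in conf/accumulo-env.sh. A sketch with this thread's values; the exact variable layout varies between installs, and the kill-on-OOM flag shown is an assumption based on the stock example env files (it is what produces the `kill -9` line in tserver*.out):

```shell
# conf/accumulo-env.sh (sketch -- layout varies by install)
export ACCUMULO_TSERVER_OPTS="-Xmx2g -Xms1g"
# The stock env files also add a kill-on-OOM flag to the general JVM
# options, which is what logs: Executing /bin/sh -c "kill -9 <pid>"...
export ACCUMULO_GENERAL_OPTS="-XX:OnOutOfMemoryError=\"kill -9 %p\""
```

As for starting a particular tablet server: on 1.5-era installs a single tserver can usually be started with bin/start-server.sh <host> tserver, the same script start-all.sh uses internally; verify the script name against your install.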
On Wed, Oct 8, 2014 at 6:55 PM, Eric Newton <er...@gmail.com> wrote:
> Did you find the message in the tserver*.out, tserver*.err or the monitor
> page?
>
> (Thanks for the follow-up message.)
>
> On Wed, Oct 8, 2014 at 6:39 PM, Geoffry Roberts <th...@gmail.com>
> wrote:
>
>> Just for the record, I finally got to the bottom of things. One of my
>> Tservers was running out of memory. I hadn't noticed. I had my SA
>> allocate a little more--each node now has 6G up from 2G--and things are
>> working better.
>> On Oct 8, 2014 10:09 AM, "Josh Elser" <jo...@gmail.com> wrote:
>>
>>> Jstack is a tool which can be used to tell a java process to dump the
>>> current stack traces for all of its threads. It's usually included with the
>>> JDK. `kill -3 $pid` also does the same. If the output isn't echoed
>>> automatically to your shell, check the stdout for the process you gave as
>>> an argument.
>>>
>>> When your client is sitting waiting on data from the tabletserver, you
>>> can get the stack traces from the tserver and you should be able to find a
>>> thread with scan in the name, along with your client's IP, and we can help
>>> debug exactly what the server is doing that is preventing it from returning
>>> data to your client.
>>> On Oct 8, 2014 9:43 AM, "Geoffry Roberts" <th...@gmail.com>
>>> wrote:
>>>
>>>> Thanks Josh. But what do you mean by "jstack'ing"? I'm unfamiliar
>>>> with that term. A better question would be how can one troubleshoot such a
>>>> thing?
>>>>
>>>> btw
>>>> I am the sole user on this cluster.
>>>>
>>>> On Tue, Oct 7, 2014 at 4:18 PM, Josh Elser <jo...@gmail.com>
>>>> wrote:
>>>>
>>>>> Ok, this record:
>>>>>
>>>>> tcp 0 0 0.0.0.0:9997 0.0.0.0:*
>>>>> LISTEN
>>>>>
>>>>> Means that your tserver is listening on the correct port on all interfaces.
>>>>> There shouldn't be issues connecting to the tserver. This is also
>>>>> confirmed by the fact that you authenticated and got a Connector (this
>>>>> does an RPC to the tserver).
>>>>>
>>>>> So, your tserver is up, and your client can communicate with it. The
>>>>> real question is why is the scan hanging. Perhaps jstack'ing the
>>>>> tserver when your client is blocked waiting for results.
>>>>>
>>>>> On Tue, Oct 7, 2014 at 2:07 PM, Geoffry Roberts <
>>>>> threadedblue@gmail.com> wrote:
>>>>> > "...it's when
>>>>> > you make a Connector, and your client will talk to a tabletserver to
>>>>> > authenticate, that your program should hang. It would be good to
>>>>> > verify that."
>>>>> >
>>>>> >
>>>>> > My program should hang? Would you expand? That is exactly what it
>>>>> is
>>>>> > doing. I am able to get a connector. But when I try to iterate the
>>>>> result
>>>>> > of a scan, that's when it hangs.
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > Here's what comes from netstat:
>>>>> >
>>>>> >
>>>>> > $ netstat -na | grep 9997
>>>>> >
>>>>> > tcp 0 0 0.0.0.0:9997 0.0.0.0:*
>>>>> > LISTEN
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:35679 204.9.140.36:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:53146 204.9.140.37:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:33896 204.9.140.38:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:53282 204.9.140.37:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:53188 204.9.140.37:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:35609 204.9.140.36:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:33901 204.9.140.38:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:35588 204.9.140.36:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:33877 204.9.140.38:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:33946 204.9.140.38:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:53167 204.9.140.37:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:33949 204.9.140.38:9997
>>>>> > ESTABLISHED
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:35546 204.9.140.36:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:33852 204.9.140.38:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:53125 204.9.140.37:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:33922 204.9.140.38:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:33747 204.9.140.38:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:33961 204.9.140.38:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:33793 204.9.140.38:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:35768 204.9.140.36:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:33917 204.9.140.38:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:33814 204.9.140.38:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:35567 204.9.140.36:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:33444 204.9.140.38:9997
>>>>> > FIN_WAIT2
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:35701 204.9.140.36:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:33969 204.9.140.38:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:53258 204.9.140.37:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:33831 204.9.140.38:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:53210 204.9.140.37:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:53104 204.9.140.37:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:33789 204.9.140.38:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:33856 204.9.140.38:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:53237 204.9.140.37:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:33835 204.9.140.38:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:35651 204.9.140.36:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:33938 204.9.140.38:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:33041 204.9.140.36:9997
>>>>> > ESTABLISHED
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:53285 204.9.140.37:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:53305 204.9.140.37:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:33768 204.9.140.38:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:35630 204.9.140.36:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:33754 204.9.140.38:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:35745 204.9.140.36:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:35724 204.9.140.36:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:9997 204.9.140.36:33041
>>>>> > ESTABLISHED
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:53083 204.9.140.37:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:50623 204.9.140.37:9997
>>>>> > ESTABLISHED
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:33772 204.9.140.38:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:33732 204.9.140.38:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:33874 204.9.140.38:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> > tcp 0 0 204.9.140.36:33810 204.9.140.38:9997
>>>>> > TIME_WAIT
>>>>> >
>>>>> >
>>>>> > On Tue, Oct 7, 2014 at 11:34 AM, Josh Elser <jo...@gmail.com>
>>>>> wrote:
>>>>> >>
>>>>> >> Can you provide the output from netstat, lsof or /proc/$pid/fd for
>>>>> the
>>>>> >> tserver? Assuming you haven't altered tserv.port.client in
>>>>> >> accumulo-site.xml, we want the line for port 9997.
>>>>> >>
>>>>> >> From my laptop running a tserver on localhost:
>>>>> >>
>>>>> >> $ netstat -na | grep 9997
>>>>> >> tcp4 0 0 127.0.0.1.9997 *.*
>>>>> LISTEN
>>>>> >>
>>>>> >> Depending on the tool you use, you can grep out the pid of the
>>>>> tserver
>>>>> >> or just that port itself.
>>>>> >>
>>>>> >> Just so you know, ZK binds to all available interfaces when it
>>>>> starts,
>>>>> >> so it should work seamlessly with localhost or the FQDN for the
>>>>> host.
>>>>> >> As such, it shouldn't matter what you provide to the
>>>>> >> ZooKeeperInstance. That should connect in all cases for you, it's
>>>>> when
>>>>> >> you make a Connector, and your client will talk to a tabletserver to
>>>>> >> authenticate, that your program should hang. It would be good to
>>>>> >> verify that.
>>>>> >>
>>>>> >> On Tue, Oct 7, 2014 at 11:23 AM, Geoffry Roberts <
>>>>> threadedblue@gmail.com>
>>>>> >> wrote:
>>>>> >> > All,
>>>>> >> >
>>>>> >> > Thanks for the responses.
>>>>> >> >
>>>>> >> > Is this a problem for Accumulo?
>>>>> >> > Reverse DNS is yielding my ISP's host name. You know the drill,
>>>>> my IP in
>>>>> >> > reverse followed by their domain name, as opposed to my FQDN,
>>>>> which is what
>>>>> >> > I
>>>>> >> > use in my config files.
>>>>> >> >
>>>>> >> > Running Accumulo 1.5.1
>>>>> >> > I have only one interface.
>>>>> >> > I have the FQDN in both master and slaves files for both Hadoop
>>>>> and
>>>>> >> > Accumulo; in zoo.cfg; and in accumulo-site.xml where the
>>>>> Zookeepers are
>>>>> >> > referenced.
>>>>> >> > Also, I am passing in all Zk FQDN when I instantiate
>>>>> ZookeeperInstance.
>>>>> >> > Forward DNS works
>>>>> >> > Reverse DNS... well (See above).
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> > On Mon, Oct 6, 2014 at 10:26 PM, Adam Fuchs <af...@apache.org>
>>>>> wrote:
>>>>> >> >>
>>>>> >> >> Accumulo tservers typically listen on a single interface. If you
>>>>> have a
>>>>> >> >> server with multiple interfaces (e.g. loopback and eth0), you
>>>>> might
>>>>> >> >> have a
>>>>> >> >> problem in which the tablet servers are not listening on
>>>>> externally
>>>>> >> >> reachable interfaces. Tablet servers will list the interfaces
>>>>> that they
>>>>> >> >> are
>>>>> >> >> listening to when they boot, and you can also use tools like
>>>>> lsof to
>>>>> >> >> find
>>>>> >> >> them.
>>>>> >> >>
>>>>> >> >> If that is indeed the problem, then you might just need to
>>>>> change your
>>>>> >> >> conf/slaves file to use <hostname> instead of localhost, and then
>>>>> >> >> restart.
>>>>> >> >>
>>>>> >> >> Adam
>>>>> >> >>
>>>>> >> >> On Oct 6, 2014 4:27 PM, "Geoffry Roberts" <
>>>>> threadedblue@gmail.com>
>>>>> >> >> wrote:
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>> I have been happily working with Acc, but today things
>>>>> changed. No
>>>>> >> >>> errors
>>>>> >> >>>
>>>>> >> >>> Until now I ran everything server side, which meant the URL was
>>>>> >> >>> localhost:2181, and life was good. Today tried running some of
>>>>> the
>>>>> >> >>> same
>>>>> >> >>> code as a remote client, which means <host name>:2181. Things
>>>>> hang
>>>>> >> >>> when
>>>>> >> >>> BatchWriter tries to commit anything and Scan hangs when it
>>>>> tries to
>>>>> >> >>> iterate
>>>>> >> >>> through a Map.
>>>>> >> >>>
>>>>> >> >>> Let's focus on the scan part:
>>>>> >> >>>
>>>>> >> >>> scan.fetchColumnFamily(new Text("colfY")); // This executes then
>>>>> >> >>> hangs.
>>>>> >> >>> for(Entry<Key,Value> entry : scan) {
>>>>> >> >>> def row = entry.getKey().getRow();
>>>>> >> >>> def value = entry.getValue();
>>>>> >> >>> println "value=" + value;
>>>>> >> >>> }
>>>>> >> >>>
>>>>> >> >>> This is what appears in the console :
>>>>> >> >>>
>>>>> >> >>> 17:22:39.802 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got
>>>>> ping
>>>>> >> >>> response for sessionid: 0x148c6f03388005e after 21ms
>>>>> >> >>>
>>>>> >> >>> 17:22:49.803 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got
>>>>> ping
>>>>> >> >>> response for sessionid: 0x148c6f03388005e after 21ms
>>>>> >> >>>
>>>>> >> >>> <and on and on>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>> The only difference between success and a hang is a URL change,
>>>>> and of
>>>>> >> >>> course being remote.
>>>>> >> >>>
>>>>> >> >>> I don't believe this is a firewall issue. I shutdown the
>>>>> firewall.
>>>>> >> >>>
>>>>> >> >>> Am I missing something?
>>>>> >> >>>
>>>>> >> >>> Thanks all.
>>>>> >> >>>
>>>>> >> >>> --
>>>>> >> >>> There are ways and there are ways,
>>>>> >> >>>
>>>>> >> >>> Geoffry Roberts
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> > --
>>>>> >> > There are ways and there are ways,
>>>>> >> >
>>>>> >> > Geoffry Roberts
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> > There are ways and there are ways,
>>>>> >
>>>>> > Geoffry Roberts
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> There are ways and there are ways,
>>>>
>>>> Geoffry Roberts
>>>>
>>>
>
--
There are ways and there are ways,
Geoffry Roberts
Re: Remotely Accumulo
Posted by Eric Newton <er...@gmail.com>.
Did you find the message in the tserver*.out, tserver*.err or the monitor
page?
(Thanks for the follow-up message.)
On Wed, Oct 8, 2014 at 6:39 PM, Geoffry Roberts <th...@gmail.com>
wrote:
> Just for the record, I finally got to the bottom of things. One of my
> Tservers was running out of memory. I hadn't noticed. I had my SA
> allocate a little more--each node now has 6G up from 2G--and things are
> working better.
> On Oct 8, 2014 10:09 AM, "Josh Elser" <jo...@gmail.com> wrote:
>
>> Jstack is a tool which can be used to tell a java process to dump the
>> current stack traces for all of its threads. It's usually included with the
>> JDK. `kill -3 $pid` also does the same. If the output isn't echoed
>> automatically to your shell, check the stdout for the process you gave as
>> an argument.
>>
>> When your client is sitting waiting on data from the tabletserver, you
>> can get the stack traces from the tserver and you should be able to find a
>> thread with scan in the name, along with your client's IP, and we can help
>> debug exactly what the server is doing that is preventing it from returning
>> data to your client.
>> On Oct 8, 2014 9:43 AM, "Geoffry Roberts" <th...@gmail.com> wrote:
>>
>>> Thanks Josh. But what do you mean by "jstack'ing"? I'm unfamiliar
>>> with that term. A better question would be how can one troubleshoot such a
>>> thing?
>>>
>>> btw
>>> I am the sole user on this cluster.
>>>
>>> On Tue, Oct 7, 2014 at 4:18 PM, Josh Elser <jo...@gmail.com> wrote:
>>>
>>>> Ok, this record:
>>>>
>>>> tcp 0 0 0.0.0.0:9997 0.0.0.0:*
>>>> LISTEN
>>>>
>>>> Means that your tserver is listening on the correct port on all interfaces.
>>>> There shouldn't be issues connecting to the tserver. This is also
>>>> confirmed by the fact that you authenticated and got a Connector (this
>>>> does an RPC to the tserver).
>>>>
>>>> So, your tserver is up, and your client can communicate with it. The
>>>> real question is why is the scan hanging. Perhaps jstack'ing the
>>>> tserver when your client is blocked waiting for results.
>>>>
>>>> On Tue, Oct 7, 2014 at 2:07 PM, Geoffry Roberts <th...@gmail.com>
>>>> wrote:
>>>> > "...it's when
>>>> > you make a Connector, and your client will talk to a tabletserver to
>>>> > authenticate, that your program should hang. It would be good to
>>>> > verify that."
>>>> >
>>>> >
>>>> > My program should hang? Would you expand? That is exactly what it is
>>>> > doing. I am able to get a connector. But when I try to iterate the
>>>> result
>>>> > of a scan, that's when it hangs.
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > Here's what comes from netstat:
>>>> >
>>>> >
>>>> > $ netstat -na | grep 9997
>>>> >
>>>> > tcp 0 0 0.0.0.0:9997 0.0.0.0:*
>>>> > LISTEN
>>>> >
>>>> >
>>>> > On Tue, Oct 7, 2014 at 11:34 AM, Josh Elser <jo...@gmail.com>
>>>> wrote:
>>>> >>
>>>> >> Can you provide the output from netstat, lsof or /proc/$pid/fd for
>>>> the
>>>> >> tserver? Assuming you haven't altered tserv.port.client in
>>>> >> accumulo-site.xml, we want the line for port 9997.
>>>> >>
>>>> >> From my laptop running a tserver on localhost:
>>>> >>
>>>> >> $ netstat -na | grep 9997
>>>> >> tcp4 0 0 127.0.0.1.9997 *.*
>>>> LISTEN
>>>> >>
>>>> >> Depending on the tool you use, you can grep out the pid of the
>>>> tserver
>>>> >> or just that port itself.
>>>> >>
>>>> >> Just so you know, ZK binds to all available interfaces when it
>>>> starts,
>>>> >> so it should work seamlessly with localhost or the FQDN for the host.
>>>> >> As such, it shouldn't matter what you provide to the
>>>> >> ZooKeeperInstance. That should connect in all cases for you, it's
>>>> when
>>>> >> you make a Connector, and your client will talk to a tabletserver to
>>>> >> authenticate, that your program should hang. It would be good to
>>>> >> verify that.
>>>> >>
>>>> >> On Tue, Oct 7, 2014 at 11:23 AM, Geoffry Roberts <
>>>> threadedblue@gmail.com>
>>>> >> wrote:
>>>> >> > All,
>>>> >> >
>>>> >> > Thanks for the responses.
>>>> >> >
>>>> >> > Is this a problem for Accumulo?
>>>> >> > Reverse DNS is yielding my ISP's host name. You know the drill, my
>>>> IP in
>>>> >> > reverse followed by their domain name, as opposed to my FQDN,
>>>> which is what
>>>> >> > I
>>>> >> > use in my config files.
>>>> >> >
>>>> >> > Running Accumulo 1.5.1
>>>> >> > I have only one interface.
>>>> >> > I have the FQDN in both master and slaves files for both Hadoop and
>>>> >> > Accumulo; in zoo.cfg; and in accumulo-site.xml where the
>>>> Zookeepers are
>>>> >> > referenced.
>>>> >> > Also, I am passing in all Zk FQDN when I instantiate
>>>> ZookeeperInstance.
>>>> >> > Forward DNS works
>>>> >> > Reverse DNS... well (See above).
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> > On Mon, Oct 6, 2014 at 10:26 PM, Adam Fuchs <af...@apache.org>
>>>> wrote:
>>>> >> >>
>>>> >> >> Accumulo tservers typically listen on a single interface. If you
>>>> have a
>>>> >> >> server with multiple interfaces (e.g. loopback and eth0), you
>>>> might
>>>> >> >> have a
>>>> >> >> problem in which the tablet servers are not listening on
>>>> externally
>>>> >> >> reachable interfaces. Tablet servers will list the interfaces
>>>> that they
>>>> >> >> are
>>>> >> >> listening to when they boot, and you can also use tools like lsof
>>>> to
>>>> >> >> find
>>>> >> >> them.
>>>> >> >>
>>>> >> >> If that is indeed the problem, then you might just need to change
>>>> your
>>>> >> >> conf/slaves file to use <hostname> instead of localhost, and then
>>>> >> >> restart.
>>>> >> >>
>>>> >> >> Adam
>>>> >> >>
>>>> >> >> On Oct 6, 2014 4:27 PM, "Geoffry Roberts" <threadedblue@gmail.com
>>>> >
>>>> >> >> wrote:
>>>> >> >>>
>>>> >> >>>
>>>> >> >>> I have been happily working with Acc, but today things changed.
>>>> No
>>>> >> >>> errors
>>>> >> >>>
>>>> >> >>> Until now I ran everything server side, which meant the URL was
>>>> >> >>> localhost:2181, and life was good. Today tried running some of
>>>> the
>>>> >> >>> same
>>>> >> >>> code as a remote client, which means <host name>:2181. Things
>>>> hang
>>>> >> >>> when
>>>> >> >>> BatchWriter tries to commit anything and Scan hangs when it
>>>> tries to
>>>> >> >>> iterate
>>>> >> >>> through a Map.
>>>> >> >>>
>>>> >> >>> Let's focus on the scan part:
>>>> >> >>>
>>>> >> >>> scan.fetchColumnFamily(new Text("colfY")); // This executes then
>>>> >> >>> hangs.
>>>> >> >>> for(Entry<Key,Value> entry : scan) {
>>>> >> >>> def row = entry.getKey().getRow();
>>>> >> >>> def value = entry.getValue();
>>>> >> >>> println "value=" + value;
>>>> >> >>> }
>>>> >> >>>
>>>> >> >>> This is what appears in the console :
>>>> >> >>>
>>>> >> >>> 17:22:39.802 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got
>>>> ping
>>>> >> >>> response for sessionid: 0x148c6f03388005e after 21ms
>>>> >> >>>
>>>> >> >>> 17:22:49.803 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got
>>>> ping
>>>> >> >>> response for sessionid: 0x148c6f03388005e after 21ms
>>>> >> >>>
>>>> >> >>> <and on and on>
>>>> >> >>>
>>>> >> >>>
>>>> >> >>>
>>>> >> >>> The only difference between success and a hang is a URL change,
>>>> and of
>>>> >> >>> course being remote.
>>>> >> >>>
>>>> >> >>> I don't believe this is a firewall issue. I shutdown the
>>>> firewall.
>>>> >> >>>
>>>> >> >>> Am I missing something?
>>>> >> >>>
>>>> >> >>> Thanks all.
>>>> >> >>>
>>>> >> >>> --
>>>> >> >>> There are ways and there are ways,
>>>> >> >>>
>>>> >> >>> Geoffry Roberts
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> > --
>>>> >> > There are ways and there are ways,
>>>> >> >
>>>> >> > Geoffry Roberts
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > There are ways and there are ways,
>>>> >
>>>> > Geoffry Roberts
>>>>
>>>
>>>
>>>
>>> --
>>> There are ways and there are ways,
>>>
>>> Geoffry Roberts
>>>
>>
Re: Remotely Accumulo
Posted by Geoffry Roberts <th...@gmail.com>.
Just for the record, I finally got to the bottom of things. One of my
Tservers was running out of memory. I hadn't noticed. I had my SA
allocate a little more--each node now has 6G up from 2G--and things are
working better.
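The jstack approach discussed in this thread can be turned into a quick filter once a dump has been captured (jstack <pid> > dump.txt, or `kill -3 <pid>` with the dump landing in the tserver's stdout). A sketch; the thread names and the file name dump.txt are illustrative:

```shell
# Print only the thread blocks whose header line mentions "scan".
# jstack thread headers start with a double quote in column one, so a
# header line toggles whether the following lines are kept.
awk '/^"/ { keep = ($0 ~ /[Ss]can/) } keep' dump.txt
```

In a real tserver dump the scan threads also carry the client's IP, which is how you tie a stuck server-side scan back to a particular client.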
On Oct 8, 2014 10:09 AM, "Josh Elser" <jo...@gmail.com> wrote:
> Jstack is a tool which can be used to tell a java process to dump the
> current stack traces for all of its threads. It's usually included with the
> JDK. `kill -3 $pid` also does the same. If the output isn't echoed
> automatically to your shell, check the stdout for the process you gave as
> an argument.
>
> When your client is sitting waiting on data from the tabletserver, you can
> get the stack traces from the tserver and you should be able to find a
> thread with scan in the name, along with your client's IP, and we can help
> debug exactly what the server is doing that is preventing it from returning
> data to your client.
> On Oct 8, 2014 9:43 AM, "Geoffry Roberts" <th...@gmail.com> wrote:
>
>> Thanks Josh. But what do you mean by "jstack'ing"? I'm unfamiliar with
>> that term. A better question would be how can one troubleshoot such a
>> thing?
>>
>> btw
>> I am the sole user on this cluster.
>>
>> On Tue, Oct 7, 2014 at 4:18 PM, Josh Elser <jo...@gmail.com> wrote:
>>
>>> Ok, this record:
>>>
>>> tcp 0 0 0.0.0.0:9997 0.0.0.0:*
>>> LISTEN
>>>
>>> Means that your tserver is listening on the correct port on all interfaces.
>>> There shouldn't be issues connecting to the tserver. This is also
>>> confirmed by the fact that you authenticated and got a Connector (this
>>> does an RPC to the tserver).
>>>
>>> So, your tserver is up, and your client can communicate with it. The
>>> real question is why is the scan hanging. Perhaps jstack'ing the
>>> tserver when your client is blocked waiting for results.
>>>
>>> On Tue, Oct 7, 2014 at 2:07 PM, Geoffry Roberts <th...@gmail.com>
>>> wrote:
>>> > "...it's when
>>> > you make a Connector, and your client will talk to a tabletserver to
>>> > authenticate, that your program should hang. It would be good to
>>> > verify that."
>>> >
>>> >
>>> > My program should hang? Would you expand? That is exactly what it is
>>> > doing. I am able to get a connector. But when I try to iterate the
>>> result
>>> > of a scan, that's when it hangs.
>>> >
>>> >
>>> >
>>> >
>>> > Here's what comes from netstat:
>>> >
>>> >
>>> > $ netstat -na | grep 9997
>>> >
>>> > tcp 0 0 0.0.0.0:9997 0.0.0.0:*
>>> > LISTEN
>>> >
>>> > TIME_WAIT
>>> >
>>> >
>>> > On Tue, Oct 7, 2014 at 11:34 AM, Josh Elser <jo...@gmail.com>
>>> wrote:
>>> >>
>>> >> Can you provide the output from netstat, lsof or /proc/$pid/fd for the
>>> >> tserver? Assuming you haven't altered tserver.port.client in
>>> >> accumulo-site.xml, we want the line for port 9997.
>>> >>
>>> >> From my laptop running a tserver on localhost:
>>> >>
>>> >> $ netstat -na | grep 9997
>>> >> tcp4 0 0 127.0.0.1.9997 *.*
>>> LISTEN
>>> >>
>>> >> Depending on the tool you use, you can grep out the pid of the tserver
>>> >> or just that port itself.
>>> >>
>>> >> Just so you know, ZK binds to all available interfaces when it starts,
>>> >> so it should work seamlessly with localhost or the FQDN for the host.
>>> >> As such, it shouldn't matter what you provide to the
>>> >> ZooKeeperInstance. That should connect in all cases for you, it's when
>>> >> you make a Connector, and your client will talk to a tabletserver to
>>> >> authenticate, that your program should hang. It would be good to
>>> >> verify that.
>>> >>
>>> >> On Tue, Oct 7, 2014 at 11:23 AM, Geoffry Roberts <
>>> threadedblue@gmail.com>
>>> >> wrote:
>>> >> > All,
>>> >> >
>>> >> > Thanks for the responses.
>>> >> >
>>> >> > Is this a problem for Accumulo?
>>> >> > Reverse DNS is yielding my ISP's host name. You know the drill, my
>>> IP in
>>> >> > reverse followed by their domain name, as opposed to my FQDN, which
>>> >> > is what I use in my config files.
>>> >> >
>>> >> > Running Accumulo 1.5.1
>>> >> > I have only one interface.
>>> >> > I have the FQDN in both master and slaves files for both Hadoop and
>>> >> > Accumulo; in zoo.cfg; and in accumulo-site.xml where the Zookeepers
>>> are
>>> >> > referenced.
>>> >> > Also, I am passing in all Zk FQDN when I instantiate
>>> ZookeeperInstance.
>>> >> > Forward DNS works
>>> >> > Reverse DNS... well (See above).
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Mon, Oct 6, 2014 at 10:26 PM, Adam Fuchs <af...@apache.org>
>>> wrote:
>>> >> >>
>>> >> >> Accumulo tservers typically listen on a single interface. If you
>>> have a
>>> >> >> server with multiple interfaces (e.g. loopback and eth0), you might
>>> >> >> have a
>>> >> >> problem in which the tablet servers are not listening on externally
>>> >> >> reachable interfaces. Tablet servers will list the interfaces that
>>> they
>>> >> >> are
>>> >> >> listening to when they boot, and you can also use tools like lsof
>>> to
>>> >> >> find
>>> >> >> them.
>>> >> >>
>>> >> >> If that is indeed the problem, then you might just need to change
>>> >> >> your conf/slaves file to use <hostname> instead of localhost, and then
>>> >> >> restart.
>>> >> >>
>>> >> >> Adam
>>> >> >>
>>> >> >> On Oct 6, 2014 4:27 PM, "Geoffry Roberts" <th...@gmail.com>
>>> >> >> wrote:
>>> >> >>>
>>> >> >>>
>>> >> >>> I have been happily working with Acc, but today things changed.
>>> No
>>> >> >>> errors
>>> >> >>>
>>> >> >>> Until now I ran everything server side, which meant the URL was
>>> >> >>> localhost:2181, and life was good. Today I tried running some of
>>> >> >>> the same
>>> >> >>> code as a remote client, which means <host name>:2181. Things
>>> hang
>>> >> >>> when
>>> >> >>> BatchWriter tries to commit anything and Scan hangs when it tries
>>> to
>>> >> >>> iterate
>>> >> >>> through a Map.
>>> >> >>>
>>> >> >>> Let's focus on the scan part:
>>> >> >>>
>>> >> >>> scan.fetchColumnFamily(new Text("colfY")); // This executes then
>>> >> >>> hangs.
>>> >> >>> for(Entry<Key,Value> entry : scan) {
>>> >> >>> def row = entry.getKey().getRow();
>>> >> >>> def value = entry.getValue();
>>> >> >>> println "value=" + value;
>>> >> >>> }
>>> >> >>>
>>> >> >>> This is what appears in the console :
>>> >> >>>
>>> >> >>> 17:22:39.802 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got
>>> ping
>>> >> >>> response for sessionid: 0x148c6f03388005e after 21ms
>>> >> >>>
>>> >> >>> 17:22:49.803 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got
>>> ping
>>> >> >>> response for sessionid: 0x148c6f03388005e after 21ms
>>> >> >>>
>>> >> >>> <and on and on>
>>> >> >>>
>>> >> >>>
>>> >> >>>
>>> >> >>> The only difference between success and a hang is a URL change,
>>> and of
>>> >> >>> course being remote.
>>> >> >>>
>>> >> >>> I don't believe this is a firewall issue. I shutdown the
>>> firewall.
>>> >> >>>
>>> >> >>> Am I missing something?
>>> >> >>>
>>> >> >>> Thanks all.
>>> >> >>>
>>> >> >>> --
>>> >> >>> There are ways and there are ways,
>>> >> >>>
>>> >> >>> Geoffry Roberts
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> > --
>>> >> > There are ways and there are ways,
>>> >> >
>>> >> > Geoffry Roberts
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > There are ways and there are ways,
>>> >
>>> > Geoffry Roberts
>>>
>>
>>
>>
>> --
>> There are ways and there are ways,
>>
>> Geoffry Roberts
>>
>
Re: Remotely Accumulo
Posted by Josh Elser <jo...@gmail.com>.
jstack is a tool that tells a Java process to dump the current stack traces
for all of its threads; it's usually included with the JDK. `kill -3 $pid`
does the same, but it writes the dump to that process's stdout rather than to
your shell, so if the output doesn't come back to your terminal, check the
stdout (or log) of the process whose pid you gave as an argument.
When your client is sitting waiting on data from the tabletserver, you can
get the stack traces from the tserver and you should be able to find a
thread with scan in the name, along with your client's IP, and we can help
debug exactly what the server is doing that is preventing it from returning
data to your client.
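As a sketch of what that filtering might look like once you have a dump in hand (the thread names, IP, and dump fragment below are made up for illustration; they are not real tserver output):

```python
def scan_threads(dump_text, client_ip):
    """Return thread-header lines from a jstack dump that look like
    scan threads tied to the given client IP."""
    return [
        line for line in dump_text.splitlines()
        if line.startswith('"') and "scan" in line and client_ip in line
    ]

# Tiny fabricated dump fragment; a real jstack dump is far longer and
# includes a full stack trace under each thread header.
dump = '''"scan-204.9.140.99" #42 daemon prio=5 RUNNABLE
"ClientPool-7" #43 daemon prio=5 WAITING'''

print(scan_threads(dump, "204.9.140.99"))
# -> ['"scan-204.9.140.99" #42 daemon prio=5 RUNNABLE']
```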
On Oct 8, 2014 9:43 AM, "Geoffry Roberts" <th...@gmail.com> wrote:
> Thanks Josh. But what do you mean by "jstack'ing"? I'm unfamiliar with
> that term. A better question would be how can one troubleshoot such a
> thing?
>
> btw
> I am the sole user on this cluster.
>
Re: Remotely Accumulo
Posted by Geoffry Roberts <th...@gmail.com>.
Thanks Josh. But what do you mean by "jstack'ing"? I'm unfamiliar with
that term. A better question would be how can one troubleshoot such a
thing?
btw
I am the sole user on this cluster.
--
There are ways and there are ways,
Geoffry Roberts
Re: Remotely Accumulo
Posted by Josh Elser <jo...@gmail.com>.
Ok, this record:
tcp 0 0 0.0.0.0:9997 0.0.0.0:*
LISTEN
That means your tserver is listening on the correct port on all interfaces.
There shouldn't be issues connecting to the tserver. This is also
confirmed by the fact that you authenticated and got a Connector (this
does an RPC to the tserver).
So, your tserver is up, and your client can communicate with it. The
real question is why the scan is hanging. Try jstack'ing the tserver
while your client is blocked waiting for results.
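One more low-level check, independent of Accumulo: a plain TCP connect from the client machine confirms the tserver port is reachable at all. A minimal sketch — the helper is ours, not an Accumulo API, and the demo uses a throwaway local listener so it runs anywhere; against a real cluster you would pass the tserver's host and port 9997:

```python
import socket

def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Self-contained demo: open a throwaway listener on an ephemeral port.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

print(can_connect("127.0.0.1", port))   # True: the port is open
srv.close()
```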
On Tue, Oct 7, 2014 at 2:07 PM, Geoffry Roberts <th...@gmail.com> wrote:
> "...it's when
> you make a Connector, and your client will talk to a tabletserver to
> authenticate, that your program should hang. It would be good to
> verify that."
>
>
> My program should hang? Would you expand? That is exactly what it is
> doing. I am able to get a connector. But when I try to iterate the result
> of a scan, that's when it hangs.
>
>
>
>
> Here's what comes from netstat:
>
>
> $ netstat -na | grep 9997
>
> tcp 0 0 0.0.0.0:9997 0.0.0.0:*
> LISTEN
>
>
>
> On Tue, Oct 7, 2014 at 11:34 AM, Josh Elser <jo...@gmail.com> wrote:
>>
>> Can you provide the output from netstat, lsof or /proc/$pid/fd for the
>> tserver? Assuming you haven't altered tserv.port.client in
>> accumulo-site.xml, we want the line for port 9997.
>>
>> From my laptop running a tserver on localhost:
>>
>> $ netstat -na | grep 9997
>> tcp4 0 0 127.0.0.1.9997 *.* LISTEN
>>
>> Depending on the tool you use, you can grep out the pid of the tserver
>> or just that port itself.
>>
>> Just so you know, ZK binds to all available interfaces when it starts,
>> so it should work seamlessly with localhost or the FQDN for the host.
>> As such, it shouldn't matter what you provide to the
>> ZooKeeperInstance. That should connect in all cases for you, it's when
>> you make a Connector, and your client will talk to a tabletserver to
>> authenticate, that your program should hang. It would be good to
>> verify that.
>>
>> On Tue, Oct 7, 2014 at 11:23 AM, Geoffry Roberts <th...@gmail.com>
>> wrote:
>> > All,
>> >
>> > Thanks for the responses.
>> >
>> > Is this a problem for Accumulo?
>> > Reverse DNS is yielding my ISP's host name. You know the drill, my IP in
>> > reverse followed by their domain name, as opposed to my FQDN, which what
>> > I
>> > use in my config files.
>> >
>> > Running Accumulo 1.5.1
>> > I have only one interface.
>> > I have the FQDN in both master and slaves files for both Hadoop and
>> > Accumulo; in zoo.cfg; and in accumulo-site.xml where the Zookeepers are
>> > referenced.
>> > Also, I am passing in all Zk FQDN when I instantiate ZookeeperInstance.
>> > Forward DNS works
>> > Reverse DNS... well (See above).
>> >
>> >
>> >
>> > On Mon, Oct 6, 2014 at 10:26 PM, Adam Fuchs <af...@apache.org> wrote:
>> >>
>> >> Accumulo tservers typically listen on a single interface. If you have a
>> >> server with multiple interfaces (e.g. loopback and eth0), you might
>> >> have a
>> >> problem in which the tablet servers are not listening on externally
>> >> reachable interfaces. Tablet servers will list the interfaces that they
>> >> are
>> >> listening to when they boot, and you can also use tools like lsof to
>> >> find
>> >> them.
>> >>
>> >> If that is indeed the problem, then you might just need to change you
>> >> conf/slaves file to use <hostname> instead of localhost, and then
>> >> restart.
>> >>
>> >> Adam
>> >>
>> >> On Oct 6, 2014 4:27 PM, "Geoffry Roberts" <th...@gmail.com>
>> >> wrote:
>> >>>
>> >>>
>> >>> I have been happily working with Acc, but today things changed. No
>> >>> errors
>> >>>
>> >>> Until now I ran everything server side, which meant the URL was
>> >>> localhost:2181, and life was good. Today tried running some of the
>> >>> same
>> >>> code as a remote client, which means <host name>:2181. Things hang
>> >>> when
>> >>> BatchWriter tries to commit anything and Scan hangs when it tries to
>> >>> iterate
>> >>> through a Map.
>> >>>
>> >>> Let's focus on the scan part:
>> >>>
>> >>> scan.fetchColumnFamily(new Text("colfY")); // This executes then
>> >>> hangs.
>> >>> for(Entry<Key,Value> entry : scan) {
>> >>> def row = entry.getKey().getRow();
>> >>> def value = entry.getValue();
>> >>> println "value=" + value;
>> >>> }
>> >>>
>> >>> This is what appears in the console :
>> >>>
>> >>> 17:22:39.802 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping
>> >>> response for sessionid: 0x148c6f03388005e after 21ms
>> >>>
>> >>> 17:22:49.803 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping
>> >>> response for sessionid: 0x148c6f03388005e after 21ms
>> >>>
>> >>> <and on and on>
>> >>>
>> >>>
>> >>>
>> >>> The only difference between success and a hang is a URL change, and of
>> >>> course being remote.
>> >>>
>> >>> I don't believe this is a firewall issue. I shutdown the firewall.
>> >>>
>> >>> Am I missing something?
>> >>>
>> >>> Thanks all.
>> >>>
>> >>> --
>> >>> There are ways and there are ways,
>> >>>
>> >>> Geoffry Roberts
>> >
>> >
>> >
>> >
>> > --
>> > There are ways and there are ways,
>> >
>> > Geoffry Roberts
>
>
>
>
> --
> There are ways and there are ways,
>
> Geoffry Roberts
Re: Remotely Accumulo
Posted by Geoffry Roberts <th...@gmail.com>.
"...it's when
you make a Connector, and your client will talk to a tabletserver to
authenticate, that your program should hang. It would be good to
verify that."
My program should hang? Would you expand? That is exactly what it is
doing. I am able to get a connector. But when I try to iterate the result
of a scan, that's when it hangs.
Here's what comes from netstat:
$ netstat -na | grep 9997
tcp 0 0 0.0.0.0:9997 0.0.0.0:*
LISTEN
tcp 0 0 204.9.140.36:35679 204.9.140.36:9997
TIME_WAIT
tcp 0 0 204.9.140.36:53146 204.9.140.37:9997
TIME_WAIT
tcp 0 0 204.9.140.36:33896 204.9.140.38:9997
TIME_WAIT
tcp 0 0 204.9.140.36:53282 204.9.140.37:9997
TIME_WAIT
tcp 0 0 204.9.140.36:53188 204.9.140.37:9997
TIME_WAIT
tcp 0 0 204.9.140.36:35609 204.9.140.36:9997
TIME_WAIT
tcp 0 0 204.9.140.36:33901 204.9.140.38:9997
TIME_WAIT
tcp 0 0 204.9.140.36:35588 204.9.140.36:9997
TIME_WAIT
tcp 0 0 204.9.140.36:33877 204.9.140.38:9997
TIME_WAIT
tcp 0 0 204.9.140.36:33946 204.9.140.38:9997
TIME_WAIT
tcp 0 0 204.9.140.36:53167 204.9.140.37:9997
TIME_WAIT
tcp 0 0 204.9.140.36:33949 204.9.140.38:9997
ESTABLISHED
tcp 0 0 204.9.140.36:35546 204.9.140.36:9997
TIME_WAIT
tcp 0 0 204.9.140.36:33852 204.9.140.38:9997
TIME_WAIT
tcp 0 0 204.9.140.36:53125 204.9.140.37:9997
TIME_WAIT
tcp 0 0 204.9.140.36:33922 204.9.140.38:9997
TIME_WAIT
tcp 0 0 204.9.140.36:33747 204.9.140.38:9997
TIME_WAIT
tcp 0 0 204.9.140.36:33961 204.9.140.38:9997
TIME_WAIT
tcp 0 0 204.9.140.36:33793 204.9.140.38:9997
TIME_WAIT
tcp 0 0 204.9.140.36:35768 204.9.140.36:9997
TIME_WAIT
tcp 0 0 204.9.140.36:33917 204.9.140.38:9997
TIME_WAIT
tcp 0 0 204.9.140.36:33814 204.9.140.38:9997
TIME_WAIT
tcp 0 0 204.9.140.36:35567 204.9.140.36:9997
TIME_WAIT
tcp 0 0 204.9.140.36:33444 204.9.140.38:9997
FIN_WAIT2
tcp 0 0 204.9.140.36:35701 204.9.140.36:9997
TIME_WAIT
tcp 0 0 204.9.140.36:33969 204.9.140.38:9997
TIME_WAIT
tcp 0 0 204.9.140.36:53258 204.9.140.37:9997
TIME_WAIT
tcp 0 0 204.9.140.36:33831 204.9.140.38:9997
TIME_WAIT
tcp 0 0 204.9.140.36:53210 204.9.140.37:9997
TIME_WAIT
tcp 0 0 204.9.140.36:53104 204.9.140.37:9997
TIME_WAIT
tcp 0 0 204.9.140.36:33789 204.9.140.38:9997
TIME_WAIT
tcp 0 0 204.9.140.36:33856 204.9.140.38:9997
TIME_WAIT
tcp 0 0 204.9.140.36:53237 204.9.140.37:9997
TIME_WAIT
tcp 0 0 204.9.140.36:33835 204.9.140.38:9997
TIME_WAIT
tcp 0 0 204.9.140.36:35651 204.9.140.36:9997
TIME_WAIT
tcp 0 0 204.9.140.36:33938 204.9.140.38:9997
TIME_WAIT
tcp 0 0 204.9.140.36:33041 204.9.140.36:9997
ESTABLISHED
tcp 0 0 204.9.140.36:53285 204.9.140.37:9997
TIME_WAIT
tcp 0 0 204.9.140.36:53305 204.9.140.37:9997
TIME_WAIT
tcp 0 0 204.9.140.36:33768 204.9.140.38:9997
TIME_WAIT
tcp 0 0 204.9.140.36:35630 204.9.140.36:9997
TIME_WAIT
tcp 0 0 204.9.140.36:33754 204.9.140.38:9997
TIME_WAIT
tcp 0 0 204.9.140.36:35745 204.9.140.36:9997
TIME_WAIT
tcp 0 0 204.9.140.36:35724 204.9.140.36:9997
TIME_WAIT
tcp 0 0 204.9.140.36:9997 204.9.140.36:33041
ESTABLISHED
tcp 0 0 204.9.140.36:53083 204.9.140.37:9997
TIME_WAIT
tcp 0 0 204.9.140.36:50623 204.9.140.37:9997
ESTABLISHED
tcp 0 0 204.9.140.36:33772 204.9.140.38:9997
TIME_WAIT
tcp 0 0 204.9.140.36:33732 204.9.140.38:9997
TIME_WAIT
tcp 0 0 204.9.140.36:33874 204.9.140.38:9997
TIME_WAIT
tcp 0 0 204.9.140.36:33810 204.9.140.38:9997
TIME_WAIT
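[Editorial note: since the LISTEN line above shows the tserver bound to 0.0.0.0:9997, a plain TCP connect from the remote client is a quick way to rule out basic reachability before suspecting the Accumulo layer. A minimal stdlib sketch; the address below reuses the tserver IP from the netstat output as an example, so substitute your own:]

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean reachable(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Example tserver address taken from the netstat output above.
        System.out.println(reachable("204.9.140.36", 9997, 2000));
    }
}
```

If this prints false from the client machine but true on the server itself, the problem is network-level (routing, NAT, firewall), not Accumulo.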
On Tue, Oct 7, 2014 at 11:34 AM, Josh Elser <jo...@gmail.com> wrote:
> Can you provide the output from netstat, lsof or /proc/$pid/fd for the
> tserver? Assuming you haven't altered tserv.port.client in
> accumulo-site.xml, we want the line for port 9997.
>
> From my laptop running a tserver on localhost:
>
> $ netstat -na | grep 9997
> tcp4 0 0 127.0.0.1.9997 *.* LISTEN
>
> Depending on the tool you use, you can grep out the pid of the tserver
> or just that port itself.
>
> Just so you know, ZK binds to all available interfaces when it starts,
> so it should work seamlessly with localhost or the FQDN for the host.
> As such, it shouldn't matter what you provide to the
> ZooKeeperInstance. That should connect in all cases for you, it's when
> you make a Connector, and your client will talk to a tabletserver to
> authenticate, that your program should hang. It would be good to
> verify that.
>
> On Tue, Oct 7, 2014 at 11:23 AM, Geoffry Roberts <th...@gmail.com>
> wrote:
> > All,
> >
> > Thanks for the responses.
> >
> > Is this a problem for Accumulo?
> > Reverse DNS is yielding my ISP's host name. You know the drill, my IP in
> > reverse followed by their domain name, as opposed to my FQDN, which what
> I
> > use in my config files.
> >
> > Running Accumulo 1.5.1
> > I have only one interface.
> > I have the FQDN in both master and slaves files for both Hadoop and
> > Accumulo; in zoo.cfg; and in accumulo-site.xml where the Zookeepers are
> > referenced.
> > Also, I am passing in all Zk FQDN when I instantiate ZookeeperInstance.
> > Forward DNS works
> > Reverse DNS... well (See above).
> >
> >
> >
> > On Mon, Oct 6, 2014 at 10:26 PM, Adam Fuchs <af...@apache.org> wrote:
> >>
> >> Accumulo tservers typically listen on a single interface. If you have a
> >> server with multiple interfaces (e.g. loopback and eth0), you might
> have a
> >> problem in which the tablet servers are not listening on externally
> >> reachable interfaces. Tablet servers will list the interfaces that they
> are
> >> listening to when they boot, and you can also use tools like lsof to
> find
> >> them.
> >>
> >> If that is indeed the problem, then you might just need to change you
> >> conf/slaves file to use <hostname> instead of localhost, and then
> restart.
> >>
> >> Adam
> >>
> >> On Oct 6, 2014 4:27 PM, "Geoffry Roberts" <th...@gmail.com>
> wrote:
> >>>
> >>>
> >>> I have been happily working with Acc, but today things changed. No
> >>> errors
> >>>
> >>> Until now I ran everything server side, which meant the URL was
> >>> localhost:2181, and life was good. Today tried running some of the
> same
> >>> code as a remote client, which means <host name>:2181. Things hang
> when
> >>> BatchWriter tries to commit anything and Scan hangs when it tries to
> iterate
> >>> through a Map.
> >>>
> >>> Let's focus on the scan part:
> >>>
> >>> scan.fetchColumnFamily(new Text("colfY")); // This executes then hangs.
> >>> for(Entry<Key,Value> entry : scan) {
> >>> def row = entry.getKey().getRow();
> >>> def value = entry.getValue();
> >>> println "value=" + value;
> >>> }
> >>>
> >>> This is what appears in the console :
> >>>
> >>> 17:22:39.802 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping
> >>> response for sessionid: 0x148c6f03388005e after 21ms
> >>>
> >>> 17:22:49.803 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping
> >>> response for sessionid: 0x148c6f03388005e after 21ms
> >>>
> >>> <and on and on>
> >>>
> >>>
> >>>
> >>> The only difference between success and a hang is a URL change, and of
> >>> course being remote.
> >>>
> >>> I don't believe this is a firewall issue. I shutdown the firewall.
> >>>
> >>> Am I missing something?
> >>>
> >>> Thanks all.
> >>>
> >>> --
> >>> There are ways and there are ways,
> >>>
> >>> Geoffry Roberts
> >
> >
> >
> >
> > --
> > There are ways and there are ways,
> >
> > Geoffry Roberts
>
--
There are ways and there are ways,
Geoffry Roberts
Re: Remotely Accumulo
Posted by Josh Elser <jo...@gmail.com>.
Can you provide the output from netstat, lsof or /proc/$pid/fd for the
tserver? Assuming you haven't altered tserver.port.client in
accumulo-site.xml, we want the line for port 9997.
From my laptop running a tserver on localhost:
$ netstat -na | grep 9997
tcp4 0 0 127.0.0.1.9997 *.* LISTEN
Depending on the tool you use, you can grep out the pid of the tserver
or just that port itself.
Just so you know, ZK binds to all available interfaces when it starts,
so it should work seamlessly with localhost or the FQDN for the host.
As such, it shouldn't matter what you provide to the
ZooKeeperInstance. That should connect in all cases for you, it's when
you make a Connector, and your client will talk to a tabletserver to
authenticate, that your program should hang. It would be good to
verify that.
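[Editorial note: one way to verify exactly which client call hangs is to wrap each step (ZooKeeperInstance, getConnector, the first scan iteration) in a timeout so the program fails fast instead of blocking forever. A stdlib sketch; the Callable passed in is a placeholder standing in for the real Accumulo call, e.g. `() -> instance.getConnector(user, token)`:]

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class HangProbe {
    // Runs the given step on a worker thread; throws TimeoutException
    // instead of hanging if it doesn't finish in time.
    static <T> T withTimeout(Callable<T> step, long seconds) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return pool.submit(step).get(seconds, TimeUnit.SECONDS);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder for a real client step such as getConnector(...).
        String result = withTimeout(() -> "connected", 5);
        System.out.println(result);
    }
}
```

Whichever wrapped step throws the TimeoutException is the one blocking; that tells you whether it is ZooKeeper or the tablet server RPC that is unreachable.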
On Tue, Oct 7, 2014 at 11:23 AM, Geoffry Roberts <th...@gmail.com> wrote:
> All,
>
> Thanks for the responses.
>
> Is this a problem for Accumulo?
> Reverse DNS is yielding my ISP's host name. You know the drill, my IP in
> reverse followed by their domain name, as opposed to my FQDN, which what I
> use in my config files.
>
> Running Accumulo 1.5.1
> I have only one interface.
> I have the FQDN in both master and slaves files for both Hadoop and
> Accumulo; in zoo.cfg; and in accumulo-site.xml where the Zookeepers are
> referenced.
> Also, I am passing in all Zk FQDN when I instantiate ZookeeperInstance.
> Forward DNS works
> Reverse DNS... well (See above).
>
>
>
> On Mon, Oct 6, 2014 at 10:26 PM, Adam Fuchs <af...@apache.org> wrote:
>>
>> Accumulo tservers typically listen on a single interface. If you have a
>> server with multiple interfaces (e.g. loopback and eth0), you might have a
>> problem in which the tablet servers are not listening on externally
>> reachable interfaces. Tablet servers will list the interfaces that they are
>> listening to when they boot, and you can also use tools like lsof to find
>> them.
>>
>> If that is indeed the problem, then you might just need to change you
>> conf/slaves file to use <hostname> instead of localhost, and then restart.
>>
>> Adam
>>
>> On Oct 6, 2014 4:27 PM, "Geoffry Roberts" <th...@gmail.com> wrote:
>>>
>>>
>>> I have been happily working with Acc, but today things changed. No
>>> errors
>>>
>>> Until now I ran everything server side, which meant the URL was
>>> localhost:2181, and life was good. Today tried running some of the same
>>> code as a remote client, which means <host name>:2181. Things hang when
>>> BatchWriter tries to commit anything and Scan hangs when it tries to iterate
>>> through a Map.
>>>
>>> Let's focus on the scan part:
>>>
>>> scan.fetchColumnFamily(new Text("colfY")); // This executes then hangs.
>>> for(Entry<Key,Value> entry : scan) {
>>> def row = entry.getKey().getRow();
>>> def value = entry.getValue();
>>> println "value=" + value;
>>> }
>>>
>>> This is what appears in the console :
>>>
>>> 17:22:39.802 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping
>>> response for sessionid: 0x148c6f03388005e after 21ms
>>>
>>> 17:22:49.803 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping
>>> response for sessionid: 0x148c6f03388005e after 21ms
>>>
>>> <and on and on>
>>>
>>>
>>>
>>> The only difference between success and a hang is a URL change, and of
>>> course being remote.
>>>
>>> I don't believe this is a firewall issue. I shutdown the firewall.
>>>
>>> Am I missing something?
>>>
>>> Thanks all.
>>>
>>> --
>>> There are ways and there are ways,
>>>
>>> Geoffry Roberts
>
>
>
>
> --
> There are ways and there are ways,
>
> Geoffry Roberts
Re: Remotely Accumulo
Posted by Geoffry Roberts <th...@gmail.com>.
All,
Thanks for the responses.
Is this a problem for Accumulo?
Reverse DNS is yielding my ISP's host name. You know the drill, my IP in
reverse followed by their domain name, as opposed to my FQDN, which is what I
use in my config files.
- Running Accumulo 1.5.1
- I have only one interface.
- I have the FQDN in both master and slaves files for both Hadoop and
Accumulo; in zoo.cfg; and in accumulo-site.xml where the Zookeepers are
referenced.
- Also, I am passing in all Zk FQDN when I instantiate ZookeeperInstance.
- Forward DNS works
- Reverse DNS... well (See above).
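[Editorial note: the forward/reverse checks listed above can also be run through the JVM resolver itself, which is what the Accumulo client actually uses and which may differ from what `dig` or `nslookup` report. A minimal sketch; the FQDN in main is a placeholder:]

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class DnsCheck {
    // Forward lookup: name -> IP, as the JVM resolver sees it.
    static String forward(String host) throws UnknownHostException {
        return InetAddress.getByName(host).getHostAddress();
    }

    // Reverse lookup: IP -> canonical name, again via the JVM resolver.
    static String reverse(String host) throws UnknownHostException {
        InetAddress a = InetAddress.getByName(host);
        return InetAddress.getByAddress(a.getAddress()).getCanonicalHostName();
    }

    public static void main(String[] args) throws UnknownHostException {
        String fqdn = "tserver1.example.com"; // placeholder for your FQDN
        String ip = forward(fqdn);
        System.out.println("forward: " + fqdn + " -> " + ip);
        System.out.println("reverse: " + ip + " -> " + reverse(ip));
    }
}
```

Run this on both the cluster node and the remote client; if the reverse lookup comes back with the ISP's name rather than the FQDN in your config files, that mismatch is visible here.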
On Mon, Oct 6, 2014 at 10:26 PM, Adam Fuchs <af...@apache.org> wrote:
> Accumulo tservers typically listen on a single interface. If you have a
> server with multiple interfaces (e.g. loopback and eth0), you might have a
> problem in which the tablet servers are not listening on externally
> reachable interfaces. Tablet servers will list the interfaces that they are
> listening to when they boot, and you can also use tools like lsof to find
> them.
>
> If that is indeed the problem, then you might just need to change you
> conf/slaves file to use <hostname> instead of localhost, and then restart.
>
> Adam
> On Oct 6, 2014 4:27 PM, "Geoffry Roberts" <th...@gmail.com> wrote:
>
>>
>> I have been happily working with Acc, but today things changed. No errors
>>
>> Until now I ran everything server side, which meant the URL was
>> localhost:2181, and life was good. Today tried running some of the same
>> code as a remote client, which means <host name>:2181. Things hang when
>> BatchWriter tries to commit anything and Scan hangs when it tries to
>> iterate through a Map.
>>
>> Let's focus on the scan part:
>>
>> scan.fetchColumnFamily(new Text("colfY")); // This executes then hangs.
>> for(Entry<Key,Value> entry : scan) {
>> def row = entry.getKey().getRow();
>> def value = entry.getValue();
>> println "value=" + value;
>> }
>>
>> This is what appears in the console :
>>
>> 17:22:39.802 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping
>> response for sessionid: 0x148c6f03388005e after 21ms
>>
>> 17:22:49.803 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping
>> response for sessionid: 0x148c6f03388005e after 21ms
>>
>> <and on and on>
>>
>>
>> The only difference between success and a hang is a URL change, and of
>> course being remote.
>>
>> I don't believe this is a firewall issue. I shutdown the firewall.
>>
>> Am I missing something?
>>
>> Thanks all.
>>
>> --
>> There are ways and there are ways,
>>
>> Geoffry Roberts
>>
>
--
There are ways and there are ways,
Geoffry Roberts
Re: Remotely Accumulo
Posted by Adam Fuchs <af...@apache.org>.
Accumulo tservers typically listen on a single interface. If you have a
server with multiple interfaces (e.g. loopback and eth0), you might have a
problem in which the tablet servers are not listening on externally
reachable interfaces. Tablet servers will list the interfaces that they are
listening to when they boot, and you can also use tools like lsof to find
them.
If that is indeed the problem, then you might just need to change your
conf/slaves file to use <hostname> instead of localhost, and then restart.
Adam
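[Editorial note: the interface distinction Adam describes can be demonstrated with plain sockets. Binding to the loopback address versus the wildcard address is exactly the difference between a `127.0.0.1:<port>` and a `0.0.0.0:<port>` line in netstat, and only the latter is reachable from other machines. A stdlib sketch:]

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BindDemo {
    public static void main(String[] args) throws IOException {
        // Loopback-only bind: netstat shows 127.0.0.1:<port>; a remote
        // client's connect attempts never reach this socket.
        ServerSocket loopbackOnly = new ServerSocket();
        loopbackOnly.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
        System.out.println("loopback: " + loopbackOnly.getLocalSocketAddress());

        // Wildcard bind: netstat shows 0.0.0.0:<port>; reachable on every
        // interface the host has.
        ServerSocket allInterfaces = new ServerSocket(0);
        System.out.println("wildcard: " + allInterfaces.getLocalSocketAddress());

        loopbackOnly.close();
        allInterfaces.close();
    }
}
```

A tserver started from a conf/slaves entry of "localhost" ends up in the first situation; listing the hostname instead puts it in the second.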
On Oct 6, 2014 4:27 PM, "Geoffry Roberts" <th...@gmail.com> wrote:
>
> I have been happily working with Acc, but today things changed. No errors
>
> Until now I ran everything server side, which meant the URL was
> localhost:2181, and life was good. Today tried running some of the same
> code as a remote client, which means <host name>:2181. Things hang when
> BatchWriter tries to commit anything and Scan hangs when it tries to
> iterate through a Map.
>
> Let's focus on the scan part:
>
> scan.fetchColumnFamily(new Text("colfY")); // This executes then hangs.
> for(Entry<Key,Value> entry : scan) {
> def row = entry.getKey().getRow();
> def value = entry.getValue();
> println "value=" + value;
> }
>
> This is what appears in the console :
>
> 17:22:39.802 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping
> response for sessionid: 0x148c6f03388005e after 21ms
>
> 17:22:49.803 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping
> response for sessionid: 0x148c6f03388005e after 21ms
>
> <and on and on>
>
>
> The only difference between success and a hang is a URL change, and of
> course being remote.
>
> I don't believe this is a firewall issue. I shutdown the firewall.
>
> Am I missing something?
>
> Thanks all.
>
> --
> There are ways and there are ways,
>
> Geoffry Roberts
>
Re: Remotely Accumulo
Posted by Keith Turner <ke...@deenlo.com>.
If you add the following Log4j code before scanning, maybe the trace
messages from the Accumulo client code will shed some light on what's happening.
Logger.getLogger("org.apache.accumulo.core.client").setLevel(Level.TRACE);
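[Editorial note: the same level can be set without touching code, via the client's log4j.properties, assuming the log4j 1.x that Accumulo 1.5 ships with. A sketch:]

```properties
# Keep the root logger at its usual level
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{HH:mm:ss.SSS} %-5p %c - %m%n
# Turn up just the Accumulo client packages
log4j.logger.org.apache.accumulo.core.client=TRACE
```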
On Mon, Oct 6, 2014 at 5:26 PM, Geoffry Roberts <th...@gmail.com>
wrote:
>
> I have been happily working with Acc, but today things changed. No errors
>
> Until now I ran everything server side, which meant the URL was
> localhost:2181, and life was good. Today tried running some of the same
> code as a remote client, which means <host name>:2181. Things hang when
> BatchWriter tries to commit anything and Scan hangs when it tries to
> iterate through a Map.
>
> Let's focus on the scan part:
>
> scan.fetchColumnFamily(new Text("colfY")); // This executes then hangs.
> for(Entry<Key,Value> entry : scan) {
> def row = entry.getKey().getRow();
> def value = entry.getValue();
> println "value=" + value;
> }
>
> This is what appears in the console :
>
> 17:22:39.802 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping
> response for sessionid: 0x148c6f03388005e after 21ms
>
> 17:22:49.803 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping
> response for sessionid: 0x148c6f03388005e after 21ms
>
> <and on and on>
>
>
> The only difference between success and a hang is a URL change, and of
> course being remote.
>
> I don't believe this is a firewall issue. I shutdown the firewall.
>
> Am I missing something?
>
> Thanks all.
>
> --
> There are ways and there are ways,
>
> Geoffry Roberts
>