Posted to user@accumulo.apache.org by Geoffry Roberts <th...@gmail.com> on 2014/10/06 23:26:38 UTC

Remotely Accumulo

I have been happily working with Accumulo, but today things changed. There are no errors.

Until now I ran everything server side, which meant the URL was
localhost:2181, and life was good.  Today I tried running some of the same
code as a remote client, which means <host name>:2181.  Things hang when
BatchWriter tries to commit anything, and the Scanner hangs when I try to
iterate over its results.

Let's focus on the scan part:

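import java.util.Map.Entry
import org.apache.accumulo.core.data.Key
import org.apache.accumulo.core.data.Value
import org.apache.hadoop.io.Text

// scan is a Scanner created earlier from the Connector
scan.fetchColumnFamily(new Text("colfY")) // This line returns normally...
for (Entry<Key, Value> entry : scan) {    // ...but iterating hangs.
    def row = entry.getKey().getRow()
    def value = entry.getValue()
    println "value=" + value
}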

This is what appears in the console:

17:22:39.802 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x148c6f03388005e after 21ms
17:22:49.803 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x148c6f03388005e after 21ms

<and on and on>


The only difference between success and a hang is a URL change, and of
course being remote.

I don't believe this is a firewall issue.  I shut down the firewall.
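One way to test the firewall theory directly is to open a plain TCP connection from the client machine to each port the client needs: 2181 for ZooKeeper and 9997 for each tablet server (9997 is the default tserver client port). The class below is a hypothetical sketch, not part of Accumulo; a successful connect shows only that the port is reachable, not that the service behind it is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical troubleshooting helper (not part of Accumulo): reports whether
// a TCP connection to host:port can be opened within a timeout. It does not
// speak the ZooKeeper or Thrift protocols; success only proves reachability.
final class PortProbe {
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused connections, timeouts and unresolvable names all land here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder default; pass every ZooKeeper and tablet server host.
        String[] hosts = args.length > 0 ? args : new String[] {"localhost"};
        int[] ports = {2181, 9997}; // ZooKeeper client port, default tserver port
        for (String host : hosts) {
            for (int port : ports) {
                System.out.printf("%s:%d reachable=%b%n",
                        host, port, canConnect(host, port, 2000));
            }
        }
    }
}
```

Run it from the client, for example `java PortProbe zk1.example.com tserver1.example.com`, where the host names are placeholders for the real cluster nodes.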

Am I missing something?

Thanks all.

-- 
There are ways and there are ways,

Geoffry Roberts

Re: Remotely Accumulo

Posted by Sean Busbey <bu...@cloudera.com>.
Hi Geoffry!

What version of Accumulo are you using?

Can you check your DNS on the cluster?

1) Does 'hostname' return the name you expect from the client? (the client
must be able to see all ZK servers and all tablet servers in the cluster)

2) Do your cluster config files contain the same host names that would be
returned by the above command on each server?

3) Does forward and reverse DNS work for each host for the name referenced
in your config files?
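Checks 1) to 3) can also be run from a JVM on the client, which resolves names through the same Java resolver the Accumulo client uses. The class below is a hypothetical sketch (not an Accumulo tool): it prints the local host name, resolves each configured name forward, then reverse-resolves every address and reports whether the name that comes back matches.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper (not an Accumulo tool) for checks 1) to 3): print this
// machine's host name, resolve each configured name forward, reverse-resolve
// every address it maps to, and report whether the two names agree.
final class DnsCheck {
    // DNS names are case-insensitive, and a trailing dot only marks the root.
    static boolean namesMatch(String configured, String reversed) {
        return stripDot(configured).equalsIgnoreCase(stripDot(reversed));
    }

    private static String stripDot(String name) {
        return name.endsWith(".") ? name.substring(0, name.length() - 1) : name;
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println("local host name: " + InetAddress.getLocalHost().getHostName());
        // Placeholder default; pass the FQDNs from your config files.
        String[] names = args.length > 0 ? args : new String[] {"localhost"};
        for (String name : names) {
            for (InetAddress addr : InetAddress.getAllByName(name)) {
                // getCanonicalHostName() performs the reverse lookup and falls
                // back to the textual IP address if that lookup fails.
                String reversed = addr.getCanonicalHostName();
                System.out.printf("%s -> %s -> %s match=%b%n", name,
                        addr.getHostAddress(), reversed, namesMatch(name, reversed));
            }
        }
    }
}
```

Pass the FQDNs used in the cluster config files as arguments; a `match=false` line flags a name whose forward and reverse lookups disagree.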




-- 
Sean

Re: Remotely Accumulo

Posted by Geoffry Roberts <th...@gmail.com>.
So start-here.sh does it. Thanks for pointing that out.  I was looking all
through the shell commands.


I did try running start-all.sh from the master, and it worked for starting
the tserver, but I noticed that it increased the number of processes labeled
"Main" on the master from the usual five to seven.


From accumulo-site.xml, everything memory-related:


  <property>
    <name>tserver.memory.maps.max</name>
    <value>256M</value>
  </property>
  <property>
    <name>tserver.memory.maps.native.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>tserver.cache.data.size</name>
    <value>50M</value>
  </property>
  <property>
    <name>tserver.cache.index.size</name>
    <value>100M</value>
  </property>
  <property>
    <name>tserver.walog.max.size</name>
    <value>512M</value>
  </property>
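As a rough sanity check on these settings, the values that occupy tablet server memory can be added up and compared with the heap limit from ACCUMULO_TSERVER_OPTS (-Xmx2G, quoted further down). The sketch below assumes that, with tserver.memory.maps.native.enabled set to false, the in-memory map and both block caches live on the Java heap, and it treats size suffixes as binary multiples; HeapBudget and parseBytes are hypothetical names.

```java
// Hypothetical sketch (names are illustrative): sum the tserver settings above
// that are assumed to live on the Java heap when native maps are disabled,
// and compare the total with -Xmx. tserver.walog.max.size is left out because
// it limits a write-ahead log file on disk, not memory.
final class HeapBudget {
    // Parses "256M", "2G", etc. using 1024-based multipliers; no suffix = bytes.
    static long parseBytes(String size) {
        String s = size.trim().toUpperCase();
        long multiplier;
        switch (s.charAt(s.length() - 1)) {
            case 'K': multiplier = 1L << 10; break;
            case 'M': multiplier = 1L << 20; break;
            case 'G': multiplier = 1L << 30; break;
            default: return Long.parseLong(s);
        }
        return Long.parseLong(s.substring(0, s.length() - 1)) * multiplier;
    }

    public static void main(String[] args) {
        long onHeap = parseBytes("256M")   // tserver.memory.maps.max
                + parseBytes("50M")        // tserver.cache.data.size
                + parseBytes("100M");      // tserver.cache.index.size
        long maxHeap = parseBytes("2G");   // -Xmx2G from ACCUMULO_TSERVER_OPTS
        System.out.printf("configured on-heap: %d MiB of %d MiB (%.0f%%)%n",
                onHeap >> 20, maxHeap >> 20, 100.0 * onHeap / maxHeap);
    }
}
```

Run as written, it prints `configured on-heap: 406 MiB of 2048 MiB (20%)`. The sum is a lower bound on what the tserver needs, not a full accounting, since scans, writes and RPC handling also allocate on the heap.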

On Thu, Oct 9, 2014 at 10:54 AM, Josh Elser <jo...@gmail.com> wrote:

> You can use start-here.sh on the host in question or `start-server.sh
> $hostname tserver`. FWIW, re-invoking start-all should just ignore the
> hosts which already have processes running and just start a tserver on the
> host that died.
>
> 2G should be enough to get a connector and read a table. TBH, 256M should
> be enough for that.
>
> Also, the JVM OOME message doesn't include timestamps; there isn't much more
> to glean from it other than "it died because it ran out of heap".
>
> What does your accumulo-site.xml look like?
>
> Geoffry Roberts wrote:
>
>> I found the message in tserver*.out. tserver*.err is empty.
>>
>> When I posted last night, life was good. I sat down this morning and saw
>> that another tserver had crashed overnight, with no activity (??). In
>> tserver*.out it again says it ran out of heap space.
>>
>> ACCUMULO_TSERVER_OPTS=-Xmx2G -Xms1G. I would have thought it sufficient.
>>
>> The fact that the log entries lack timestamps but have hash marks makes
>> me wonder if I am reading things correctly.
>>
>> #
>> # java.lang.OutOfMemoryError: Java heap space
>> # -XX:OnOutOfMemoryError="kill -9 %p"
>> #   Executing /bin/sh -c "kill -9 3241"...
>>
>>
>> Is there a way to start a particular tablet server?
>>
>>
>> On Wed, Oct 8, 2014 at 6:55 PM, Eric Newton <eric.newton@gmail.com> wrote:
>>
>>     Did you find the message in the tserver*.out, tserver*.err or the
>>     monitor page?
>>
>>     (Thanks for the follow-up message.)
>>
>>     On Wed, Oct 8, 2014 at 6:39 PM, Geoffry Roberts <threadedblue@gmail.com> wrote:
>>
>>         Just for the record, I finally got to the bottom of things.  One
>>         of my tservers was running out of memory.  I hadn't noticed.  I
>>         had my SA allocate a little more (each node now has 6G, up from
>>         2G), and things are working better.
>>
>>         On Oct 8, 2014 10:09 AM, "Josh Elser" <josh.elser@gmail.com> wrote:
>>
>>             Jstack is a tool which can be used to tell a Java process to
>>             dump the current stack traces for all of its threads. It's
>>             usually included with the JDK. `kill -3 $pid` also does the
>>             same. If the output isn't redirected automatically to your
>>             shell, check the stdout of the process whose pid you gave as
>>             an argument.
>>
>>             When your client is sitting waiting on data from the
>>             tabletserver, you can get the stack traces from the tserver
>>             and you should be able to find a thread with scan in the
>>             name, along with your client's IP, and we can help debug
>>             exactly what the server is doing that is preventing it from
>>             returning data to your client.
>>
>>             On Oct 8, 2014 9:43 AM, "Geoffry Roberts" <threadedblue@gmail.com> wrote:
>>
>>                 Thanks Josh.  But what do you mean by "jstack'ing"?  I'm
>>                 unfamiliar with that term.  A better question would be:
>>                 how can one troubleshoot such a thing?
>>
>>                 btw, I am the sole user on this cluster.
>>
>>                 On Tue, Oct 7, 2014 at 4:18 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>
>>                     Ok, this record:
>>
>>                     tcp        0      0 0.0.0.0:9997            0.0.0.0:*               LISTEN
>>
>>                     means that your tserver is listening on the correct
>>                     port on all interfaces. There shouldn't be issues
>>                     connecting to the tserver. This is also confirmed by
>>                     the fact that you authenticated and got a Connector
>>                     (this does an RPC to the tserver).
>>
>>                     So, your tserver is up, and your client can
>>                     communicate with it. The real question is why the
>>                     scan is hanging. Perhaps try jstack'ing the tserver
>>                     while your client is blocked waiting for results.
>>
>>                     On Tue, Oct 7, 2014 at 2:07 PM, Geoffry Roberts <threadedblue@gmail.com> wrote:
>>                      > "...it's when
>>                      > you make a Connector, and your client will talk to a tabletserver to
>>                      > authenticate, that your program should hang. It would be good to
>>                      > verify that."
>>                      >
>>                      >
>>                      > My program should hang?  Would you expand?  That is exactly what it is
>>                      > doing.  I am able to get a connector.  But when I try to iterate the
>>                      > result of a scan, that's when it hangs.
>>                      >
>>                      >
>>                      >
>>                      >
>>                      > Here's what comes from netstat:
>>                      >
>>                      > $ netstat -na | grep 9997
>>                      > tcp        0      0 0.0.0.0:9997            0.0.0.0:*               LISTEN
>>                      > tcp        0      0 204.9.140.36:35679      204.9.140.36:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:53146      204.9.140.37:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33896      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:53282      204.9.140.37:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:53188      204.9.140.37:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:35609      204.9.140.36:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33901      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:35588      204.9.140.36:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33877      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33946      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:53167      204.9.140.37:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33949      204.9.140.38:9997       ESTABLISHED
>>                      > tcp        0      0 204.9.140.36:35546      204.9.140.36:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33852      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:53125      204.9.140.37:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33922      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33747      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33961      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33793      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:35768      204.9.140.36:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33917      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33814      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:35567      204.9.140.36:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33444      204.9.140.38:9997       FIN_WAIT2
>>                      > tcp        0      0 204.9.140.36:35701      204.9.140.36:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33969      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:53258      204.9.140.37:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33831      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:53210      204.9.140.37:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:53104      204.9.140.37:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33789      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33856      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:53237      204.9.140.37:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33835      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:35651      204.9.140.36:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33938      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33041      204.9.140.36:9997       ESTABLISHED
>>                      > tcp        0      0 204.9.140.36:53285      204.9.140.37:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:53305      204.9.140.37:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33768      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:35630      204.9.140.36:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33754      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:35745      204.9.140.36:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:35724      204.9.140.36:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:9997       204.9.140.36:33041      ESTABLISHED
>>                      > tcp        0      0 204.9.140.36:53083      204.9.140.37:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:50623      204.9.140.37:9997       ESTABLISHED
>>                      > tcp        0      0 204.9.140.36:33772      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33732      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33874      204.9.140.38:9997       TIME_WAIT
>>                      > tcp        0      0 204.9.140.36:33810      204.9.140.38:9997       TIME_WAIT
>>                      >
>>                      > On Tue, Oct 7, 2014 at 11:34 AM, Josh Elser <josh.elser@gmail.com> wrote:
>>                      >>
>>                      >> Can you provide the output from netstat, lsof or /proc/$pid/fd for
>>                      >> the tserver? Assuming you haven't altered tserver.port.client in
>>                      >> accumulo-site.xml, we want the line for port 9997.
>>                      >>
>>                      >> From my laptop running a tserver on localhost:
>>                      >>
>>                      >> $ netstat -na | grep 9997
>>                      >> tcp4       0      0  127.0.0.1.9997         *.*                    LISTEN
>>                      >>
>>                      >> Depending on the tool you use, you can grep out the pid of the
>>                      >> tserver or just that port itself.
>>                      >>
>>                      >> Just so you know, ZK binds to all available interfaces when it
>>                      >> starts, so it should work seamlessly with localhost or the FQDN for
>>                      >> the host. As such, it shouldn't matter what you provide to the
>>                      >> ZooKeeperInstance. That should connect in all cases for you; it's
>>                      >> when you make a Connector, and your client will talk to a
>>                      >> tabletserver to authenticate, that your program should hang. It
>>                      >> would be good to verify that.
>>                      >>
>>                      >> On Tue, Oct 7, 2014 at 11:23 AM, Geoffry Roberts <threadedblue@gmail.com> wrote:
>>                      >> > All,
>>                      >> >
>>                      >> > Thanks for the responses.
>>                      >> >
>>                      >> > Is this a problem for Accumulo?
>>                      >> > Reverse DNS is yielding my ISP's host name. You know the drill: my IP
>>                      >> > in reverse followed by their domain name, as opposed to my FQDN, which
>>                      >> > is what I use in my config files.
>>                      >> >
>>                      >> > Running Accumulo 1.5.1
>>                      >> > I have only one interface.
>>                      >> > I have the FQDN in both the masters and slaves files for both Hadoop
>>                      >> > and Accumulo, in zoo.cfg, and in accumulo-site.xml where the
>>                      >> > ZooKeepers are referenced.
>>                      >> > Also, I am passing in all ZK FQDNs when I instantiate ZooKeeperInstance.
>>                      >> > Forward DNS works.
>>                      >> > Reverse DNS... well (see above).
>>                      >> >
>>                      >> >
>>                      >> >
>>                      >> > On Mon, Oct 6, 2014 at 10:26 PM, Adam Fuchs <afuchs@apache.org> wrote:
>>                      >> >>
>>                      >> >> Accumulo tservers typically listen on a single interface. If you have
>>                      >> >> a server with multiple interfaces (e.g. loopback and eth0), you might
>>                      >> >> have a problem in which the tablet servers are not listening on
>>                      >> >> externally reachable interfaces. Tablet servers will list the
>>                      >> >> interfaces that they are listening to when they boot, and you can
>>                      >> >> also use tools like lsof to find them.
>>                      >> >>
>>                      >> >> If that is indeed the problem, then you might just need to change your
>>                      >> >> conf/slaves file to use <hostname> instead of localhost, and then
>>                      >> >> restart.
>>                      >> >>
>>                      >> >> Adam
>


-- 
There are ways and there are ways,

Geoffry Roberts
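The netstat reading Josh does in this thread can be automated. The sketch below assumes the Linux `netstat -na` column order of the output quoted above (proto, Recv-Q, Send-Q, local address, foreign address, state); it checks for a LISTEN entry on the wildcard address for a port and tallies the states of connections to that port. NetstatSummary is a hypothetical name, not an existing tool.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: summarizes Linux `netstat -na` output, assuming the
// column layout proto, Recv-Q, Send-Q, local address, foreign address, state.
final class NetstatSummary {
    // True if some TCP socket is in LISTEN on the IPv4 or IPv6 wildcard address.
    static boolean listensOnAllInterfaces(List<String> lines, int port) {
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length >= 6 && f[0].startsWith("tcp") && f[5].equals("LISTEN")
                    && (f[3].equals("0.0.0.0:" + port) || f[3].equals(":::" + port))) {
                return true;
            }
        }
        return false;
    }

    // Counts TCP connection states for sockets whose foreign address uses the port.
    static Map<String, Integer> statesForRemotePort(List<String> lines, int port) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length >= 6 && f[0].startsWith("tcp") && f[4].endsWith(":" + port)) {
                counts.merge(f[5], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Usage: netstat -na | java NetstatSummary
        List<String> lines = new ArrayList<>();
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        for (String line = in.readLine(); line != null; line = in.readLine()) {
            lines.add(line);
        }
        System.out.println("listening on all interfaces: " + listensOnAllInterfaces(lines, 9997));
        System.out.println("connection states to port 9997: " + statesForRemotePort(lines, 9997));
    }
}
```

On the netstat output posted in this thread it would report a wildcard listener on 9997 plus connections that are mostly in TIME_WAIT, which matches Josh's reading that the tserver itself was reachable.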

Re: Remotely Accumulo

Posted by Josh Elser <jo...@gmail.com>.
You can use start-here.sh on the host in question or `start-server.sh 
$hostname tserver`. FWIW, re-invoking start-all should just ignore the 
hosts which already have processes running and just start a tserver on 
the host that died.

2G should be enough to get a connector and read a table. TBH, 256M 
should be enough for that.

Also, the JVM OOME message doesn't include timestamps; there isn't much 
more to glean from it other than "it died because it ran out of heap".
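Since the OOME banner says only that the heap ran out, it can help to confirm what heap limit a JVM actually receives from a given set of flags. The sketch below, with hypothetical names, reports the heap figures of the JVM it runs in; launched with the same options as ACCUMULO_TSERVER_OPTS (for example `java -Xmx2G -Xms1G HeapReport`), it shows how those flags are interpreted. It does not inspect a running tserver.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports the heap figures of the JVM it runs in, e.g. to
// confirm how flags such as -Xmx2G -Xms1G were interpreted.
final class HeapReport {
    // MemoryUsage reports -1 for values the JVM leaves undefined.
    private static String mib(long bytes) {
        return bytes < 0 ? "undefined" : (bytes >> 20) + " MiB";
    }

    static String describe(MemoryUsage heap) {
        return "init=" + mib(heap.getInit()) + " used=" + mib(heap.getUsed())
                + " committed=" + mib(heap.getCommitted()) + " max=" + mib(heap.getMax());
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println("heap: " + describe(heap));
        System.out.println("Runtime.maxMemory(): " + mib(Runtime.getRuntime().maxMemory()));
    }
}
```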

What does your accumulo-site.xml look like?
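jstack (or `kill -3`), as Josh describes earlier in the thread, is the tool for getting a tserver's stack traces. For code you control, such as the client, the same kind of information is available in-process through java.lang.management. The sketch below, with hypothetical names, prints the full stack of every thread whose name contains a given fragment, such as "scan"; it could be called from a watchdog thread while the client appears hung. It complements, rather than replaces, running jstack against the tserver.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: an in-process look at the same data jstack prints,
// limited to threads whose name contains a fragment (case-insensitive).
final class ThreadDumpFilter {
    static List<String> dump(String nameFragment) {
        String needle = nameFragment.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        // false, false: skip locked monitor/synchronizer details to stay cheap.
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            if (info == null || !info.getThreadName().toLowerCase(Locale.ROOT).contains(needle)) {
                continue;
            }
            StringBuilder sb = new StringBuilder();
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            // Unlike ThreadInfo.toString(), print every frame, not just the first few.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String fragment = args.length > 0 ? args[0] : "scan";
        for (String entry : dump(fragment)) {
            System.out.print(entry);
        }
    }
}
```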

Geoffry Roberts wrote:
> I found the message in tserver*.out. tserver*.err has 0 in it.
>
> I posted last night, life was good, sat down this morning and saw that
> another tserver had crashed, over night, with no activity.  ??  In
> tserver*.out it again says out of heap space.
>
> ACCUMULO_TSERVER_OPTS=-Xmx2G -Xms1G. I would have thought it sufficient.
>
> The fact that the log entries lack timestamps, but have hashmarks makes
> makes me wonder if I am reading things correctly.
>
> #
> # java.lang.OutOfMemoryError: Java heap space
> # -XX:OnOutOfMemoryError="kill -9 %p"
> #   Executing /bin/sh -c "kill -9 3241"...
>
>
> Is there a way to start a particular tablet server?

Re: Remotely Accumulo

Posted by Geoffry Roberts <th...@gmail.com>.
I found the message in tserver*.out.  tserver*.err is empty.

When I posted last night, life was good; I sat down this morning and saw
that another tserver had crashed overnight, with no activity.  ??  In
tserver*.out it again says it ran out of heap space.

ACCUMULO_TSERVER_OPTS=-Xmx2G -Xms1G. I would have thought it sufficient.

The fact that the log entries lack timestamps but have hash marks makes
me wonder if I am reading things correctly.

#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 3241"...


Is there a way to start a particular tablet server?

On Wed, Oct 8, 2014 at 6:55 PM, Eric Newton <er...@gmail.com> wrote:

> Did you find the message in the tserver*.out, tserver*.err or the monitor
> page?
>
> (Thanks for the follow-up message.)
>
> On Wed, Oct 8, 2014 at 6:39 PM, Geoffry Roberts <th...@gmail.com>
> wrote:
>
>> Just for the record, I finally got to the bottom of things.  One of my
>> Tservers was running out of memory.  I hadn't noticed.  I had my SA
>> allocate a little more--each node now has 6G up from 2G--and things are
>> working better.
>>  On Oct 8, 2014 10:09 AM, "Josh Elser" <jo...@gmail.com> wrote:
>>
>>> Jstack is a tool which can be used to tell a java process to dump the
>>> current stack traces for all of its threads. It's usually included with the
>>> JDK. `kill -3 $pid` also does the same. If the output can't be redirected
>>> automatically to your shell, check the stdout for the process you gave as
>>> an argument.
>>>
>>> When your client is sitting waiting on data from the tabletserver, you
>>> can get the stack traces from the tserver and you should be able to find a
>>> thread with scan in the name, along with your client's IP, and we can help
>>> debug exactly what the server is doing that is preventing it from returning
>>> data to your client.
>>> On Oct 8, 2014 9:43 AM, "Geoffry Roberts" <th...@gmail.com>
>>> wrote:
>>>
>>>> Thanks Josh.  But what do you mean by "jstack'ing"?  I'm unfamiliar
>>>> with that term.  A better question would be how can one troubleshoot such a
>>>> thing?
>>>>
>>>> btw
>>>> I am the sole user on this cluster.
>>>>
>>>> On Tue, Oct 7, 2014 at 4:18 PM, Josh Elser <jo...@gmail.com>
>>>> wrote:
>>>>
>>>>> Ok, this record:
>>>>>
>>>>> tcp        0      0 0.0.0.0:9997                0.0.0.0:*
>>>>>      LISTEN
>>>>>
>>>>> Means that your tserver is listening on the correct port on all interfaces.
>>>>> There shouldn't be issues connecting to the tserver. This is also
>>>>> confirmed by the fact that you authenticated and got a Connector (this
>>>>> does an RPC to the tserver).
>>>>>
>>>>> So, your tserver is up, and your client can communicate with it. The
>>>>> real question is why the scan is hanging. Perhaps try jstack'ing the
>>>>> tserver while your client is blocked waiting for results.
>>>>>
>>>>> On Tue, Oct 7, 2014 at 2:07 PM, Geoffry Roberts <
>>>>> threadedblue@gmail.com> wrote:
>>>>> > "...it's when
>>>>> > you make a Connector, and your client will talk to a tabletserver to
>>>>> > authenticate, that your program should hang. It would be good to
>>>>> > verify that."
>>>>> >
>>>>> >
>>>>> > My program should hang?  Would you expand?  That is exactly what it is
>>>>> > doing.  I am able to get a connector.  But when I try to iterate the
>>>>> > result of a scan, that's when it hangs.
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > Here's what comes from netstat:
>>>>> >
>>>>> >
>>>>> > $ netstat -na | grep 9997
>>>>> >
>>>>> > tcp        0      0 0.0.0.0:9997                0.0.0.0:*                   LISTEN
>>>>> > tcp        0      0 204.9.140.36:35679          204.9.140.36:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:53146          204.9.140.37:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:33896          204.9.140.38:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:53282          204.9.140.37:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:53188          204.9.140.37:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:35609          204.9.140.36:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:33901          204.9.140.38:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:35588          204.9.140.36:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:33877          204.9.140.38:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:33946          204.9.140.38:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:53167          204.9.140.37:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:33949          204.9.140.38:9997           ESTABLISHED
>>>>> > tcp        0      0 204.9.140.36:35546          204.9.140.36:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:33852          204.9.140.38:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:53125          204.9.140.37:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:33922          204.9.140.38:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:33747          204.9.140.38:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:33961          204.9.140.38:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:33793          204.9.140.38:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:35768          204.9.140.36:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:33917          204.9.140.38:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:33814          204.9.140.38:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:35567          204.9.140.36:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:33444          204.9.140.38:9997           FIN_WAIT2
>>>>> > tcp        0      0 204.9.140.36:35701          204.9.140.36:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:33969          204.9.140.38:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:53258          204.9.140.37:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:33831          204.9.140.38:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:53210          204.9.140.37:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:53104          204.9.140.37:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:33789          204.9.140.38:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:33856          204.9.140.38:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:53237          204.9.140.37:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:33835          204.9.140.38:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:35651          204.9.140.36:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:33938          204.9.140.38:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:33041          204.9.140.36:9997           ESTABLISHED
>>>>> > tcp        0      0 204.9.140.36:53285          204.9.140.37:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:53305          204.9.140.37:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:33768          204.9.140.38:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:35630          204.9.140.36:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:33754          204.9.140.38:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:35745          204.9.140.36:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:35724          204.9.140.36:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:9997           204.9.140.36:33041          ESTABLISHED
>>>>> > tcp        0      0 204.9.140.36:53083          204.9.140.37:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:50623          204.9.140.37:9997           ESTABLISHED
>>>>> > tcp        0      0 204.9.140.36:33772          204.9.140.38:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:33732          204.9.140.38:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:33874          204.9.140.38:9997           TIME_WAIT
>>>>> > tcp        0      0 204.9.140.36:33810          204.9.140.38:9997           TIME_WAIT
>>>>> >
>>>>> >
>>>>> > On Tue, Oct 7, 2014 at 11:34 AM, Josh Elser <jo...@gmail.com>
>>>>> wrote:
>>>>> >>
>>>>> >> Can you provide the output from netstat, lsof or /proc/$pid/fd for
>>>>> the
>>>>> >> tserver? Assuming you haven't altered tserv.port.client in
>>>>> >> accumulo-site.xml, we want the line for port 9997.
>>>>> >>
>>>>> >> From my laptop running a tserver on localhost:
>>>>> >>
>>>>> >> $ netstat -na | grep 9997
>>>>> >> tcp4       0      0  127.0.0.1.9997         *.*
>>>>> LISTEN
>>>>> >>
>>>>> >> Depending on the tool you use, you can grep out the pid of the
>>>>> tserver
>>>>> >> or just that port itself.
>>>>> >>
>>>>> >> Just so you know, ZK binds to all available interfaces when it
>>>>> starts,
>>>>> >> so it should work seamlessly with localhost or the FQDN for the
>>>>> host.
>>>>> >> As such, it shouldn't matter what you provide to the
>>>>> >> ZooKeeperInstance. That should connect in all cases for you, it's
>>>>> when
>>>>> >> you make a Connector, and your client will talk to a tabletserver to
>>>>> >> authenticate, that your program should hang. It would be good to
>>>>> >> verify that.
>>>>> >>
>>>>> >> On Tue, Oct 7, 2014 at 11:23 AM, Geoffry Roberts <
>>>>> threadedblue@gmail.com>
>>>>> >> wrote:
>>>>> >> > All,
>>>>> >> >
>>>>> >> > Thanks for the responses.
>>>>> >> >
>>>>> >> > Is this a problem for Accumulo?
>>>>> >> > Reverse DNS is yielding my ISP's host name. You know the drill,
>>>>> my IP in
>>>>> >> > reverse followed by their domain name, as opposed to my FQDN,
>>>>> which what
>>>>> >> > I
>>>>> >> > use in my config files.
>>>>> >> >
>>>>> >> > Running Accumulo 1.5.1
>>>>> >> > I have only one interface.
>>>>> >> > I have the FQDN in both master and slaves files for both Hadoop
>>>>> and
>>>>> >> > Accumulo; in zoo.cfg; and in accumulo-site.xml where the
>>>>> Zookeepers are
>>>>> >> > referenced.
>>>>> >> > Also, I am passing in all Zk FQDN when I instantiate
>>>>> ZookeeperInstance.
>>>>> >> > Forward DNS works
>>>>> >> > Reverse DNS... well (See above).
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> > On Mon, Oct 6, 2014 at 10:26 PM, Adam Fuchs <af...@apache.org>
>>>>> wrote:
>>>>> >> >>
>>>>> >> >> Accumulo tservers typically listen on a single interface. If you
>>>>> have a
>>>>> >> >> server with multiple interfaces (e.g. loopback and eth0), you
>>>>> might
>>>>> >> >> have a
>>>>> >> >> problem in which the tablet servers are not listening on
>>>>> externally
>>>>> >> >> reachable interfaces. Tablet servers will list the interfaces
>>>>> that they
>>>>> >> >> are
>>>>> >> >> listening to when they boot, and you can also use tools like
>>>>> lsof to
>>>>> >> >> find
>>>>> >> >> them.
>>>>> >> >>
>>>>> >> >> If that is indeed the problem, then you might just need to
>>>>> change you
>>>>> >> >> conf/slaves file to use <hostname> instead of localhost, and then
>>>>> >> >> restart.
>>>>> >> >>
>>>>> >> >> Adam
>>>>> >> >>
>>>>> >> >> On Oct 6, 2014 4:27 PM, "Geoffry Roberts" <
>>>>> threadedblue@gmail.com>
>>>>> >> >> wrote:
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>> I have been happily working with Acc, but today things
>>>>> changed.  No
>>>>> >> >>> errors
>>>>> >> >>>
>>>>> >> >>> Until now I ran everything server side, which meant the URL was
>>>>> >> >>> localhost:2181, and life was good.  Today tried running some of
>>>>> the
>>>>> >> >>> same
>>>>> >> >>> code as a remote client, which means <host name>:2181.  Things
>>>>> hang
>>>>> >> >>> when
>>>>> >> >>> BatchWriter tries to commit anything and Scan hangs when it
>>>>> tries to
>>>>> >> >>> iterate
>>>>> >> >>> through a Map.
>>>>> >> >>>
>>>>> >> >>> Let's focus on the scan part:
>>>>> >> >>>
>>>>> >> >>> scan.fetchColumnFamily(new Text("colfY")); // This executes then
>>>>> >> >>> hangs.
>>>>> >> >>> for(Entry<Key,Value> entry : scan) {
>>>>> >> >>> def row = entry.getKey().getRow();
>>>>> >> >>> def value = entry.getValue();
>>>>> >> >>> println "value=" + value;
>>>>> >> >>> }
>>>>> >> >>>
>>>>> >> >>> This is what appears in the console :
>>>>> >> >>>
>>>>> >> >>> 17:22:39.802 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got
>>>>> ping
>>>>> >> >>> response for sessionid: 0x148c6f03388005e after 21ms
>>>>> >> >>>
>>>>> >> >>> 17:22:49.803 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got
>>>>> ping
>>>>> >> >>> response for sessionid: 0x148c6f03388005e after 21ms
>>>>> >> >>>
>>>>> >> >>> <and on and on>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>> The only difference between success and a hang is a URL change,
>>>>> and of
>>>>> >> >>> course being remote.
>>>>> >> >>>
>>>>> >> >>> I don't believe this is a firewall issue.  I shutdown the
>>>>> firewall.
>>>>> >> >>>
>>>>> >> >>> Am I missing something?
>>>>> >> >>>
>>>>> >> >>> Thanks all.
>>>>> >> >>>
>>>>> >> >>> --
>>>>> >> >>> There are ways and there are ways,
>>>>> >> >>>
>>>>> >> >>> Geoffry Roberts
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> > --
>>>>> >> > There are ways and there are ways,
>>>>> >> >
>>>>> >> > Geoffry Roberts
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> > There are ways and there are ways,
>>>>> >
>>>>> > Geoffry Roberts
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> There are ways and there are ways,
>>>>
>>>> Geoffry Roberts
>>>>
>>>
>


-- 
There are ways and there are ways,

Geoffry Roberts

Re: Remotely Accumulo

Posted by Eric Newton <er...@gmail.com>.
Did you find the message in the tserver*.out or tserver*.err files, or on the monitor
page?

(Thanks for the follow-up message.)
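As a rough illustration of checking those files (a sketch, not something posted in this thread), the Java program below lists `tserver*.out` and `tserver*.err` files in a log directory and reports every line that mentions `OutOfMemoryError`. The class name `TserverOomScan` and the directory you pass in are assumptions; point it at wherever your installation writes tserver logs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: report OutOfMemoryError lines from tserver*.out / tserver*.err files. */
public class TserverOomScan {

    /** Returns "fileName:lineNumber: text" for each line mentioning OutOfMemoryError. */
    public static List<String> findOom(Path logDir) throws IOException {
        List<Path> files = new ArrayList<>();
        // Glob matches names such as tserver_host.out and tserver_host.err
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(logDir, "tserver*.{out,err}")) {
            for (Path p : stream) {
                files.add(p);
            }
        }
        Collections.sort(files); // deterministic report order
        List<String> hits = new ArrayList<>();
        for (Path file : files) {
            // ISO-8859-1 maps every byte to a char, so odd bytes in a log never abort the read
            List<String> lines = Files.readAllLines(file, StandardCharsets.ISO_8859_1);
            for (int i = 0; i < lines.size(); i++) {
                if (lines.get(i).contains("OutOfMemoryError")) {
                    hits.add(file.getFileName() + ":" + (i + 1) + ": " + lines.get(i));
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (String hit : findOom(dir)) {
            System.out.println(hit);
        }
    }
}
```

Run it as `java TserverOomScan /path/to/accumulo/logs` on each tserver host; an empty result only means no OOM was logged to those files, so the monitor page is still worth a look.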

On Wed, Oct 8, 2014 at 6:39 PM, Geoffry Roberts <th...@gmail.com>
wrote:

> Just for the record, I finally got to the bottom of things.  One of my
> Tservers was running out of memory.  I hadn't noticed.  I had my SA
> allocate a little more--each node now has 6G, up from 2G--and things are
> working better.

Re: Remotely Accumulo

Posted by Geoffry Roberts <th...@gmail.com>.
Just for the record, I finally got to the bottom of things.  One of my
Tservers was running out of memory.  I hadn't noticed.  I had my SA
allocate a little more--each node now has 6G, up from 2G--and things are
working better.
 On Oct 8, 2014 10:09 AM, "Josh Elser" <jo...@gmail.com> wrote:

> jstack is a tool that can be used to tell a Java process to dump the
> current stack traces for all of its threads. It's usually included with the
> JDK. `kill -3 $pid` also does the same. If the output can't be redirected
> automatically to your shell, check the stdout for the process you gave as
> an argument.
>
> When your client is sitting waiting on data from the tabletserver, you can
> get the stack traces from the tserver and you should be able to find a
> thread with scan in the name, along with your client's IP, and we can help
> debug exactly what the server is doing that is preventing it from returning
> data to your client.

Re: Remotely Accumulo

Posted by Josh Elser <jo...@gmail.com>.
jstack is a tool that tells a Java process to dump the current stack traces
of all of its threads. It's usually included with the JDK. `kill -3 $pid`
does the same. If the output isn't redirected automatically to your shell,
check the stdout of the process whose pid you gave as an argument.
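For reference, the same kind of dump can be built from inside a JVM with the standard Thread.getAllStackTraces() call, for example from a watchdog thread in the client while a scan appears stuck. This is only a sketch: ThreadDump and dumpAllThreads are hypothetical names, and the output merely approximates jstack's format (no thread ids, lock or monitor details).

```java
import java.util.Map;

// Hypothetical helper: a rough, in-process equivalent of what jstack prints.
class ThreadDump {

    // One entry per live thread: the quoted thread name and state, then its frames.
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

This only covers the JVM it runs in; for the tserver itself, jstack or `kill -3` on the tserver's pid is still the way to go.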

While your client is sitting waiting on data from the tablet server, get
the stack traces from the tserver. You should be able to find a thread with
"scan" in its name along with your client's IP, and from that we can help
debug exactly what the server is doing that prevents it from returning
data to your client.
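A minimal sketch of that search, assuming the tserver's dump has been saved to a file (e.g. `jstack <tserver-pid> > tserver.jstack`) and, as described above, that the interesting thread's name contains both "scan" and the client's IP. ScanThreadFinder and its methods are hypothetical names; for a one-off, grep does the same job.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: lists the thread entries in a saved jstack dump that
// look like scans for a given client.
class ScanThreadFinder {

    // An entry starts at a line beginning with a double quote (the thread name)
    // and runs until the next such line. Keeps entries whose header line
    // contains "scan" (any case) and, if clientIp is non-null, that IP too.
    static List<String> findScanThreads(String dump, String clientIp) {
        List<String> matches = new ArrayList<>();
        StringBuilder entry = null;
        for (String line : dump.split("\\R")) {
            if (line.startsWith("\"")) {            // header of the next thread
                addIfMatch(entry, clientIp, matches);
                entry = new StringBuilder();
            }
            if (entry != null) {                    // skip the dump's preamble
                entry.append(line).append('\n');
            }
        }
        addIfMatch(entry, clientIp, matches);       // flush the last thread
        return matches;
    }

    private static void addIfMatch(StringBuilder entry, String clientIp, List<String> out) {
        if (entry == null) {
            return;
        }
        String text = entry.toString();
        String header = text.substring(0, text.indexOf('\n'));
        boolean isScan = header.toLowerCase(Locale.ROOT).contains("scan");
        boolean fromClient = clientIp == null || header.contains(clientIp);
        if (isScan && fromClient) {
            out.add(text.trim());
        }
    }

    // Usage: java ScanThreadFinder tserver.jstack [client-ip]
    public static void main(String[] args) throws Exception {
        String dump = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String clientIp = args.length > 1 ? args[1] : null;
        for (String entry : findScanThreads(dump, clientIp)) {
            System.out.println(entry);
            System.out.println();
        }
    }
}
```

The frames under a matching header are what would show where the tserver is blocked while the client waits.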
On Oct 8, 2014 9:43 AM, "Geoffry Roberts" <th...@gmail.com> wrote:

> Thanks Josh.  But what do you mean by "jstack'ing"?  I'm unfamiliar with
> that term.  A better question would be: how can one troubleshoot such a
> thing?
>
> btw
> I am the sole user on this cluster.

Re: Remotely Accumulo

Posted by Geoffry Roberts <th...@gmail.com>.
Thanks Josh.  But what do you mean by "jstack'ing"?  I'm unfamiliar with
that term.  A better question would be: how can one troubleshoot such a
thing?

btw
I am the sole user on this cluster.

On Tue, Oct 7, 2014 at 4:18 PM, Josh Elser <jo...@gmail.com> wrote:

> Ok, this record:
>
> tcp        0      0 0.0.0.0:9997                0.0.0.0:*                   LISTEN
>
> means that your tserver is listening on the correct port on all interfaces.
> There shouldn't be issues connecting to the tserver. This is also
> confirmed by the fact that you authenticated and got a Connector (this
> does an RPC to the tserver).
>
> So, your tserver is up, and your client can communicate with it. The
> real question is why the scan is hanging. Perhaps try jstack'ing the
> tserver while your client is blocked waiting for results.
>



-- 
There are ways and there are ways,

Geoffry Roberts

Re: Remotely Accumulo

Posted by Josh Elser <jo...@gmail.com>.
Ok, this record:

tcp        0      0 0.0.0.0:9997                0.0.0.0:*                   LISTEN

means that your tserver is listening on the correct port on all interfaces.
There shouldn't be issues connecting to the tserver. This is also
confirmed by the fact that you authenticated and got a Connector (this
does an RPC to the tserver).
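The reading of that LISTEN line can be made mechanical. This is a sketch, not part of any Accumulo tooling; ListenCheck is a hypothetical name, and it assumes `netstat -na` output in either the Linux form (0.0.0.0:9997) or the BSD/macOS form (127.0.0.1.9997), both of which appear in this thread. A wildcard bind (0.0.0.0, ::, or *) accepts connections on every interface, while a 127.0.0.1 bind is reachable only from the same machine.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: classifies one line of `netstat -na` output.
class ListenCheck {

    // Returns "ALL_INTERFACES", "LOOPBACK_ONLY" or "SINGLE_ADDRESS" for a LISTEN
    // line on the given port, or null for any other line. Accepts both the Linux
    // form (0.0.0.0:9997) and the BSD/macOS form (127.0.0.1.9997).
    static String classify(String netstatLine, int port) {
        String[] f = netstatLine.trim().split("\\s+");
        if (f.length < 6 || !f[5].equals("LISTEN")) {
            return null;                                   // not a listening socket
        }
        String local = f[3];                               // local address column
        int sep = Math.max(local.lastIndexOf(':'), local.lastIndexOf('.'));
        if (sep < 0 || !local.substring(sep + 1).equals(Integer.toString(port))) {
            return null;                                   // some other port
        }
        String host = local.substring(0, sep);
        if (host.equals("*")) {
            return "ALL_INTERFACES";                       // BSD wildcard notation
        }
        try {
            InetAddress addr = InetAddress.getByName(host); // IP literal: no DNS lookup
            if (addr.isAnyLocalAddress()) {
                return "ALL_INTERFACES";                   // 0.0.0.0 or ::
            }
            return addr.isLoopbackAddress() ? "LOOPBACK_ONLY" : "SINGLE_ADDRESS";
        } catch (UnknownHostException e) {
            return null;                                   // not an address we understand
        }
    }

    public static void main(String[] args) {
        // The two LISTEN lines quoted in this thread, each joined back onto one line.
        System.out.println(classify("tcp        0      0 0.0.0.0:9997                0.0.0.0:*                   LISTEN", 9997));
        System.out.println(classify("tcp4       0      0  127.0.0.1.9997         *.*                    LISTEN", 9997));
    }
}
```

The first line is the tserver in this thread (all interfaces); the second is a laptop tserver bound to loopback, which a remote client could not reach.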

So, your tserver is up, and your client can communicate with it. The
real question is why the scan is hanging. Perhaps try jstack'ing the
tserver while your client is blocked waiting for results.

On Tue, Oct 7, 2014 at 2:07 PM, Geoffry Roberts <th...@gmail.com> wrote:
> "...it's when
> you make a Connector, and your client will talk to a tabletserver to
> authenticate, that your program should hang. It would be good to
> verify that."
>
>
> My program should hang?  Would you expand?  That is exactly what it is
> doing.  I am able to get a connector.  But when I try to iterate the result
> of a scan, that's when it hangs.

Re: Remotely Accumulo

Posted by Geoffry Roberts <th...@gmail.com>.
"...it's when
you make a Connector, and your client will talk to a tabletserver to
authenticate, that your program should hang. It would be good to
verify that."


My program should hang? Could you expand on that? That is exactly what it is
doing. I am able to get a Connector, but when I try to iterate over the results
of a scan, that's when it hangs.
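One way to turn an indefinite hang like this into something diagnosable is to consume the scanner on a worker thread and stop waiting after a deadline. The sketch below is plain Java, not part of the Accumulo API. `BoundedIteration` and `countWithin` are hypothetical names, and the sketch relies only on the scanner being an `Iterable`, as the for-each loop in the original post already assumes. It would be called as `countWithin(scan, 30, TimeUnit.SECONDS)`.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical debugging helper (not Accumulo API): iterate on a worker thread
// and give up after a deadline, so a silent hang becomes a TimeoutException.
public class BoundedIteration {

    /** Counts the elements of source, or throws TimeoutException if that takes too long. */
    public static <T> int countWithin(Iterable<T> source, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        // Daemon thread, so a worker stuck in a blocking read cannot keep the JVM alive.
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-iteration");
            t.setDaemon(true);
            return t;
        });
        try {
            Future<Integer> result = pool.submit(() -> {
                int n = 0;
                for (T ignored : source) {
                    n++;
                }
                return n;
            });
            return result.get(timeout, unit);
        } finally {
            // Interrupts the worker; code blocked on a socket may ignore the interrupt.
            pool.shutdownNow();
        }
    }
}
```

If it does time out, a thread dump of the client (for example with `jstack`) taken while it waits should show which call the scan is blocked in.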




Here's what comes from netstat:

$ netstat -na | grep 9997
tcp        0      0 0.0.0.0:9997                0.0.0.0:*                   LISTEN
tcp        0      0 204.9.140.36:35679          204.9.140.36:9997           TIME_WAIT
tcp        0      0 204.9.140.36:53146          204.9.140.37:9997           TIME_WAIT
tcp        0      0 204.9.140.36:33896          204.9.140.38:9997           TIME_WAIT
tcp        0      0 204.9.140.36:53282          204.9.140.37:9997           TIME_WAIT
tcp        0      0 204.9.140.36:53188          204.9.140.37:9997           TIME_WAIT
tcp        0      0 204.9.140.36:35609          204.9.140.36:9997           TIME_WAIT
tcp        0      0 204.9.140.36:33901          204.9.140.38:9997           TIME_WAIT
tcp        0      0 204.9.140.36:35588          204.9.140.36:9997           TIME_WAIT
tcp        0      0 204.9.140.36:33877          204.9.140.38:9997           TIME_WAIT
tcp        0      0 204.9.140.36:33946          204.9.140.38:9997           TIME_WAIT
tcp        0      0 204.9.140.36:53167          204.9.140.37:9997           TIME_WAIT
tcp        0      0 204.9.140.36:33949          204.9.140.38:9997           ESTABLISHED
tcp        0      0 204.9.140.36:35546          204.9.140.36:9997           TIME_WAIT
tcp        0      0 204.9.140.36:33852          204.9.140.38:9997           TIME_WAIT
tcp        0      0 204.9.140.36:53125          204.9.140.37:9997           TIME_WAIT
tcp        0      0 204.9.140.36:33922          204.9.140.38:9997           TIME_WAIT
tcp        0      0 204.9.140.36:33747          204.9.140.38:9997           TIME_WAIT
tcp        0      0 204.9.140.36:33961          204.9.140.38:9997           TIME_WAIT
tcp        0      0 204.9.140.36:33793          204.9.140.38:9997           TIME_WAIT
tcp        0      0 204.9.140.36:35768          204.9.140.36:9997           TIME_WAIT
tcp        0      0 204.9.140.36:33917          204.9.140.38:9997           TIME_WAIT
tcp        0      0 204.9.140.36:33814          204.9.140.38:9997           TIME_WAIT
tcp        0      0 204.9.140.36:35567          204.9.140.36:9997           TIME_WAIT
tcp        0      0 204.9.140.36:33444          204.9.140.38:9997           FIN_WAIT2
tcp        0      0 204.9.140.36:35701          204.9.140.36:9997           TIME_WAIT
tcp        0      0 204.9.140.36:33969          204.9.140.38:9997           TIME_WAIT
tcp        0      0 204.9.140.36:53258          204.9.140.37:9997           TIME_WAIT
tcp        0      0 204.9.140.36:33831          204.9.140.38:9997           TIME_WAIT
tcp        0      0 204.9.140.36:53210          204.9.140.37:9997           TIME_WAIT
tcp        0      0 204.9.140.36:53104          204.9.140.37:9997           TIME_WAIT
tcp        0      0 204.9.140.36:33789          204.9.140.38:9997           TIME_WAIT
tcp        0      0 204.9.140.36:33856          204.9.140.38:9997           TIME_WAIT
tcp        0      0 204.9.140.36:53237          204.9.140.37:9997           TIME_WAIT
tcp        0      0 204.9.140.36:33835          204.9.140.38:9997           TIME_WAIT
tcp        0      0 204.9.140.36:35651          204.9.140.36:9997           TIME_WAIT
tcp        0      0 204.9.140.36:33938          204.9.140.38:9997           TIME_WAIT
tcp        0      0 204.9.140.36:33041          204.9.140.36:9997           ESTABLISHED
tcp        0      0 204.9.140.36:53285          204.9.140.37:9997           TIME_WAIT
tcp        0      0 204.9.140.36:53305          204.9.140.37:9997           TIME_WAIT
tcp        0      0 204.9.140.36:33768          204.9.140.38:9997           TIME_WAIT
tcp        0      0 204.9.140.36:35630          204.9.140.36:9997           TIME_WAIT
tcp        0      0 204.9.140.36:33754          204.9.140.38:9997           TIME_WAIT
tcp        0      0 204.9.140.36:35745          204.9.140.36:9997           TIME_WAIT
tcp        0      0 204.9.140.36:35724          204.9.140.36:9997           TIME_WAIT
tcp        0      0 204.9.140.36:9997           204.9.140.36:33041          ESTABLISHED
tcp        0      0 204.9.140.36:53083          204.9.140.37:9997           TIME_WAIT
tcp        0      0 204.9.140.36:50623          204.9.140.37:9997           ESTABLISHED
tcp        0      0 204.9.140.36:33772          204.9.140.38:9997           TIME_WAIT
tcp        0      0 204.9.140.36:33732          204.9.140.38:9997           TIME_WAIT
tcp        0      0 204.9.140.36:33874          204.9.140.38:9997           TIME_WAIT
tcp        0      0 204.9.140.36:33810          204.9.140.38:9997           TIME_WAIT
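
In output like this, the line that matters is the LISTEN entry: a local address of 0.0.0.0:9997 means the tserver accepts connections on every IPv4 interface, whereas 127.0.0.1:9997 would mean loopback only. The sketch below pulls that entry out of netstat lines and classifies it. It assumes the Linux column order seen here (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); BSD/macOS netstat prints addresses as 127.0.0.1.9997 with a dot and would need a different suffix. `ListenCheck` and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: find the LISTEN socket for a port in Linux `netstat -na`
// output and say whether other hosts could, in principle, reach it.
public class ListenCheck {

    /** Bind address of the LISTEN entry for port, or null if there is none. */
    public static String listenAddress(List<String> netstatLines, int port) {
        String suffix = ":" + port;
        for (String line : netstatLines) {
            // Columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State
            String[] cols = line.trim().split("\\s+");
            if (cols.length >= 6 && cols[5].equals("LISTEN") && cols[3].endsWith(suffix)) {
                return cols[3].substring(0, cols[3].length() - suffix.length());
            }
        }
        return null;
    }

    /** Rough classification of a bind address (ignores firewalls and routing). */
    public static String describe(String bindAddress) {
        if (bindAddress == null) return "no listener";
        if (bindAddress.equals("0.0.0.0") || bindAddress.equals("::")) return "all interfaces";
        if (bindAddress.startsWith("127.") || bindAddress.equals("::1")) return "loopback only";
        return "single interface " + bindAddress;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "tcp        0      0 0.0.0.0:9997                0.0.0.0:*                   LISTEN",
            "tcp        0      0 204.9.140.36:33949          204.9.140.38:9997           ESTABLISHED");
        System.out.println(describe(listenAddress(lines, 9997))); // prints "all interfaces"
    }
}
```

Applied to the output above, the tserver is bound to the wildcard address, which points away from the loopback-only binding problem and toward name resolution or client-side reachability.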

On Tue, Oct 7, 2014 at 11:34 AM, Josh Elser <jo...@gmail.com> wrote:

> Can you provide the output from netstat, lsof or /proc/$pid/fd for the
> tserver? Assuming you haven't altered tserv.port.client in
> accumulo-site.xml, we want the line for port 9997.
>
> From my laptop running a tserver on localhost:
>
> $ netstat -na | grep 9997
> tcp4       0      0  127.0.0.1.9997         *.*                    LISTEN
>
> Depending on the tool you use, you can grep out the pid of the tserver
> or just that port itself.
>
> Just so you know, ZK binds to all available interfaces when it starts,
> so it should work seamlessly with localhost or the FQDN for the host.
> As such, it shouldn't matter what you provide to the
> ZooKeeperInstance. That should connect in all cases for you, it's when
> you make a Connector, and your client will talk to a tabletserver to
> authenticate, that your program should hang. It would be good to
> verify that.



-- 
There are ways and there are ways,

Geoffry Roberts

Re: Remotely Accumulo

Posted by Josh Elser <jo...@gmail.com>.
Can you provide the output from netstat, lsof or /proc/$pid/fd for the
tserver? Assuming you haven't altered tserv.port.client in
accumulo-site.xml, we want the line for port 9997.

From my laptop running a tserver on localhost:

$ netstat -na | grep 9997
tcp4       0      0  127.0.0.1.9997         *.*                    LISTEN

Depending on the tool you use, you can grep out the pid of the tserver
or just that port itself.

Just so you know, ZK binds to all available interfaces when it starts,
so it should work seamlessly with localhost or the FQDN for the host.
As such, it shouldn't matter which host name you pass to the
ZooKeeperInstance; that should connect in all cases. It's when
you make a Connector, and your client talks to a tablet server to
authenticate, that your program should hang. It would be good to
verify that.
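
The netstat check runs on the server. The matching check from the remote client's side is whether a plain TCP connection to each tablet server's client port (9997 unless tserv.port.client was changed) and to ZooKeeper (2181) succeeds at all. A minimal sketch in plain Java, assuming the tablet server host names are passed on the command line in place of the entries in conf/slaves; `PortProbe` is a hypothetical name.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical probe, run on the client machine: can we open a TCP connection
// to each host on the ZooKeeper port and the tserver client port?
public class PortProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused connections, timeouts and unresolvable names all end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        for (String host : args) {
            for (int port : new int[] {2181, 9997}) {
                System.out.printf("%s:%d -> %s%n", host, port,
                        canConnect(host, port, 3000) ? "reachable" : "NOT reachable");
            }
        }
    }
}
```

A host where 2181 is reachable but 9997 is not would match the symptom of a Connector that is created but a scan that never returns.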

On Tue, Oct 7, 2014 at 11:23 AM, Geoffry Roberts <th...@gmail.com> wrote:
> All,
>
> Thanks for the responses.
>
> Is this a problem for Accumulo?
> Reverse DNS is yielding my ISP's host name. You know the drill, my IP in
> reverse followed by their domain name, as opposed to my FQDN, which what I
> use in my config files.
>
> Running Accumulo 1.5.1
> I have only one interface.
> I have the FQDN in both master and slaves files for both Hadoop and
> Accumulo; in zoo.cfg; and in accumulo-site.xml where the Zookeepers are
> referenced.
> Also, I am passing in all Zk FQDN when I instantiate ZookeeperInstance.
> Forward DNS works
> Reverse DNS... well (See above).

Re: Remotely Accumulo

Posted by Geoffry Roberts <th...@gmail.com>.
All,

Thanks for the responses.

Is this a problem for Accumulo?
Reverse DNS is yielding my ISP's host name. You know the drill: my IP in
reverse followed by their domain name, as opposed to my FQDN, which is what I
use in my config files.


   - Running Accumulo 1.5.1.
   - I have only one interface.
   - I have the FQDN in both the masters and slaves files for both Hadoop and
   Accumulo, in zoo.cfg, and in accumulo-site.xml where the ZooKeepers are
   referenced.
   - I am also passing in all the ZooKeeper FQDNs when I instantiate ZooKeeperInstance.
   - Forward DNS works.
   - Reverse DNS... well, see above.
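
The forward/reverse mismatch described above can be seen from the JVM's own resolver, which is what both the Accumulo client and the servers go through. This is a sketch: `DnsCheck` is a hypothetical name, and it only reports a mismatch; it does not establish whether such a mismatch breaks Accumulo 1.5.1.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical check: forward-resolve a configured host name, reverse-resolve
// each address it maps to, and compare the result with the original name.
public class DnsCheck {

    /** DNS names are case-insensitive; also ignore a trailing root dot. */
    public static boolean sameHost(String a, String b) {
        return stripDot(a).equalsIgnoreCase(stripDot(b));
    }

    private static String stripDot(String s) {
        return s.endsWith(".") ? s.substring(0, s.length() - 1) : s;
    }

    public static void check(String name) throws UnknownHostException {
        for (InetAddress addr : InetAddress.getAllByName(name)) { // forward lookup
            // Reverse lookup; falls back to the textual IP if it cannot be resolved.
            String reverse = addr.getCanonicalHostName();
            System.out.printf("%s -> %s -> %s %s%n", name, addr.getHostAddress(), reverse,
                    sameHost(name, reverse) ? "(match)" : "(MISMATCH)");
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        // Pass each FQDN used in the masters/slaves files and zoo.cfg.
        for (String name : args) {
            check(name);
        }
    }
}
```

Running it on both the client and each server is worthwhile, since the two sides may resolve the same name differently.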



On Mon, Oct 6, 2014 at 10:26 PM, Adam Fuchs <af...@apache.org> wrote:

> Accumulo tservers typically listen on a single interface. If you have a
> server with multiple interfaces (e.g. loopback and eth0), you might have a
> problem in which the tablet servers are not listening on externally
> reachable interfaces. Tablet servers will list the interfaces that they are
> listening to when they boot, and you can also use tools like lsof to find
> them.
>
> If that is indeed the problem, then you might just need to change you
> conf/slaves file to use <hostname> instead of localhost, and then restart.
>
> Adam


-- 
There are ways and there are ways,

Geoffry Roberts

Re: Remotely Accumulo

Posted by Adam Fuchs <af...@apache.org>.
Accumulo tservers typically listen on a single interface. If you have a
server with multiple interfaces (e.g. loopback and eth0), you might have a
problem in which the tablet servers are not listening on externally
reachable interfaces. Tablet servers list the interfaces they are
listening on when they boot, and you can also use tools like lsof to find
them.

If that is indeed the problem, then you might just need to change your
conf/slaves file to use <hostname> instead of localhost, and then restart.
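
Besides the tserver startup log and lsof, the addresses each interface carries can be listed from Java, which makes it easy to spot a host whose only configured name maps to loopback. This sketch uses only java.net.NetworkInterface; the class name and the `mayBeReachableRemotely` heuristic are hypothetical and ignore firewalls and routing.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;

// Lists the addresses of every interface that is up, marking loopback ones.
// A server socket bound only to a loopback address is unreachable from other hosts.
public class ListInterfaces {

    /** Rough heuristic: anything but loopback (including the 0.0.0.0/:: wildcard) may be reachable. */
    public static boolean mayBeReachableRemotely(InetAddress bindAddress) {
        return !bindAddress.isLoopbackAddress();
    }

    public static void main(String[] args) throws SocketException {
        for (NetworkInterface nif : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            if (!nif.isUp()) {
                continue;
            }
            for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
                System.out.printf("%-8s %-40s %s%n", nif.getName(), addr.getHostAddress(),
                        addr.isLoopbackAddress() ? "loopback (local only)" : "non-loopback");
            }
        }
    }
}
```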

Adam
On Oct 6, 2014 4:27 PM, "Geoffry Roberts" <th...@gmail.com> wrote:

>
> I have been happily working with Acc, but today things changed.  No errors
>
> Until now I ran everything server side, which meant the URL was
> localhost:2181, and life was good.  Today tried running some of the same
> code as a remote client, which means <host name>:2181.  Things hang when
> BatchWriter tries to commit anything and Scan hangs when it tries to
> iterate through a Map.
>
> Let's focus on the scan part:
>
> scan.fetchColumnFamily(new Text("colfY")); // This executes then hangs.
> for(Entry<Key,Value> entry : scan) {
> def row = entry.getKey().getRow();
> def value = entry.getValue();
> println "value=" + value;
> }
>
> This is what appears in the console :
>
> 17:22:39.802 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping
> response for sessionid: 0x148c6f03388005e after 21ms
>
> 17:22:49.803 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping
> response for sessionid: 0x148c6f03388005e after 21ms
>
> <and on and on>
>
>
> The only difference between success and a hang is a URL change, and of
> course being remote.
>
> I don't believe this is a firewall issue.  I shutdown the firewall.
>
> Am I missing something?
>
> Thanks all.
>
> --
> There are ways and there are ways,
>
> Geoffry Roberts
>

Re: Remotely Accumulo

Posted by Keith Turner <ke...@deenlo.com>.
If you add the following Log4j code before scanning, maybe the trace
messages from the Accumulo client code will shed some light on what's happening.

import org.apache.log4j.Level;
import org.apache.log4j.Logger;

Logger.getLogger("org.apache.accumulo.core.client").setLevel(Level.TRACE);

On Mon, Oct 6, 2014 at 5:26 PM, Geoffry Roberts <th...@gmail.com>
wrote:

>
> I have been happily working with Acc, but today things changed.  No errors
>
> Until now I ran everything server side, which meant the URL was
> localhost:2181, and life was good.  Today tried running some of the same
> code as a remote client, which means <host name>:2181.  Things hang when
> BatchWriter tries to commit anything and Scan hangs when it tries to
> iterate through a Map.
>
> Let's focus on the scan part:
>
> scan.fetchColumnFamily(new Text("colfY")); // This executes then hangs.
> for(Entry<Key,Value> entry : scan) {
> def row = entry.getKey().getRow();
> def value = entry.getValue();
> println "value=" + value;
> }
>
> This is what appears in the console :
>
> 17:22:39.802 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping
> response for sessionid: 0x148c6f03388005e after 21ms
>
> 17:22:49.803 C{0} M DEBUG org.apache.zookeeper.ClientCnxn - Got ping
> response for sessionid: 0x148c6f03388005e after 21ms
>
> <and on and on>
>
>
> The only difference between success and a hang is a URL change, and of
> course being remote.
>
> I don't believe this is a firewall issue.  I shutdown the firewall.
>
> Am I missing something?
>
> Thanks all.
>
> --
> There are ways and there are ways,
>
> Geoffry Roberts
>