Posted to user@flink.apache.org by Niels Basjes <Ni...@basjes.nl> on 2017/10/20 11:32:38 UTC

HBase config settings go missing within Yarn.

Hi,

I have a Flink 1.3.2 application that I want to run on a Hadoop yarn
cluster where I need to connect to HBase.

What I have:

In my environment:
HADOOP_CONF_DIR=/etc/hadoop/conf/
HBASE_CONF_DIR=/etc/hbase/conf/
HIVE_CONF_DIR=/etc/hive/conf/
YARN_CONF_DIR=/etc/hadoop/conf/

In /etc/hbase/conf/hbase-site.xml I have correctly defined the zookeeper
hosts for HBase.
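
For reference, the relevant entry in that file looks like this (illustrative;
the quorum values are the ones that show up in the client-side logs below):

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>node1.kluster.local.nl.bol.com:2181,node2.kluster.local.nl.bol.com:2181,node3.kluster.local.nl.bol.com:2181</value>
</property>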

My test code is this:

public class Main {
  private static final Logger LOG = LoggerFactory.getLogger(Main.class);

  public static void main(String[] args) throws Exception {
    printZookeeperConfig();
    final StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1);
    env.createInput(new HBaseSource()).print();
    env.execute("HBase config problem");
  }

  public static void printZookeeperConfig() {
    String zookeeper =
        HBaseConfiguration.create().get("hbase.zookeeper.quorum");
    LOG.info("----> Loading HBaseConfiguration: Zookeeper = {}", zookeeper);
  }

  public static class HBaseSource extends AbstractTableInputFormat<String> {
    @Override
    public void configure(org.apache.flink.configuration.Configuration parameters) {
      table = createTable();
      if (table != null) {
        scan = getScanner();
      }
    }

    private HTable createTable() {
      LOG.info("Initializing HBaseConfiguration");
      // Uses files found in the classpath
      org.apache.hadoop.conf.Configuration hConf = HBaseConfiguration.create();
      printZookeeperConfig();

      try {
        return new HTable(hConf, getTableName());
      } catch (Exception e) {
        LOG.error("Error instantiating a new HTable instance", e);
      }
      return null;
    }

    @Override
    public String getTableName() {
      return "bugs:flink";
    }

    @Override
    protected String mapResultToOutType(Result result) {
      return new String(
          result.getFamilyMap("v".getBytes(UTF_8)).get("column".getBytes(UTF_8)));
    }

    @Override
    protected Scan getScanner() {
      return new Scan();
    }
  }

}


I run this application with this command on my Yarn cluster (note: first
starting a yarn-cluster and then submitting the job yields the same result).

flink \
    run \
    -m yarn-cluster \
    --yarncontainer 1 \
    --yarnname "Flink on Yarn HBase problem" \
    --yarnslots                     1     \
    --yarnjobManagerMemory          4000  \
    --yarntaskManagerMemory         4000  \
    --yarnstreaming                       \
    target/flink-hbase-connect-1.0-SNAPSHOT.jar

Now in the client-side logfile
/usr/local/flink-1.3.2/log/flink--client-80d2d21b10e0.log I see:

1) The classpath actually contains /etc/hbase/conf/, both near the start
and at the end.

2) The zookeeper settings of my experimental environment have been
picked up by the software:

2017-10-20 11:17:23,973 INFO  com.bol.bugreports.Main                                       - ----> Loading HBaseConfiguration: Zookeeper = node1.kluster.local.nl.bol.com:2181,node2.kluster.local.nl.bol.com:2181,node3.kluster.local.nl.bol.com:2181


When I open the logfiles on the Hadoop cluster I see this:

2017-10-20 13:17:33,250 INFO  com.bol.bugreports.Main                                       - ----> Loading HBaseConfiguration: Zookeeper = *localhost*


and as a consequence

2017-10-20 13:17:33,368 INFO  org.apache.zookeeper.ClientCnxn                               - Opening socket connection to server localhost.localdomain/127.0.0.1:2181

2017-10-20 13:17:33,369 WARN  org.apache.zookeeper.ClientCnxn                               - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect

java.net.ConnectException: Connection refused

	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)

	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)

	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)

2017-10-20 13:17:33,475 WARN  org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper        - Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid



The value 'localhost:2181' comes from the hbase-default.xml packaged inside
the HBase jar, which defines the defaults for the zookeeper nodes.
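
Those defaults look like this in hbase-default.xml (the quorum and the
client port are separate properties; shown here abbreviated):

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>localhost</value>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>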

As a workaround I currently put this extra line in my code, which I know is
nasty but "works on my cluster":

hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
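
In context, the workaround sits in createTable(), right after the config is
created (a sketch):

private HTable createTable() {
  LOG.info("Initializing HBaseConfiguration");
  org.apache.hadoop.conf.Configuration hbaseConfig = HBaseConfiguration.create();
  // Nasty workaround: force-load the node-local hbase-site.xml, because
  // /etc/hbase/conf/ is not on the classpath of the Yarn container.
  hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
  try {
    return new HTable(hbaseConfig, getTableName());
  } catch (Exception e) {
    LOG.error("Error instantiating a new HTable instance", e);
  }
  return null;
}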


What am I doing wrong?

What is the right way to fix this?

-- 
Best regards / Met vriendelijke groeten,

Niels Basjes

Re: HBase config settings go missing within Yarn.

Posted by Niels Basjes <Ni...@basjes.nl>.
I have an idea how we can reduce the impact of this class of problem:
if we can detect that we are running in a distributed environment, then in
order to use HBase you MUST have an hbase-site.xml.

I'll see if I can make a proof of concept.
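
A rough sketch of such a check (hypothetical code, not something that exists
in Flink today): treat the hardcoded hbase-default.xml value as a signal
that no hbase-site.xml was found on the classpath.

org.apache.hadoop.conf.Configuration hConf = HBaseConfiguration.create();
if ("localhost".equals(hConf.get("hbase.zookeeper.quorum"))) {
  throw new IllegalStateException(
      "hbase.zookeeper.quorum is still the hbase-default.xml default; "
      + "no hbase-site.xml was found on the classpath.");
}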

Niels

On Wed, Oct 25, 2017 at 11:27 AM, Till Rohrmann <tr...@apache.org>
wrote:

> Hi Niels,
>
> good to see that you solved your problem.
>
> I’m not entirely sure how Pig does it, but I assume that there must be
> some kind of HBase support where the HBase-specific files are explicitly
> sent to the cluster, or that it copies the environment variables. For Flink,
> supporting this kind of behaviour is not really feasible because there are
> simply too many potential projects to support out there.
>
> The idiomatic Flink way would be either to read the config on the client,
> put it in the closure of the operator, and then send it in serialized form
> to the cluster; or to start your Flink job cluster with the correct
> environment variables set, by using env.java.opts or by extending the
> classpath information as you did.
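>
> For the classpath route, shipping the HBase conf directory along with the
> job might also work (a sketch; this assumes the --yarnship option puts the
> shipped directory on the container classpath):
>
> flink \
>     run \
>     -m yarn-cluster \
>     --yarnship /etc/hbase/conf \
>     target/flink-hbase-connect-1.0-SNAPSHOT.jar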
>
> The following code shows the closure approach.
>
> public class Main {
>   private static final Logger LOG = LoggerFactory.getLogger(Main.class);
>
>   public static void main(String[] args) throws Exception {
>     printZookeeperConfig();
>     final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1);
>     env.createInput(new HBaseSource(HBaseConfiguration.create())).print();
>     env.execute("HBase config problem");
>   }
>
>   public static void printZookeeperConfig() {
>     String zookeeper = HBaseConfiguration.create().get("hbase.zookeeper.quorum");
>     LOG.info("----> Loading HBaseConfiguration: Zookeeper = {}", zookeeper);
>   }
>
>   public static class HBaseSource extends AbstractTableInputFormat<String> {
>
>     // HBase configuration read on the client
>     private final org.apache.hadoop.conf.Configuration hConf;
>
>     public HBaseSource(org.apache.hadoop.conf.Configuration hConf) {
>       this.hConf = Preconditions.checkNotNull(hConf);
>     }
>
>     @Override
>     public void configure(org.apache.flink.configuration.Configuration parameters) {
>       table = createTable();
>       if (table != null) {
>         scan = getScanner();
>       }
>     }
>
>     private HTable createTable() {
>       printZookeeperConfig();
>
>       try {
>         return new HTable(hConf, getTableName());
>       } catch (Exception e) {
>         LOG.error("Error instantiating a new HTable instance", e);
>       }
>       return null;
>     }
>
>     @Override
>     public String getTableName() {
>       return "bugs:flink";
>     }
>
>     @Override
>     protected String mapResultToOutType(Result result) {
>       return new String(result.getFamilyMap("v".getBytes(UTF_8)).get("column".getBytes(UTF_8)));
>     }
>
>     @Override
>     protected Scan getScanner() {
>       return new Scan();
>     }
>   }
> }
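>
> One caveat with this approach: org.apache.hadoop.conf.Configuration
> implements Writable but not java.io.Serializable, so a plain hConf field
> as above needs extra care when the closure is serialized. A minimal sketch
> of a bridge (hypothetical helper, not part of Flink):
>
> public class SerializableHadoopConf implements java.io.Serializable {
>   private transient org.apache.hadoop.conf.Configuration conf;
>
>   public SerializableHadoopConf(org.apache.hadoop.conf.Configuration conf) {
>     this.conf = conf;
>   }
>
>   public org.apache.hadoop.conf.Configuration get() {
>     return conf;
>   }
>
>   // Delegate to the Writable serialization that Configuration provides.
>   private void writeObject(java.io.ObjectOutputStream out) throws java.io.IOException {
>     out.defaultWriteObject();
>     conf.write(out);
>   }
>
>   private void readObject(java.io.ObjectInputStream in) throws java.io.IOException, ClassNotFoundException {
>     in.defaultReadObject();
>     conf = new org.apache.hadoop.conf.Configuration(false);
>     conf.readFields(in);
>   }
> }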
>
> Cheers,
> Till
>
>
> On Tue, Oct 24, 2017 at 11:51 AM, Niels Basjes <Ni...@basjes.nl> wrote:
>
>> I changed my cluster config (on all nodes) to include the HBase config
>> dir in the classpath.
>> Now everything works as expected.
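>>
>> For reference, one way to do that is via yarn.application.classpath in
>> yarn-site.xml on every node (a sketch; the existing entries for the
>> distribution must be kept, with /etc/hbase/conf appended):
>>
>> <property>
>>   <name>yarn.application.classpath</name>
>>   <value>$HADOOP_CONF_DIR,...,/etc/hbase/conf</value>
>> </property>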
>>
>> This may very well be a misconfiguration of my cluster.
>> However ...
>> My current assessment:
>> Tools like Pig use the HBase config which has been specified on the LOCAL
>> machine. This allows running on a cluster where HBase is not locally
>> defined.
>> Apparently Flink currently uses the HBase config which has been specified
>> on the REMOTE machine. This limits jobs to ONLY the HBase that is
>> defined on the cluster.
>>
>> At this point I'm unsure which is the right approach.
>>
>> Niels Basjes
>>
>> On Tue, Oct 24, 2017 at 11:29 AM, Niels Basjes <Ni...@basjes.nl> wrote:
>>
>>> Minor correction: The HBase jar files are on the classpath, just in a
>>> different order.
>>>
>>> On Tue, Oct 24, 2017 at 11:18 AM, Niels Basjes <Ni...@basjes.nl> wrote:
>>>
>>>> I did some more digging.
>>>>
>>>> I added extra code to print both the environment variables and the
>>>> classpath that is used by the HBaseConfiguration to load the resource files.
>>>> I call this both locally and during startup of the job (i.e. these logs
>>>> arrive in the jobmanager.log on the cluster).
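>>>>
>>>> Roughly, code like this produces the lines below (reconstructed sketch):
>>>>
>>>> for (Map.Entry<String, String> entry : System.getenv().entrySet()) {
>>>>   LOG.info("{} = {}", entry.getKey(), entry.getValue());
>>>> }
>>>> ClassLoader classLoader = HBaseConfiguration.class.getClassLoader();
>>>> LOG.info("--> HBaseConfiguration: URLClassLoader = {}", classLoader);
>>>> if (classLoader instanceof URLClassLoader) {
>>>>   for (URL url : ((URLClassLoader) classLoader).getURLs()) {
>>>>     LOG.info("----> ClassPath = {}", url);
>>>>   }
>>>> }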
>>>>
>>>> Summary of what I found locally:
>>>>
>>>> Environment
>>>> 2017-10-24 08:50:15,612 INFO  com.bol.bugreports.Main                                       - HADOOP_CONF_DIR = /etc/hadoop/conf/
>>>> 2017-10-24 08:50:15,613 INFO  com.bol.bugreports.Main                                       - HBASE_CONF_DIR = /etc/hbase/conf/
>>>> 2017-10-24 08:50:15,613 INFO  com.bol.bugreports.Main                                       - FLINK_CONF_DIR = /usr/local/flink-1.3.2/conf
>>>> 2017-10-24 08:50:15,613 INFO  com.bol.bugreports.Main                                       - HIVE_CONF_DIR = /etc/hive/conf/
>>>> 2017-10-24 08:50:15,613 INFO  com.bol.bugreports.Main                                       - YARN_CONF_DIR = /etc/hadoop/conf/
>>>>
>>>> ClassPath
>>>> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                                       - --> HBaseConfiguration: URLClassLoader = sun.misc.Launcher$AppClassLoader@1b6d3586
>>>> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/flink-python_2.11-1.3.2.jar
>>>> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/flink-shaded-hadoop2-uber-1.3.2.jar
>>>> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/joda-time-2.9.1.jar
>>>> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/log4j-1.2.17.jar
>>>> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/slf4j-log4j12-1.7.7.jar
>>>> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/flink-dist_2.11-1.3.2.jar
>>>> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/home/nbasjes/FlinkHBaseConnect/
>>>> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/etc/hadoop/conf/
>>>> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/etc/hadoop/conf/
>>>> 2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/etc/hbase/conf/
>>>> 2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.141-2.b16.el6_9.x86_64/lib/tools.jar
>>>> 2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hbase/
>>>> 2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hbase/lib/activation-1.1.jar
>>>> 2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hbase/lib/aopalliance-1.0.jar
>>>> 2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hbase/lib/apacheds-i18n-2.0.0-M15.jar
>>>> 2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hbase/lib/apacheds-kerberos-codec-2.0.0-M
>>>> ...
>>>>
>>>>
>>>>
>>>> On the cluster node in the jobmanager.log:
>>>>
>>>> ENVIRONMENT
>>>> 2017-10-24 10:50:19,971 INFO  com.bol.bugreports.Main                                       - HADOOP_CONF_DIR = /usr/hdp/current/hadoop-yarn-nodemanager/../hadoop/conf
>>>> 2017-10-24 10:50:19,971 INFO  com.bol.bugreports.Main                                       - TEZ_CONF_DIR = /etc/tez/conf
>>>> 2017-10-24 10:50:19,973 INFO  com.bol.bugreports.Main                                       - YARN_CONF_DIR = /usr/hdp/current/hadoop-yarn-nodemanager/../hadoop/conf
>>>> 2017-10-24 10:50:19,973 INFO  com.bol.bugreports.Main                                       - LOG_DIRS = /var/log/hadoop-yarn/containers/application_1503304315746_0062/container_1503304315746_0062_01_000001
>>>> 2017-10-24 10:50:19,973 INFO  com.bol.bugreports.Main                                       - HADOOP_YARN_HOME = /usr/hdp/2.3.4.0-3485/hadoop-yarn
>>>> 2017-10-24 10:50:19,974 INFO  com.bol.bugreports.Main                                       - HADOOP_HOME = /usr/hdp/2.3.4.0-3485/hadoop
>>>> 2017-10-24 10:50:19,975 INFO  com.bol.bugreports.Main                                       - HDP_VERSION = 2.3.4.0-3485
>>>>
>>>> And the classpath:
>>>>
>>>> 2017-10-24 10:50:19,977 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/15/flink-hbase-connect-1.0-SNAPSHOT.jar
>>>> 2017-10-24 10:50:19,977 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/flink-dist_2.11-1.3.2.jar
>>>> 2017-10-24 10:50:19,977 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/flink-python_2.11-1.3.2.jar
>>>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/flink-shaded-hadoop2-uber-1.3.2.jar
>>>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/joda-time-2.9.1.jar
>>>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/log4j-1.2.17.jar
>>>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/slf4j-log4j12-1.7.7.jar
>>>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/14/log4j.properties
>>>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/11/logback.xml
>>>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/16/flink-dist_2.11-1.3.2.jar
>>>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/10/flink-conf.yaml
>>>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/container_1503304315746_0062_01_000001/
>>>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/etc/hadoop/conf/
>>>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-nfs-2.7.1.2.3.4.0-3485.jar
>>>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-common-2.7.1.2.3.4.0-3485-tests.jar
>>>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-common-2.7.1.2.3.4.0-3485.jar
>>>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-annotations-2.7.1.2.3.4.0-3485.jar
>>>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-auth-2.7.1.2.3.4.0-3485.jar
>>>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-azure-2.7.1.2.3.4.0-3485.jar
>>>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-aws-2.7.1.2.3.4.0-3485.jar
>>>>
>>>>
>>>> So apparently everything HBase-related that was specified client-side is
>>>> missing when the task is running on my cluster.
>>>>
>>>> The thing is that when running, for example, a Pig script, everything
>>>> works perfectly fine on this cluster as it is configured right now.
>>>> Also, the config 'shouldn't' (I think) need anything different, because
>>>> this application only needs the HBase client (jar, packaged into the
>>>> application) and the HBase zookeeper settings (present on the machine
>>>> where it is started).
>>>>
>>>> Niels Basjes
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Oct 23, 2017 at 10:23 AM, Piotr Nowojski <
>>>> piotr@data-artisans.com> wrote:
>>>>
>>>>> Till, do you have some idea what is going on? I do not see any
>>>>> meaningful difference between Niels' code and HBaseWriteStreamExample.java.
>>>>> There is also a very similar issue on the mailing list: “Flink can't
>>>>> read hdfs namenode logical url”
>>>>>
>>>>> Piotrek
>>>>>
>>>>> On 22 Oct 2017, at 12:56, Niels Basjes <Ni...@basjes.nl> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Yes, all nodes have the same /etc/hbase/conf/hbase-site.xml, which
>>>>> contains the correct settings for HBase to find zookeeper.
>>>>> That is why adding that file as an additional resource to the
>>>>> configuration works.
>>>>> I have created a very simple project that reproduces the problem on my
>>>>> setup:
>>>>> https://github.com/nielsbasjes/FlinkHBaseConnectProblem
>>>>>
>>>>> Niels Basjes
>>>>>
>>>>>
>>>>> On Fri, Oct 20, 2017 at 6:54 PM, Piotr Nowojski <
>>>>> piotr@data-artisans.com> wrote:
>>>>>
>>>>>> Is this /etc/hbase/conf/hbase-site.xml file present on all of the
>>>>>> machines? If yes, could you share your code?
>>>>>>
>>>>>> On 20 Oct 2017, at 16:29, Niels Basjes <Ni...@basjes.nl> wrote:
>>>>>>
>>>>>> I looked at the logfiles from the Hadoop Yarn web interface, i.e.
>>>>>> actually looking in the jobmanager.log of the container running the Flink
>>>>>> task.
>>>>>> That is where I was able to find these messages.
>>>>>>
>>>>>> I do the
>>>>>>   hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
>>>>>> in all places, directly after the HBaseConfiguration.create();
>>>>>> That way I simply force the task to look on the actual Hadoop node
>>>>>> for the same file it already loaded locally.
>>>>>>
>>>>>> The reason I'm suspecting Flink is that the client-side part of the
>>>>>> Flink application does have the right settings, while the task/job actually
>>>>>> running in the cluster does not.
>>>>>> So it seems that in the transition into the cluster the application does
>>>>>> not copy everything it has available locally, for some reason.
>>>>>>
>>>>>> There is a very high probability I did something wrong, I'm just not
>>>>>> seeing it at this moment.
>>>>>>
>>>>>> Niels
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Oct 20, 2017 at 2:53 PM, Piotr Nowojski <
>>>>>> piotr@data-artisans.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> What do you mean by saying:
>>>>>>>
>>>>>>> When I open the logfiles on the Hadoop cluster I see this:
>>>>>>>
>>>>>>>
>>>>>>> The error doesn’t come from Flink? Where do you execute
>>>>>>>
>>>>>>> hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
>>>>>>>
>>>>>>> ?
>>>>>>>
>>>>>>> To me it seems like it is a problem with misconfigured HBase and not
>>>>>>> something related to Flink.
>>>>>>>
>>>>>>> Piotrek
>>>>>>>
>>>>>>> On 20 Oct 2017, at 13:44, Niels Basjes <Ni...@basjes.nl> wrote:
>>>>>>>
>>>>>>> To facilitate you guys helping me I put this test project on github:
>>>>>>> https://github.com/nielsbasjes/FlinkHBaseConnectProblem
>>>>>>>
>>>>>>> Niels Basjes
>>>>>>>
>>>>>>> --
>>>>>>> Best regards / Met vriendelijke groeten,
>>>>>>>
>>>>>>> Niels Basjes
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards / Met vriendelijke groeten,
>>>>>>
>>>>>> Niels Basjes
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best regards / Met vriendelijke groeten,
>>>>>
>>>>> Niels Basjes
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best regards / Met vriendelijke groeten,
>>>>
>>>> Niels Basjes
>>>>
>>>
>>>
>>>
>>> --
>>> Best regards / Met vriendelijke groeten,
>>>
>>> Niels Basjes
>>>
>>
>>
>>
>> --
>> Best regards / Met vriendelijke groeten,
>>
>> Niels Basjes
>>
>
>


-- 
Best regards / Met vriendelijke groeten,

Niels Basjes

>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/10/flink-conf.yaml
>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/container_1503304315746_0062_01_000001/
>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/etc/hadoop/conf/
>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-nfs-2.7.1.2.3.4.0-3485.jar
>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-common-2.7.1.2.3.4.0-3485-tests.jar
>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-common-2.7.1.2.3.4.0-3485.jar
>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-annotations-2.7.1.2.3.4.0-3485.jar
>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-auth-2.7.1.2.3.4.0-3485.jar
>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-azure-2.7.1.2.3.4.0-3485.jar
>> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-aws-2.7.1.2.3.4.0-3485.jar
>>
>>
>> So apparently everything HBase-related that was specified client-side is
>> missing when the task is running on my cluster.
>>
>> The thing is that when I run, for example, a Pig script on this cluster as
>> it is configured right now, everything works perfectly fine.
>> Also the config 'shouldn't' (I think) need anything different, because
>> this application only needs the HBase client (jar, packaged into the
>> application) and the HBase zookeeper settings (present on the machine where
>> it is started).
>>
>> Niels Basjes
>>
>>
>>
>>
>>
>> On Mon, Oct 23, 2017 at 10:23 AM, Piotr Nowojski <piotr@data-artisans.com
>> > wrote:
>>
>>> Till, do you have some idea what is going on? I do not see any meaningful
>>> difference between Niels' code and HBaseWriteStreamExample.java. There is
>>> also a very similar issue on the mailing list: “Flink can't read hdfs
>>> namenode logical url”
>>>
>>> Piotrek
>>>
>>> On 22 Oct 2017, at 12:56, Niels Basjes <Ni...@basjes.nl> wrote:
>>>
>>> Hi,
>>>
>>> Yes, all nodes have the same /etc/hbase/conf/hbase-site.xml, which
>>> contains the correct settings for HBase to find ZooKeeper.
>>> That is why adding that file as an additional resource to the
>>> configuration works.
>>> I have created a very simple project that reproduces the problem on my
>>> setup:
>>> https://github.com/nielsbasjes/FlinkHBaseConnectProblem
>>>
>>> Niels Basjes
>>>
>>>
>>> On Fri, Oct 20, 2017 at 6:54 PM, Piotr Nowojski <piotr@data-artisans.com
>>> > wrote:
>>>
>>>> Is this /etc/hbase/conf/hbase-site.xml file present on all of the
>>>> machines? If yes, could you share your code?
>>>>
>>>> On 20 Oct 2017, at 16:29, Niels Basjes <Ni...@basjes.nl> wrote:
>>>>
>>>> I look at the logfiles via the Hadoop Yarn web interface, i.e. I actually
>>>> look in the jobmanager.log of the container running the Flink task.
>>>> That is where I was able to find these messages.
>>>>
>>>> I do the
>>>>  hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
>>>> in all places, directly after the HBaseConfiguration.create();
>>>> That way I simply force the task to look on the actual Hadoop node for
>>>> the same file it already loaded locally.
>>>>
>>>> The reason I suspect Flink is that the client-side part of the Flink
>>>> application does have the right settings, while the task/job actually
>>>> running in the cluster does not.
>>>> So it seems that, in the transition into the cluster, the application
>>>> for some reason does not copy everything it has available locally.
>>>>
>>>> There is a very high probability I did something wrong, I'm just not
>>>> seeing it at this moment.
>>>>
>>>> Niels
>>>>
>>>>
>>>>
>>>> On Fri, Oct 20, 2017 at 2:53 PM, Piotr Nowojski <
>>>> piotr@data-artisans.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> What do you mean by saying:
>>>>>
>>>>> When I open the logfiles on the Hadoop cluster I see this:
>>>>>
>>>>>
>>>>> The error doesn’t come from Flink? Where do you execute
>>>>>
>>>>> hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
>>>>>
>>>>> ?
>>>>>
>>>>> To me it seems like it is a problem with misconfigured HBase and not
>>>>> something related to Flink.
>>>>>
>>>>> Piotrek
>>>>>
>>>>> On 20 Oct 2017, at 13:44, Niels Basjes <Ni...@basjes.nl> wrote:
>>>>>
>>>>> To facilitate you guys helping me I put this test project on github:
>>>>> https://github.com/nielsbasjes/FlinkHBaseConnectProblem
>>>>>
>>>>> Niels Basjes
>>>>>
>>>>> On Fri, Oct 20, 2017 at 1:32 PM, Niels Basjes <Ni...@basjes.nl> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have a Flink 1.3.2 application that I want to run on a Hadoop yarn
>>>>>> cluster where I need to connect to HBase.
>>>>>>
>>>>>> What I have:
>>>>>>
>>>>>> In my environment:
>>>>>> HADOOP_CONF_DIR=/etc/hadoop/conf/
>>>>>> HBASE_CONF_DIR=/etc/hbase/conf/
>>>>>> HIVE_CONF_DIR=/etc/hive/conf/
>>>>>> YARN_CONF_DIR=/etc/hadoop/conf/
>>>>>>
>>>>>> In /etc/hbase/conf/hbase-site.xml I have correctly defined the
>>>>>> zookeeper hosts for HBase.
>>>>>>
>>>>>> My test code is this:
>>>>>>
>>>>>> public class Main {
>>>>>>   private static final Logger LOG = LoggerFactory.getLogger(Main.class);
>>>>>>
>>>>>>   public static void main(String[] args) throws Exception {
>>>>>>     printZookeeperConfig();
>>>>>>     final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1);
>>>>>>     env.createInput(new HBaseSource()).print();
>>>>>>     env.execute("HBase config problem");
>>>>>>   }
>>>>>>
>>>>>>   public static void printZookeeperConfig() {
>>>>>>     String zookeeper = HBaseConfiguration.create().get("hbase.zookeeper.quorum");
>>>>>>     LOG.info("----> Loading HBaseConfiguration: Zookeeper = {}", zookeeper);
>>>>>>   }
>>>>>>
>>>>>>   public static class HBaseSource extends AbstractTableInputFormat<String> {
>>>>>>     @Override
>>>>>>     public void configure(org.apache.flink.configuration.Configuration parameters) {
>>>>>>       table = createTable();
>>>>>>       if (table != null) {
>>>>>>         scan = getScanner();
>>>>>>       }
>>>>>>     }
>>>>>>
>>>>>>     private HTable createTable() {
>>>>>>       LOG.info("Initializing HBaseConfiguration");
>>>>>>       // Uses files found in the classpath
>>>>>>       org.apache.hadoop.conf.Configuration hConf = HBaseConfiguration.create();
>>>>>>       printZookeeperConfig();
>>>>>>
>>>>>>       try {
>>>>>>         return new HTable(hConf, getTableName());
>>>>>>       } catch (Exception e) {
>>>>>>         LOG.error("Error instantiating a new HTable instance", e);
>>>>>>       }
>>>>>>       return null;
>>>>>>     }
>>>>>>
>>>>>>     @Override
>>>>>>     public String getTableName() {
>>>>>>       return "bugs:flink";
>>>>>>     }
>>>>>>
>>>>>>     @Override
>>>>>>     protected String mapResultToOutType(Result result) {
>>>>>>       return new String(result.getFamilyMap("v".getBytes(UTF_8)).get("column".getBytes(UTF_8)));
>>>>>>     }
>>>>>>
>>>>>>     @Override
>>>>>>     protected Scan getScanner() {
>>>>>>       return new Scan();
>>>>>>     }
>>>>>>   }
>>>>>>
>>>>>> }
>>>>>>
>>>>>>
>>>>>> I run this application with this command on my Yarn cluster (note:
>>>>>> first starting a yarn-cluster and then submitting the job yields the same
>>>>>> result).
>>>>>>
>>>>>> flink \
>>>>>>     run \
>>>>>>     -m yarn-cluster \
>>>>>>     --yarncontainer 1 \
>>>>>>     --yarnname "Flink on Yarn HBase problem" \
>>>>>>     --yarnslots                     1     \
>>>>>>     --yarnjobManagerMemory          4000  \
>>>>>>     --yarntaskManagerMemory         4000  \
>>>>>>     --yarnstreaming                       \
>>>>>>     target/flink-hbase-connect-1.0-SNAPSHOT.jar
>>>>>>
>>>>>> Now in the client side logfile /usr/local/flink-1.3.2/log/flink--client-80d2d21b10e0.log I see
>>>>>>
>>>>>> 1) Classpath actually contains /etc/hbase/conf/ both near the start and at the end.
>>>>>>
>>>>>> 2) The zookeeper settings of my experimental environment have been picked up by the software
>>>>>>
>>>>>> 2017-10-20 11:17:23,973 INFO  com.bol.bugreports.Main                                       - ----> Loading HBaseConfiguration: Zookeeper = node1.kluster.local.nl.bol.com:2181,node2.kluster.local.nl.bol.com:2181,node3.kluster.local.nl.bol.com:2181
>>>>>>
>>>>>>
>>>>>> When I open the logfiles on the Hadoop cluster I see this:
>>>>>>
>>>>>> 2017-10-20 13:17:33,250 INFO  com.bol.bugreports.Main                                       - ----> Loading HBaseConfiguration: Zookeeper = *localhost*
>>>>>>
>>>>>>
>>>>>> and as a consequence
>>>>>>
>>>>>> 2017-10-20 13:17:33,368 INFO  org.apache.zookeeper.ClientCnxn                               - Opening socket connection to server localhost.localdomain/127.0.0.1:2181
>>>>>>
>>>>>> 2017-10-20 13:17:33,369 WARN  org.apache.zookeeper.ClientCnxn                               - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
>>>>>>
>>>>>> java.net.ConnectException: Connection refused
>>>>>>
>>>>>> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>
>>>>>> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>>>>>>
>>>>>> 	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>
>>>>>> 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>
>>>>>> 2017-10-20 13:17:33,475 WARN  org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper        - Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
>>>>>>
>>>>>>
>>>>>>
>>>>>> The value 'localhost:2181' has been defined within the HBase jar in
>>>>>> the hbase-default.xml as the default value for the zookeeper nodes.
>>>>>>
>>>>>> As a workaround I currently put this extra line in my code, which I
>>>>>> know is nasty but "works on my cluster":
>>>>>>
>>>>>> hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
>>>>>>
>>>>>>
>>>>>> What am I doing wrong?
>>>>>>
>>>>>> What is the right way to fix this?
>>>>>>
>>>>>> --
>>>>>> Best regards / Met vriendelijke groeten,
>>>>>>
>>>>>> Niels Basjes
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best regards / Met vriendelijke groeten,
>>>>>
>>>>> Niels Basjes
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best regards / Met vriendelijke groeten,
>>>>
>>>> Niels Basjes
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards / Met vriendelijke groeten,
>>>
>>> Niels Basjes
>>>
>>>
>>>
>>
>>
>> --
>> Best regards / Met vriendelijke groeten,
>>
>> Niels Basjes
>>
>
>
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>



-- 
Best regards / Met vriendelijke groeten,

Niels Basjes

Re: HBase config settings go missing within Yarn.

Posted by Niels Basjes <Ni...@basjes.nl>.
Minor correction: The HBase jar files are on the classpath, just in a
different order.
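
For completeness: a quick way to check which hbase-site.xml (if any) wins on
the container classpath would be something like this sketch (the check is mine
and the log line is illustrative, it is not part of the test project):

  // HBaseConfiguration.create() adds "hbase-default.xml" and "hbase-site.xml"
  // by name, and Hadoop's Configuration resolves them via the classloader,
  // so whichever copy getResource() returns is the one that wins.
  java.net.URL hbaseSite = HBaseConfiguration.class.getClassLoader().getResource("hbase-site.xml");
  LOG.info("----> hbase-site.xml resolved from: {}", hbaseSite);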

On Tue, Oct 24, 2017 at 11:18 AM, Niels Basjes <Ni...@basjes.nl> wrote:

> I did some more digging.
>
> I added extra code to print both the environment variables and the
> classpath that is used by the HBaseConfiguration to load the resource files.
> I call this both locally and during startup of the job (i.e. these logs
> arrive in the jobmanager.log on the cluster)
>
> Summary of what I found locally:
>
> Environment
> 2017-10-24 08:50:15,612 INFO  com.bol.bugreports.Main                  - HADOOP_CONF_DIR = /etc/hadoop/conf/
> 2017-10-24 08:50:15,613 INFO  com.bol.bugreports.Main                  - HBASE_CONF_DIR = /etc/hbase/conf/
> 2017-10-24 08:50:15,613 INFO  com.bol.bugreports.Main                  - FLINK_CONF_DIR = /usr/local/flink-1.3.2/conf
> 2017-10-24 08:50:15,613 INFO  com.bol.bugreports.Main                  - HIVE_CONF_DIR = /etc/hive/conf/
> 2017-10-24 08:50:15,613 INFO  com.bol.bugreports.Main                  - YARN_CONF_DIR = /etc/hadoop/conf/
>
> ClassPath
> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                  - --> HBaseConfiguration: URLClassLoader = sun.misc.Launcher$AppClassLoader@1b6d3586
> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/flink-python_2.11-1.3.2.jar
> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/flink-shaded-hadoop2-uber-1.3.2.jar
> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/joda-time-2.9.1.jar
> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/log4j-1.2.17.jar
> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/slf4j-log4j12-1.7.7.jar
> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/flink-dist_2.11-1.3.2.jar
> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/home/nbasjes/FlinkHBaseConnect/
> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/etc/hadoop/conf/
> 2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/etc/hadoop/conf/
> 2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/etc/hbase/conf/
> 2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.141-2.b16.el6_9.x86_64/lib/tools.jar
> 2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hbase/
> 2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hbase/lib/activation-1.1.jar
> 2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hbase/lib/aopalliance-1.0.jar
> 2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hbase/lib/apacheds-i18n-2.0.0-M15.jar
> 2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hbase/lib/apacheds-kerberos-codec-2.0.0-M
> ...
>
>
>
> On the cluster node in the jobmanager.log:
>
> ENVIRONMENT
> 2017-10-24 10:50:19,971 INFO  com.bol.bugreports.Main                                       - HADOOP_CONF_DIR = /usr/hdp/current/hadoop-yarn-nodemanager/../hadoop/conf
> 2017-10-24 10:50:19,971 INFO  com.bol.bugreports.Main                                       - TEZ_CONF_DIR = /etc/tez/conf
> 2017-10-24 10:50:19,973 INFO  com.bol.bugreports.Main                                       - YARN_CONF_DIR = /usr/hdp/current/hadoop-yarn-nodemanager/../hadoop/conf
> 2017-10-24 10:50:19,973 INFO  com.bol.bugreports.Main                                       - LOG_DIRS = /var/log/hadoop-yarn/containers/application_1503304315746_0062/container_1503304315746_0062_01_000001
> 2017-10-24 10:50:19,973 INFO  com.bol.bugreports.Main                                       - HADOOP_YARN_HOME = /usr/hdp/2.3.4.0-3485/hadoop-yarn
> 2017-10-24 10:50:19,974 INFO  com.bol.bugreports.Main                                       - HADOOP_HOME = /usr/hdp/2.3.4.0-3485/hadoop
> 2017-10-24 10:50:19,975 INFO  com.bol.bugreports.Main                                       - HDP_VERSION = 2.3.4.0-3485
>
> And the classpath:
>
> 2017-10-24 10:50:19,977 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/15/flink-hbase-connect-1.0-SNAPSHOT.jar
> 2017-10-24 10:50:19,977 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/flink-dist_2.11-1.3.2.jar
> 2017-10-24 10:50:19,977 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/flink-python_2.11-1.3.2.jar
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/flink-shaded-hadoop2-uber-1.3.2.jar
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/joda-time-2.9.1.jar
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/log4j-1.2.17.jar
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/slf4j-log4j12-1.7.7.jar
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/14/log4j.properties
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/11/logback.xml
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/16/flink-dist_2.11-1.3.2.jar
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/10/flink-conf.yaml
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/container_1503304315746_0062_01_000001/
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/etc/hadoop/conf/
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-nfs-2.7.1.2.3.4.0-3485.jar
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-common-2.7.1.2.3.4.0-3485-tests.jar
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-common-2.7.1.2.3.4.0-3485.jar
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-annotations-2.7.1.2.3.4.0-3485.jar
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-auth-2.7.1.2.3.4.0-3485.jar
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-azure-2.7.1.2.3.4.0-3485.jar
> 2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                                       - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-aws-2.7.1.2.3.4.0-3485.jar
>
>
> So apparently everything HBase-related that was specified client-side is
> missing when the task is running on my cluster.
>
> The thing is that when I run, for example, a Pig script on this cluster as
> it is configured right now, everything works perfectly fine.
> Also the config 'shouldn't' (I think) need anything different, because this
> application only needs the HBase client (jar, packaged into the application)
> and the HBase zookeeper settings (present on the machine where it is
> started).
>
> Niels Basjes
>
>
>
>
>
> On Mon, Oct 23, 2017 at 10:23 AM, Piotr Nowojski <pi...@data-artisans.com>
> wrote:
>
>> Till, do you have some idea what is going on? I do not see any meaningful
>> difference between Niels' code and HBaseWriteStreamExample.java. There is
>> also a very similar issue on the mailing list: “Flink can't read hdfs
>> namenode logical url”
>>
>> Piotrek
>>
>> On 22 Oct 2017, at 12:56, Niels Basjes <Ni...@basjes.nl> wrote:
>>
>> Hi,
>>
>> Yes, all nodes have the same /etc/hbase/conf/hbase-site.xml, which
>> contains the correct settings for HBase to find ZooKeeper.
>> That is why adding that file as an additional resource to the
>> configuration works.
>> I have created a very simple project that reproduces the problem on my
>> setup:
>> https://github.com/nielsbasjes/FlinkHBaseConnectProblem
>>
>> Niels Basjes
>>
>>
>> On Fri, Oct 20, 2017 at 6:54 PM, Piotr Nowojski <pi...@data-artisans.com>
>> wrote:
>>
>>> Is this /etc/hbase/conf/hbase-site.xml file present on all of the
>>> machines? If yes, could you share your code?
>>>
>>> On 20 Oct 2017, at 16:29, Niels Basjes <Ni...@basjes.nl> wrote:
>>>
>>> I look at the logfiles via the Hadoop Yarn web interface, i.e. I actually
>>> look in the jobmanager.log of the container running the Flink task.
>>> That is where I was able to find these messages.
>>>
>>> I do the
>>>  hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
>>> in all places, directly after the HBaseConfiguration.create();
>>> That way I simply force the task to look on the actual Hadoop node for
>>> the same file it already loaded locally.
>>>
>>> The reason I suspect Flink is that the client-side part of the Flink
>>> application does have the right settings, while the task/job actually
>>> running in the cluster does not.
>>> So it seems that, in the transition into the cluster, the application
>>> for some reason does not copy everything it has available locally.
>>>
>>> There is a very high probability I did something wrong, I'm just not
>>> seeing it at this moment.
>>>
>>> Niels
>>>
>>>
>>>
>>> On Fri, Oct 20, 2017 at 2:53 PM, Piotr Nowojski <piotr@data-artisans.com
>>> > wrote:
>>>
>>>> Hi,
>>>>
>>>> What do you mean by saying:
>>>>
>>>> When I open the logfiles on the Hadoop cluster I see this:
>>>>
>>>>
>>>> The error doesn’t come from Flink? Where do you execute
>>>>
>>>> hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
>>>>
>>>> ?
>>>>
>>>> To me it seems like it is a problem with misconfigured HBase and not
>>>> something related to Flink.
>>>>
>>>> Piotrek
>>>>
>>>> On 20 Oct 2017, at 13:44, Niels Basjes <Ni...@basjes.nl> wrote:
>>>>
>>>> To facilitate you guys helping me I put this test project on github:
>>>> https://github.com/nielsbasjes/FlinkHBaseConnectProblem
>>>>
>>>> Niels Basjes
>>>>
>>>> On Fri, Oct 20, 2017 at 1:32 PM, Niels Basjes <Ni...@basjes.nl> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have a Flink 1.3.2 application that I want to run on a Hadoop yarn
>>>>> cluster where I need to connect to HBase.
>>>>>
>>>>> What I have:
>>>>>
>>>>> In my environment:
>>>>> HADOOP_CONF_DIR=/etc/hadoop/conf/
>>>>> HBASE_CONF_DIR=/etc/hbase/conf/
>>>>> HIVE_CONF_DIR=/etc/hive/conf/
>>>>> YARN_CONF_DIR=/etc/hadoop/conf/
>>>>>
>>>>> In /etc/hbase/conf/hbase-site.xml I have correctly defined the
>>>>> zookeeper hosts for HBase.
>>>>>
>>>>> My test code is this:
>>>>>
>>>>> public class Main {
>>>>>   private static final Logger LOG = LoggerFactory.getLogger(Main.class);
>>>>>
>>>>>   public static void main(String[] args) throws Exception {
>>>>>     printZookeeperConfig();
>>>>>     final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1);
>>>>>     env.createInput(new HBaseSource()).print();
>>>>>     env.execute("HBase config problem");
>>>>>   }
>>>>>
>>>>>   public static void printZookeeperConfig() {
>>>>>     String zookeeper = HBaseConfiguration.create().get("hbase.zookeeper.quorum");
>>>>>     LOG.info("----> Loading HBaseConfiguration: Zookeeper = {}", zookeeper);
>>>>>   }
>>>>>
>>>>>   public static class HBaseSource extends AbstractTableInputFormat<String> {
>>>>>     @Override
>>>>>     public void configure(org.apache.flink.configuration.Configuration parameters) {
>>>>>       table = createTable();
>>>>>       if (table != null) {
>>>>>         scan = getScanner();
>>>>>       }
>>>>>     }
>>>>>
>>>>>     private HTable createTable() {
>>>>>       LOG.info("Initializing HBaseConfiguration");
>>>>>       // Uses files found in the classpath
>>>>>       org.apache.hadoop.conf.Configuration hConf = HBaseConfiguration.create();
>>>>>       printZookeeperConfig();
>>>>>
>>>>>       try {
>>>>>         return new HTable(hConf, getTableName());
>>>>>       } catch (Exception e) {
>>>>>         LOG.error("Error instantiating a new HTable instance", e);
>>>>>       }
>>>>>       return null;
>>>>>     }
>>>>>
>>>>>     @Override
>>>>>     public String getTableName() {
>>>>>       return "bugs:flink";
>>>>>     }
>>>>>
>>>>>     @Override
>>>>>     protected String mapResultToOutType(Result result) {
>>>>>       return new String(result.getFamilyMap("v".getBytes(UTF_8)).get("column".getBytes(UTF_8)));
>>>>>     }
>>>>>
>>>>>     @Override
>>>>>     protected Scan getScanner() {
>>>>>       return new Scan();
>>>>>     }
>>>>>   }
>>>>>
>>>>> }
>>>>>
>>>>>
>>>>> I run this application with this command on my Yarn cluster (note:
>>>>> first starting a yarn-cluster and then submitting the job yields the same
>>>>> result).
>>>>>
>>>>> flink \
>>>>>     run \
>>>>>     -m yarn-cluster \
>>>>>     --yarncontainer 1 \
>>>>>     --yarnname "Flink on Yarn HBase problem" \
>>>>>     --yarnslots                     1     \
>>>>>     --yarnjobManagerMemory          4000  \
>>>>>     --yarntaskManagerMemory         4000  \
>>>>>     --yarnstreaming                       \
>>>>>     target/flink-hbase-connect-1.0-SNAPSHOT.jar
>>>>>
>>>>> Now in the client side logfile /usr/local/flink-1.3.2/log/flink--client-80d2d21b10e0.log I see
>>>>>
>>>>> 1) Classpath actually contains /etc/hbase/conf/ both near the start and at the end.
>>>>>
>>>>> 2) The zookeeper settings of my experimental environment have been picked up by the software
>>>>>
>>>>> 2017-10-20 11:17:23,973 INFO  com.bol.bugreports.Main                                       - ----> Loading HBaseConfiguration: Zookeeper = node1.kluster.local.nl.bol.com:2181,node2.kluster.local.nl.bol.com:2181,node3.kluster.local.nl.bol.com:2181
>>>>>
>>>>>
>>>>> When I open the logfiles on the Hadoop cluster I see this:
>>>>>
>>>>> 2017-10-20 13:17:33,250 INFO  com.bol.bugreports.Main                                       - ----> Loading HBaseConfiguration: Zookeeper = *localhost*
>>>>>
>>>>>
>>>>> and as a consequence
>>>>>
>>>>> 2017-10-20 13:17:33,368 INFO  org.apache.zookeeper.ClientCnxn                               - Opening socket connection to server localhost.localdomain/127.0.0.1:2181
>>>>>
>>>>> 2017-10-20 13:17:33,369 WARN  org.apache.zookeeper.ClientCnxn                               - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
>>>>>
>>>>> java.net.ConnectException: Connection refused
>>>>>
>>>>> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>
>>>>> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>>>>>
>>>>> 	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>
>>>>> 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>
>>>>> 2017-10-20 13:17:33,475 WARN  org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper        - Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
>>>>>
>>>>>
>>>>>
>>>>> The value 'localhost:2181' has been defined within the HBase jar in
>>>>> the hbase-default.xml as the default value for the zookeeper nodes.
>>>>>
>>>>> As a workaround I currently put this extra line in my code, which I
>>>>> know is nasty but "works on my cluster":
>>>>>
>>>>> hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
>>>>>
>>>>>
>>>>> What am I doing wrong?
>>>>>
>>>>> What is the right way to fix this?
>>>>>
>>>>> --
>>>>> Best regards / Met vriendelijke groeten,
>>>>>
>>>>> Niels Basjes
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards / Met vriendelijke groeten,
>>>>
>>>> Niels Basjes
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards / Met vriendelijke groeten,
>>>
>>> Niels Basjes
>>>
>>>
>>>
>>
>>
>> --
>> Best regards / Met vriendelijke groeten,
>>
>> Niels Basjes
>>
>>
>>
>
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>



-- 
Best regards / Met vriendelijke groeten,

Niels Basjes

Re: HBase config settings go missing within Yarn.

Posted by Niels Basjes <Ni...@basjes.nl>.
I did some more digging.

I added extra code to print both the environment variables and the
classpath that is used by the HBaseConfiguration to load the resource files.
I call this both locally and during startup of the job (i.e. these logs
arrive in the jobmanager.log on the cluster)
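
For reference, the printing code is roughly this (a minimal sketch; the method
name is illustrative, the real code is in the test project on GitHub):

public static void printEnvironmentAndClassPath() {
  // (imports assumed: java.net.URL, java.net.URLClassLoader; LOG is the slf4j logger from Main)
  // Print all environment variables as the JVM sees them.
  System.getenv().forEach((name, value) -> LOG.info("{} = {}", name, value));

  // Print the URLs of the classloader that HBaseConfiguration uses to
  // locate hbase-default.xml / hbase-site.xml.
  ClassLoader classLoader = HBaseConfiguration.class.getClassLoader();
  LOG.info("--> HBaseConfiguration: URLClassLoader = {}", classLoader);
  if (classLoader instanceof URLClassLoader) {
    for (URL url : ((URLClassLoader) classLoader).getURLs()) {
      LOG.info("----> ClassPath = {}", url);
    }
  }
}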

Summary of what I found locally:

Environment
2017-10-24 08:50:15,612 INFO  com.bol.bugreports.Main                  - HADOOP_CONF_DIR = /etc/hadoop/conf/
2017-10-24 08:50:15,613 INFO  com.bol.bugreports.Main                  - HBASE_CONF_DIR = /etc/hbase/conf/
2017-10-24 08:50:15,613 INFO  com.bol.bugreports.Main                  - FLINK_CONF_DIR = /usr/local/flink-1.3.2/conf
2017-10-24 08:50:15,613 INFO  com.bol.bugreports.Main                  - HIVE_CONF_DIR = /etc/hive/conf/
2017-10-24 08:50:15,613 INFO  com.bol.bugreports.Main                  - YARN_CONF_DIR = /etc/hadoop/conf/

ClassPath
2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                  - --> HBaseConfiguration: URLClassLoader = sun.misc.Launcher$AppClassLoader@1b6d3586
2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/flink-python_2.11-1.3.2.jar
2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/flink-shaded-hadoop2-uber-1.3.2.jar
2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/joda-time-2.9.1.jar
2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/log4j-1.2.17.jar
2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/slf4j-log4j12-1.7.7.jar
2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/usr/local/flink-1.3.2/lib/flink-dist_2.11-1.3.2.jar
2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/home/nbasjes/FlinkHBaseConnect/
2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/etc/hadoop/conf/
2017-10-24 08:50:15,614 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/etc/hadoop/conf/
2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/etc/hbase/conf/
2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.141-2.b16.el6_9.x86_64/lib/tools.jar
2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hbase/
2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hbase/lib/activation-1.1.jar
2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hbase/lib/aopalliance-1.0.jar
2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hbase/lib/apacheds-i18n-2.0.0-M15.jar
2017-10-24 08:50:15,615 INFO  com.bol.bugreports.Main                  - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hbase/lib/apacheds-kerberos-codec-2.0.0-M
...



On the cluster node in the jobmanager.log:

ENVIRONMENT
2017-10-24 10:50:19,971 INFO  com.bol.bugreports.Main                      - HADOOP_CONF_DIR = /usr/hdp/current/hadoop-yarn-nodemanager/../hadoop/conf
2017-10-24 10:50:19,971 INFO  com.bol.bugreports.Main                      - TEZ_CONF_DIR = /etc/tez/conf
2017-10-24 10:50:19,973 INFO  com.bol.bugreports.Main                      - YARN_CONF_DIR = /usr/hdp/current/hadoop-yarn-nodemanager/../hadoop/conf
2017-10-24 10:50:19,973 INFO  com.bol.bugreports.Main                      - LOG_DIRS = /var/log/hadoop-yarn/containers/application_1503304315746_0062/container_1503304315746_0062_01_000001
2017-10-24 10:50:19,973 INFO  com.bol.bugreports.Main                      - HADOOP_YARN_HOME = /usr/hdp/2.3.4.0-3485/hadoop-yarn
2017-10-24 10:50:19,974 INFO  com.bol.bugreports.Main                      - HADOOP_HOME = /usr/hdp/2.3.4.0-3485/hadoop
2017-10-24 10:50:19,975 INFO  com.bol.bugreports.Main                      - HDP_VERSION = 2.3.4.0-3485

And the classpath:

2017-10-24 10:50:19,977 INFO  com.bol.bugreports.Main                      - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/15/flink-hbase-connect-1.0-SNAPSHOT.jar
2017-10-24 10:50:19,977 INFO  com.bol.bugreports.Main                      - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/flink-dist_2.11-1.3.2.jar
2017-10-24 10:50:19,977 INFO  com.bol.bugreports.Main                      - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/flink-python_2.11-1.3.2.jar
2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                      - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/flink-shaded-hadoop2-uber-1.3.2.jar
2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                      - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/joda-time-2.9.1.jar
2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                      - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/log4j-1.2.17.jar
2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                      - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/13/lib/slf4j-log4j12-1.7.7.jar
2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                      - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/14/log4j.properties
2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                      - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/11/logback.xml
2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                      - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/16/flink-dist_2.11-1.3.2.jar
2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                      - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/filecache/10/flink-conf.yaml
2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                      - ----> ClassPath = file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/nbasjes/appcache/application_1503304315746_0062/container_1503304315746_0062_01_000001/
2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                      - ----> ClassPath = file:/etc/hadoop/conf/
2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                      - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-nfs-2.7.1.2.3.4.0-3485.jar
2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                      - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-common-2.7.1.2.3.4.0-3485-tests.jar
2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                      - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-common-2.7.1.2.3.4.0-3485.jar
2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                      - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-annotations-2.7.1.2.3.4.0-3485.jar
2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                      - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-auth-2.7.1.2.3.4.0-3485.jar
2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                      - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-azure-2.7.1.2.3.4.0-3485.jar
2017-10-24 10:50:19,978 INFO  com.bol.bugreports.Main                      - ----> ClassPath = file:/usr/hdp/2.3.4.0-3485/hadoop/hadoop-aws-2.7.1.2.3.4.0-3485.jar


So apparently everything HBase-related that was specified client-side is
missing when the task is running on my cluster.

The thing is that when I run, for example, a Pig script on this cluster as
it is configured right now, everything works perfectly fine.
Also the config 'shouldn't' (I think) need anything different, because this
application only needs the HBase client (jar, packaged into the application)
and the HBase zookeeper settings (present on the machine where it is
started).
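
(Side note: if the container simply never gets /etc/hbase/conf/ on its
classpath, another thing I could try is shipping the HBase conf directory
along with the job via the CLI, e.g.

flink \
    run \
    -m yarn-cluster \
    --yarnship /etc/hbase/conf \
    ... \
    target/flink-hbase-connect-1.0-SNAPSHOT.jar

This assumes --yarnship puts the shipped files on the container classpath;
I have not verified that yet.)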

Niels Basjes





On Mon, Oct 23, 2017 at 10:23 AM, Piotr Nowojski <pi...@data-artisans.com>
wrote:

> Till, do you have some idea what is going on? I do not see any meaningful
> difference between Niels' code and HBaseWriteStreamExample.java. There is
> also a very similar issue on the mailing list: “Flink can't read hdfs
> namenode logical url”
>
> Piotrek
>
> On 22 Oct 2017, at 12:56, Niels Basjes <Ni...@basjes.nl> wrote:
>
> Hi,
>
> Yes, all nodes have the same /etc/hbase/conf/hbase-site.xml, which
> contains the correct settings for HBase to find ZooKeeper.
> That is why adding that file as an additional resource to the
> configuration works.
> I have created a very simple project that reproduces the problem on my
> setup:
> https://github.com/nielsbasjes/FlinkHBaseConnectProblem
>
> Niels Basjes
>
>
> On Fri, Oct 20, 2017 at 6:54 PM, Piotr Nowojski <pi...@data-artisans.com>
> wrote:
>
>> Is this /etc/hbase/conf/hbase-site.xml file present on all of the
>> machines? If yes, could you share your code?
>>
>> On 20 Oct 2017, at 16:29, Niels Basjes <Ni...@basjes.nl> wrote:
>>
>> I look at the logfiles via the Hadoop Yarn web interface, i.e. I actually
>> look in the jobmanager.log of the container running the Flink task.
>> That is where I was able to find these messages.
>>
>> I do the
>>  hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
>> in all places, directly after the HBaseConfiguration.create();
>> That way I simply force the task to look on the actual Hadoop node for
>> the same file it already loaded locally.
>>
>> The reason I suspect Flink is that the client-side part of the Flink
>> application does have the right settings, while the task/job actually
>> running in the cluster does not.
>> So it seems that, in the transition into the cluster, the application
>> for some reason does not copy everything it has available locally.
>>
>> There is a very high probability I did something wrong, I'm just not
>> seeing it at this moment.
>>
>> Niels
>>
>>
>>
>> On Fri, Oct 20, 2017 at 2:53 PM, Piotr Nowojski <pi...@data-artisans.com>
>> wrote:
>>
>>> Hi,
>>>
>>> What do you mean by saying:
>>>
>>> When I open the logfiles on the Hadoop cluster I see this:
>>>
>>>
>>> The error doesn’t come from Flink? Where do you execute
>>>
>>> hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
>>>
>>> ?
>>>
>>> To me it seems like it is a problem with misconfigured HBase and not
>>> something related to Flink.
>>>
>>> Piotrek
>>>
>>> On 20 Oct 2017, at 13:44, Niels Basjes <Ni...@basjes.nl> wrote:
>>>
>>> To facilitate you guys helping me I put this test project on github:
>>> https://github.com/nielsbasjes/FlinkHBaseConnectProblem
>>>
>>> Niels Basjes
>>>
>>> On Fri, Oct 20, 2017 at 1:32 PM, Niels Basjes <Ni...@basjes.nl> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a Flink 1.3.2 application that I want to run on a Hadoop yarn
>>>> cluster where I need to connect to HBase.
>>>>
>>>> What I have:
>>>>
>>>> In my environment:
>>>> HADOOP_CONF_DIR=/etc/hadoop/conf/
>>>> HBASE_CONF_DIR=/etc/hbase/conf/
>>>> HIVE_CONF_DIR=/etc/hive/conf/
>>>> YARN_CONF_DIR=/etc/hadoop/conf/
>>>>
>>>> In /etc/hbase/conf/hbase-site.xml I have correctly defined the
>>>> zookeeper hosts for HBase.
>>>>
>>>> My test code is this:
>>>>
>>>> public class Main {
>>>>   private static final Logger LOG = LoggerFactory.getLogger(Main.class);
>>>>
>>>>   public static void main(String[] args) throws Exception {
>>>>     printZookeeperConfig();
>>>>     final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1);
>>>>     env.createInput(new HBaseSource()).print();
>>>>     env.execute("HBase config problem");
>>>>   }
>>>>
>>>>   public static void printZookeeperConfig() {
>>>>     String zookeeper = HBaseConfiguration.create().get("hbase.zookeeper.quorum");
>>>>     LOG.info("----> Loading HBaseConfiguration: Zookeeper = {}", zookeeper);
>>>>   }
>>>>
>>>>   public static class HBaseSource extends AbstractTableInputFormat<String> {
>>>>     @Override
>>>>     public void configure(org.apache.flink.configuration.Configuration parameters) {
>>>>       table = createTable();
>>>>       if (table != null) {
>>>>         scan = getScanner();
>>>>       }
>>>>     }
>>>>
>>>>     private HTable createTable() {
>>>>       LOG.info("Initializing HBaseConfiguration");
>>>>       // Uses files found in the classpath
>>>>       org.apache.hadoop.conf.Configuration hConf = HBaseConfiguration.create();
>>>>       printZookeeperConfig();
>>>>
>>>>       try {
>>>>         return new HTable(hConf, getTableName());
>>>>       } catch (Exception e) {
>>>>         LOG.error("Error instantiating a new HTable instance", e);
>>>>       }
>>>>       return null;
>>>>     }
>>>>
>>>>     @Override
>>>>     public String getTableName() {
>>>>       return "bugs:flink";
>>>>     }
>>>>
>>>>     @Override
>>>>     protected String mapResultToOutType(Result result) {
>>>>       return new String(result.getFamilyMap("v".getBytes(UTF_8)).get("column".getBytes(UTF_8)));
>>>>     }
>>>>
>>>>     @Override
>>>>     protected Scan getScanner() {
>>>>       return new Scan();
>>>>     }
>>>>   }
>>>>
>>>> }
>>>>
>>>>
>>>> I run this application with this command on my Yarn cluster (note:
>>>> first starting a yarn-cluster and then submitting the job yields the same
>>>> result).
>>>>
>>>> flink \
>>>>     run \
>>>>     -m yarn-cluster \
>>>>     --yarncontainer 1 \
>>>>     --yarnname "Flink on Yarn HBase problem" \
>>>>     --yarnslots                     1     \
>>>>     --yarnjobManagerMemory          4000  \
>>>>     --yarntaskManagerMemory         4000  \
>>>>     --yarnstreaming                       \
>>>>     target/flink-hbase-connect-1.0-SNAPSHOT.jar
>>>>
>>>> Now in the client side logfile /usr/local/flink-1.3.2/log/flink--client-80d2d21b10e0.log I see
>>>>
>>>> 1) Classpath actually contains /etc/hbase/conf/ both near the start and at the end.
>>>>
>>>> 2) The zookeeper settings of my experimental environment have been picked up by the software
>>>>
>>>> 2017-10-20 11:17:23,973 INFO  com.bol.bugreports.Main                                       - ----> Loading HBaseConfiguration: Zookeeper = node1.kluster.local.nl.bol.com:2181,node2.kluster.local.nl.bol.com:2181,node3.kluster.local.nl.bol.com:2181
>>>>
>>>>
>>>> When I open the logfiles on the Hadoop cluster I see this:
>>>>
>>>> 2017-10-20 13:17:33,250 INFO  com.bol.bugreports.Main                                       - ----> Loading HBaseConfiguration: Zookeeper = *localhost*
>>>>
>>>>
>>>> and as a consequence
>>>>
>>>> 2017-10-20 13:17:33,368 INFO  org.apache.zookeeper.ClientCnxn                               - Opening socket connection to server localhost.localdomain/127.0.0.1:2181
>>>>
>>>> 2017-10-20 13:17:33,369 WARN  org.apache.zookeeper.ClientCnxn                               - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
>>>>
>>>> java.net.ConnectException: Connection refused
>>>>
>>>> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>
>>>> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>>>>
>>>> 	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>
>>>> 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>
>>>> 2017-10-20 13:17:33,475 WARN  org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper        - Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
>>>>
>>>>
>>>>
>>>> The value 'localhost:2181' has been defined within the HBase jar in the
>>>> hbase-default.xml as the default value for the zookeeper nodes.
>>>>
>>>> As a workaround I currently put this extra line in my code, which I know
>>>> is nasty but "works on my cluster":
>>>>
>>>> hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
>>>>
>>>>
>>>> What am I doing wrong?
>>>>
>>>> What is the right way to fix this?
>>>>
>>>> --
>>>> Best regards / Met vriendelijke groeten,
>>>>
>>>> Niels Basjes
>>>>
>>>
>>>
>>>
>>> --
>>> Best regards / Met vriendelijke groeten,
>>>
>>> Niels Basjes
>>>
>>>
>>>
>>
>>
>> --
>> Best regards / Met vriendelijke groeten,
>>
>> Niels Basjes
>>
>>
>>
>
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>
>
>


-- 
Best regards / Met vriendelijke groeten,

Niels Basjes

Re: HBase config settings go missing within Yarn.

Posted by Piotr Nowojski <pi...@data-artisans.com>.
Till, do you have some idea what is going on? I do not see any meaningful difference between Niels' code and HBaseWriteStreamExample.java. There is also a very similar issue on the mailing list: “Flink can't read hdfs namenode logical url”

Piotrek

> On 22 Oct 2017, at 12:56, Niels Basjes <Ni...@basjes.nl> wrote:
> 
> Hi,
>
> Yes, all nodes have the same /etc/hbase/conf/hbase-site.xml, which contains the correct settings for HBase to find ZooKeeper.
> That is why adding that file as an additional resource to the configuration works.
> I have created a very simple project that reproduces the problem on my setup:
> https://github.com/nielsbasjes/FlinkHBaseConnectProblem
> 
> Niels Basjes
> 
> 
> On Fri, Oct 20, 2017 at 6:54 PM, Piotr Nowojski <piotr@data-artisans.com> wrote:
> Is this /etc/hbase/conf/hbase-site.xml file present on all of the machines? If yes, could you share your code?
> 
>> On 20 Oct 2017, at 16:29, Niels Basjes <Niels@basjes.nl> wrote:
>> 
>> I look at the logfiles via the Hadoop Yarn web interface, i.e. I actually look in the jobmanager.log of the container running the Flink task.
>> That is where I was able to find these messages.
>> 
>> I do the
>>  hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
>> in all places, directly after the HBaseConfiguration.create();
>> That way I simply force the task to look on the actual Hadoop node for the same file it already loaded locally.
>> 
>> The reason I suspect Flink is that the client-side part of the Flink application does have the right settings, while the task/job actually running in the cluster does not.
>> So it seems that, in the transition into the cluster, the application for some reason does not copy everything it has available locally.
>> 
>> There is a very high probability I did something wrong, I'm just not seeing it at this moment.
>> 
>> Niels
>> 
>> 
>> 
>> On Fri, Oct 20, 2017 at 2:53 PM, Piotr Nowojski <piotr@data-artisans.com> wrote:
>> Hi,
>> 
>> What do you mean by saying:
>> 
>>> When I open the logfiles on the Hadoop cluster I see this:
>> 
>> 
>> The error doesn’t come from Flink? Where do you execute 
>> 
>> hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
>> 
>> ?
>> 
>> To me it seems like it is a problem with misconfigured HBase and not something related to Flink.
>> 
>> Piotrek
>> 
>>> On 20 Oct 2017, at 13:44, Niels Basjes <Niels@basjes.nl> wrote:
>>> 
>>> To facilitate you guys helping me I put this test project on github:
>>> https://github.com/nielsbasjes/FlinkHBaseConnectProblem
>>> 
>>> Niels Basjes
>>> 
>>> On Fri, Oct 20, 2017 at 1:32 PM, Niels Basjes <Niels@basjes.nl> wrote:
>>> Hi,
>>> 
>>> I have a Flink 1.3.2 application that I want to run on a Hadoop yarn cluster where I need to connect to HBase.
>>> 
>>> What I have:
>>> 
>>> In my environment:
>>> HADOOP_CONF_DIR=/etc/hadoop/conf/
>>> HBASE_CONF_DIR=/etc/hbase/conf/
>>> HIVE_CONF_DIR=/etc/hive/conf/
>>> YARN_CONF_DIR=/etc/hadoop/conf/
>>> 
>>> In /etc/hbase/conf/hbase-site.xml I have correctly defined the zookeeper hosts for HBase.
>>> 
>>> My test code is this:
>>> public class Main {
>>>   private static final Logger LOG = LoggerFactory.getLogger(Main.class);
>>> 
>>>   public static void main(String[] args) throws Exception {
>>>     printZookeeperConfig();
>>>     final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1);
>>>     env.createInput(new HBaseSource()).print();
>>>     env.execute("HBase config problem");
>>>   }
>>> 
>>>   public static void printZookeeperConfig() {
>>>     String zookeeper = HBaseConfiguration.create().get("hbase.zookeeper.quorum");
>>>     LOG.info("----> Loading HBaseConfiguration: Zookeeper = {}", zookeeper);
>>>   }
>>> 
>>>   public static class HBaseSource extends AbstractTableInputFormat<String> {
>>>     @Override
>>>     public void configure(org.apache.flink.configuration.Configuration parameters) {
>>>       table = createTable();
>>>       if (table != null) {
>>>         scan = getScanner();
>>>       }
>>>     }
>>> 
>>>     private HTable createTable() {
>>>       LOG.info("Initializing HBaseConfiguration");
>>>       // Uses files found in the classpath
>>>       org.apache.hadoop.conf.Configuration hConf = HBaseConfiguration.create();
>>>       printZookeeperConfig();
>>> 
>>>       try {
>>>         return new HTable(hConf, getTableName());
>>>       } catch (Exception e) {
>>>         LOG.error("Error instantiating a new HTable instance", e);
>>>       }
>>>       return null;
>>>     }
>>> 
>>>     @Override
>>>     public String getTableName() {
>>>       return "bugs:flink";
>>>     }
>>> 
>>>     @Override
>>>     protected String mapResultToOutType(Result result) {
>>>       return new String(result.getFamilyMap("v".getBytes(UTF_8)).get("column".getBytes(UTF_8)));
>>>     }
>>> 
>>>     @Override
>>>     protected Scan getScanner() {
>>>       return new Scan();
>>>     }
>>>   }
>>> 
>>> }
>>> 
>>> I run this application with this command on my Yarn cluster (note: first starting a yarn-cluster and then submitting the job yields the same result).
>>> 
>>> flink \
>>>     run \
>>>     -m yarn-cluster \
>>>     --yarncontainer 1 \
>>>     --yarnname "Flink on Yarn HBase problem" \
>>>     --yarnslots                     1     \
>>>     --yarnjobManagerMemory          4000  \
>>>     --yarntaskManagerMemory         4000  \
>>>     --yarnstreaming                       \
>>>     target/flink-hbase-connect-1.0-SNAPSHOT.jar
>>> 
>>> Now in the client side logfile /usr/local/flink-1.3.2/log/flink--client-80d2d21b10e0.log I see 
>>> 1) Classpath actually contains /etc/hbase/conf/ both near the start and at the end.
>>> 2) The zookeeper settings of my experimental environment have been picked up by the software.
>>> 2017-10-20 11:17:23,973 INFO  com.bol.bugreports.Main                                       - ----> Loading HBaseConfiguration: Zookeeper = node1.kluster.local.nl.bol.com:2181,node2.kluster.local.nl.bol.com:2181,node3.kluster.local.nl.bol.com:2181
>>> 
>>> When I open the logfiles on the Hadoop cluster I see this:
>>> 
>>> 2017-10-20 13:17:33,250 INFO  com.bol.bugreports.Main                                       - ----> Loading HBaseConfiguration: Zookeeper = localhost
>>> 
>>> and as a consequence
>>> 
>>> 2017-10-20 13:17:33,368 INFO  org.apache.zookeeper.ClientCnxn                               - Opening socket connection to server localhost.localdomain/127.0.0.1:2181
>>> 2017-10-20 13:17:33,369 WARN  org.apache.zookeeper.ClientCnxn                               - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
>>> java.net.ConnectException: Connection refused
>>> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>>> 	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>> 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>> 2017-10-20 13:17:33,475 WARN  org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper        - Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
>>> 
>>> 
>>> The value 'localhost:2181' has been defined within the HBase jar in the hbase-default.xml as the default value for the zookeeper nodes.
>>> 
>>> As a workaround I currently put this extra line in my code which I know is nasty but "works on my cluster"
>>> hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
>>> 
>>> What am I doing wrong?
>>> 
>>> What is the right way to fix this?
>>> 
>>> -- 
>>> Best regards / Met vriendelijke groeten,
>>> 
>>> Niels Basjes
>>> 
>>> 
>>> 
>>> -- 
>>> Best regards / Met vriendelijke groeten,
>>> 
>>> Niels Basjes
>> 
>> 
>> 
>> 
>> -- 
>> Best regards / Met vriendelijke groeten,
>> 
>> Niels Basjes
> 
> 
> 
> 
> -- 
> Best regards / Met vriendelijke groeten,
> 
> Niels Basjes


Re: HBase config settings go missing within Yarn.

Posted by Niels Basjes <Ni...@basjes.nl>.
Hi,

Yes, all nodes have the same /etc/hbase/conf/hbase-site.xml, which contains
the correct settings for HBase to find zookeeper.
That is why adding that file as an additional resource to the
configuration works.
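To be explicit, the workaround amounts to this (a minimal sketch, assuming
/etc/hbase/conf/hbase-site.xml is present at that path on every worker node;
the class and method names here are just for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HBaseConfigWorkaround {
  public static Configuration createHBaseConfig() {
    // Reads hbase-site.xml from the classpath; on the client this works,
    // but inside the Yarn container it falls back to the hbase-default.xml
    // default (zookeeper quorum = localhost).
    Configuration hConf = HBaseConfiguration.create();
    // Workaround: force-load the node-local file so the real zookeeper
    // quorum overrides the 'localhost' default.
    hConf.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
    return hConf;
  }
}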
I have created a very simple project that reproduces the problem on my
setup:
https://github.com/nielsbasjes/FlinkHBaseConnectProblem

Niels Basjes


On Fri, Oct 20, 2017 at 6:54 PM, Piotr Nowojski <pi...@data-artisans.com>
wrote:

> Is this /etc/hbase/conf/hbase-site.xml file present on all of the
> machines? If yes, could you share your code?
>
> On 20 Oct 2017, at 16:29, Niels Basjes <Ni...@basjes.nl> wrote:
>
> I look at the logfiles from the Hadoop Yarn web interface, i.e. actually
> looking in the jobmanager.log of the container running the Flink task.
> That is where I was able to find these messages.
>
> I do the
>  hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
> in all places, directly after the HBaseConfiguration.create() call.
> That way I simply force the task to look on the actual Hadoop node for the
> same file it already loaded locally.
>
> The reason I'm suspecting Flink is because the client-side part of the
> Flink application does have the right settings, while the task/job
> actually running in the cluster does not.
> So it seems that, in the transition into the cluster, the application
> for some reason does not copy everything it has available locally.
>
> There is a very high probability I did something wrong; I'm just not
> seeing it at this moment.
>
> Niels
>
>
>
> On Fri, Oct 20, 2017 at 2:53 PM, Piotr Nowojski <pi...@data-artisans.com>
> wrote:
>
>> Hi,
>>
>> What do you mean by saying:
>>
>> When I open the logfiles on the Hadoop cluster I see this:
>>
>>
>> The error doesn’t come from Flink? Where do you execute
>>
>> hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
>>
>> ?
>>
>> To me it seems like it is a problem with misconfigured HBase and not
>> something related to Flink.
>>
>> Piotrek
>>
>> On 20 Oct 2017, at 13:44, Niels Basjes <Ni...@basjes.nl> wrote:
>>
>> To facilitate you guys helping me I put this test project on github:
>> https://github.com/nielsbasjes/FlinkHBaseConnectProblem
>>
>> Niels Basjes
>>
>> On Fri, Oct 20, 2017 at 1:32 PM, Niels Basjes <Ni...@basjes.nl> wrote:
>>
>>> Hi,
>>>
>>> I have a Flink 1.3.2 application that I want to run on a Hadoop yarn
>>> cluster where I need to connect to HBase.
>>>
>>> What I have:
>>>
>>> In my environment:
>>> HADOOP_CONF_DIR=/etc/hadoop/conf/
>>> HBASE_CONF_DIR=/etc/hbase/conf/
>>> HIVE_CONF_DIR=/etc/hive/conf/
>>> YARN_CONF_DIR=/etc/hadoop/conf/
>>>
>>> In /etc/hbase/conf/hbase-site.xml I have correctly defined the zookeeper
>>> hosts for HBase.
>>>
>>> My test code is this:
>>>
>>> public class Main {
>>>   private static final Logger LOG = LoggerFactory.getLogger(Main.class);
>>>
>>>   public static void main(String[] args) throws Exception {
>>>     printZookeeperConfig();
>>>     final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1);
>>>     env.createInput(new HBaseSource()).print();
>>>     env.execute("HBase config problem");
>>>   }
>>>
>>>   public static void printZookeeperConfig() {
>>>     String zookeeper = HBaseConfiguration.create().get("hbase.zookeeper.quorum");
>>>     LOG.info("----> Loading HBaseConfiguration: Zookeeper = {}", zookeeper);
>>>   }
>>>
>>>   public static class HBaseSource extends AbstractTableInputFormat<String> {
>>>     @Override
>>>     public void configure(org.apache.flink.configuration.Configuration parameters) {
>>>       table = createTable();
>>>       if (table != null) {
>>>         scan = getScanner();
>>>       }
>>>     }
>>>
>>>     private HTable createTable() {
>>>       LOG.info("Initializing HBaseConfiguration");
>>>       // Uses files found in the classpath
>>>       org.apache.hadoop.conf.Configuration hConf = HBaseConfiguration.create();
>>>       printZookeeperConfig();
>>>
>>>       try {
>>>         return new HTable(hConf, getTableName());
>>>       } catch (Exception e) {
>>>         LOG.error("Error instantiating a new HTable instance", e);
>>>       }
>>>       return null;
>>>     }
>>>
>>>     @Override
>>>     public String getTableName() {
>>>       return "bugs:flink";
>>>     }
>>>
>>>     @Override
>>>     protected String mapResultToOutType(Result result) {
>>>       return new String(result.getFamilyMap("v".getBytes(UTF_8)).get("column".getBytes(UTF_8)));
>>>     }
>>>
>>>     @Override
>>>     protected Scan getScanner() {
>>>       return new Scan();
>>>     }
>>>   }
>>>
>>> }
>>>
>>>
>>> I run this application with this command on my Yarn cluster (note: first
>>> starting a yarn-cluster and then submitting the job yields the same result).
>>>
>>> flink \
>>>     run \
>>>     -m yarn-cluster \
>>>     --yarncontainer 1 \
>>>     --yarnname "Flink on Yarn HBase problem" \
>>>     --yarnslots                     1     \
>>>     --yarnjobManagerMemory          4000  \
>>>     --yarntaskManagerMemory         4000  \
>>>     --yarnstreaming                       \
>>>     target/flink-hbase-connect-1.0-SNAPSHOT.jar
>>>
>>> Now in the client side logfile /usr/local/flink-1.3.2/log/flink--client-80d2d21b10e0.log I see
>>>
>>> 1) Classpath actually contains /etc/hbase/conf/ both near the start and at the end.
>>>
>>> 2) The zookeeper settings of my experimental environment have been picked up by the software.
>>>
>>> 2017-10-20 11:17:23,973 INFO  com.bol.bugreports.Main                                       - ----> Loading HBaseConfiguration: Zookeeper = node1.kluster.local.nl.bol.com:2181,node2.kluster.local.nl.bol.com:2181,node3.kluster.local.nl.bol.com:2181
>>>
>>>
>>> When I open the logfiles on the Hadoop cluster I see this:
>>>
>>> 2017-10-20 13:17:33,250 INFO  com.bol.bugreports.Main                                       - ----> Loading HBaseConfiguration: Zookeeper = *localhost*
>>>
>>>
>>> and as a consequence
>>>
>>> 2017-10-20 13:17:33,368 INFO  org.apache.zookeeper.ClientCnxn                               - Opening socket connection to server localhost.localdomain/127.0.0.1:2181
>>>
>>> 2017-10-20 13:17:33,369 WARN  org.apache.zookeeper.ClientCnxn                               - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
>>>
>>> java.net.ConnectException: Connection refused
>>>
>>> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>
>>> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>>>
>>> 	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>
>>> 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>
>>> 2017-10-20 13:17:33,475 WARN  org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper        - Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
>>>
>>>
>>>
>>> The value 'localhost:2181' has been defined within the HBase jar in the
>>> hbase-default.xml as the default value for the zookeeper nodes.
>>>
>>> As a workaround I currently put this extra line in my code which I know
>>> is nasty but "works on my cluster"
>>>
>>> hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
>>>
>>>
>>> What am I doing wrong?
>>>
>>> What is the right way to fix this?
>>>
>>> --
>>> Best regards / Met vriendelijke groeten,
>>>
>>> Niels Basjes
>>>
>>
>>
>>
>> --
>> Best regards / Met vriendelijke groeten,
>>
>> Niels Basjes
>>
>>
>>
>
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>
>
>


-- 
Best regards / Met vriendelijke groeten,

Niels Basjes

Re: HBase config settings go missing within Yarn.

Posted by Piotr Nowojski <pi...@data-artisans.com>.
Is this /etc/hbase/conf/hbase-site.xml file present on all of the machines? If yes, could you share your code?

> On 20 Oct 2017, at 16:29, Niels Basjes <Ni...@basjes.nl> wrote:
> 
> I look at the logfiles from the Hadoop Yarn web interface, i.e. actually looking in the jobmanager.log of the container running the Flink task.
> That is where I was able to find these messages.
> 
> I do the
>  hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
> in all places, directly after the HBaseConfiguration.create() call.
> That way I simply force the task to look on the actual Hadoop node for the same file it already loaded locally.
> 
> The reason I'm suspecting Flink is because the client-side part of the Flink application does have the right settings, while the task/job actually running in the cluster does not.
> So it seems that, in the transition into the cluster, the application for some reason does not copy everything it has available locally.
> 
> There is a very high probability I did something wrong; I'm just not seeing it at this moment.
> 
> Niels
> 
> 
> 
> On Fri, Oct 20, 2017 at 2:53 PM, Piotr Nowojski <piotr@data-artisans.com> wrote:
> Hi,
> 
> What do you mean by saying:
> 
>> When I open the logfiles on the Hadoop cluster I see this:
> 
> 
> The error doesn’t come from Flink? Where do you execute 
> 
> hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
> 
> ?
> 
> To me it seems like it is a problem with misconfigured HBase and not something related to Flink.
> 
> Piotrek
> 
>> On 20 Oct 2017, at 13:44, Niels Basjes <Niels@basjes.nl> wrote:
>> 
>> To facilitate you guys helping me I put this test project on github:
>> https://github.com/nielsbasjes/FlinkHBaseConnectProblem
>> 
>> Niels Basjes
>> 
>> On Fri, Oct 20, 2017 at 1:32 PM, Niels Basjes <Niels@basjes.nl> wrote:
>> Hi,
>> 
>> I have a Flink 1.3.2 application that I want to run on a Hadoop yarn cluster where I need to connect to HBase.
>> 
>> What I have:
>> 
>> In my environment:
>> HADOOP_CONF_DIR=/etc/hadoop/conf/
>> HBASE_CONF_DIR=/etc/hbase/conf/
>> HIVE_CONF_DIR=/etc/hive/conf/
>> YARN_CONF_DIR=/etc/hadoop/conf/
>> 
>> In /etc/hbase/conf/hbase-site.xml I have correctly defined the zookeeper hosts for HBase.
>> 
>> My test code is this:
>> public class Main {
>>   private static final Logger LOG = LoggerFactory.getLogger(Main.class);
>> 
>>   public static void main(String[] args) throws Exception {
>>     printZookeeperConfig();
>>     final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1);
>>     env.createInput(new HBaseSource()).print();
>>     env.execute("HBase config problem");
>>   }
>> 
>>   public static void printZookeeperConfig() {
>>     String zookeeper = HBaseConfiguration.create().get("hbase.zookeeper.quorum");
>>     LOG.info("----> Loading HBaseConfiguration: Zookeeper = {}", zookeeper);
>>   }
>> 
>>   public static class HBaseSource extends AbstractTableInputFormat<String> {
>>     @Override
>>     public void configure(org.apache.flink.configuration.Configuration parameters) {
>>       table = createTable();
>>       if (table != null) {
>>         scan = getScanner();
>>       }
>>     }
>> 
>>     private HTable createTable() {
>>       LOG.info("Initializing HBaseConfiguration");
>>       // Uses files found in the classpath
>>       org.apache.hadoop.conf.Configuration hConf = HBaseConfiguration.create();
>>       printZookeeperConfig();
>> 
>>       try {
>>         return new HTable(hConf, getTableName());
>>       } catch (Exception e) {
>>         LOG.error("Error instantiating a new HTable instance", e);
>>       }
>>       return null;
>>     }
>> 
>>     @Override
>>     public String getTableName() {
>>       return "bugs:flink";
>>     }
>> 
>>     @Override
>>     protected String mapResultToOutType(Result result) {
>>       return new String(result.getFamilyMap("v".getBytes(UTF_8)).get("column".getBytes(UTF_8)));
>>     }
>> 
>>     @Override
>>     protected Scan getScanner() {
>>       return new Scan();
>>     }
>>   }
>> 
>> }
>> 
>> I run this application with this command on my Yarn cluster (note: first starting a yarn-cluster and then submitting the job yields the same result).
>> 
>> flink \
>>     run \
>>     -m yarn-cluster \
>>     --yarncontainer 1 \
>>     --yarnname "Flink on Yarn HBase problem" \
>>     --yarnslots                     1     \
>>     --yarnjobManagerMemory          4000  \
>>     --yarntaskManagerMemory         4000  \
>>     --yarnstreaming                       \
>>     target/flink-hbase-connect-1.0-SNAPSHOT.jar
>> 
>> Now in the client side logfile /usr/local/flink-1.3.2/log/flink--client-80d2d21b10e0.log I see 
>> 1) Classpath actually contains /etc/hbase/conf/ both near the start and at the end.
>> 2) The zookeeper settings of my experimental environment have been picked up by the software.
>> 2017-10-20 11:17:23,973 INFO  com.bol.bugreports.Main                                       - ----> Loading HBaseConfiguration: Zookeeper = node1.kluster.local.nl.bol.com:2181,node2.kluster.local.nl.bol.com:2181,node3.kluster.local.nl.bol.com:2181
>> 
>> When I open the logfiles on the Hadoop cluster I see this:
>> 
>> 2017-10-20 13:17:33,250 INFO  com.bol.bugreports.Main                                       - ----> Loading HBaseConfiguration: Zookeeper = localhost
>> 
>> and as a consequence
>> 
>> 2017-10-20 13:17:33,368 INFO  org.apache.zookeeper.ClientCnxn                               - Opening socket connection to server localhost.localdomain/127.0.0.1:2181
>> 2017-10-20 13:17:33,369 WARN  org.apache.zookeeper.ClientCnxn                               - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
>> java.net.ConnectException: Connection refused
>> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>> 	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>> 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>> 2017-10-20 13:17:33,475 WARN  org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper        - Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
>> 
>> 
>> The value 'localhost:2181' has been defined within the HBase jar in the hbase-default.xml as the default value for the zookeeper nodes.
>> 
>> As a workaround I currently put this extra line in my code which I know is nasty but "works on my cluster"
>> hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
>> 
>> What am I doing wrong?
>> 
>> What is the right way to fix this?
>> 
>> -- 
>> Best regards / Met vriendelijke groeten,
>> 
>> Niels Basjes
>> 
>> 
>> 
>> -- 
>> Best regards / Met vriendelijke groeten,
>> 
>> Niels Basjes
> 
> 
> 
> 
> -- 
> Best regards / Met vriendelijke groeten,
> 
> Niels Basjes


Re: HBase config settings go missing within Yarn.

Posted by Niels Basjes <Ni...@basjes.nl>.
I look at the logfiles from the Hadoop Yarn web interface, i.e. actually
looking in the jobmanager.log of the container running the Flink task.
That is where I was able to find these messages.

I do the
 hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
in all places, directly after the HBaseConfiguration.create() call.
That way I simply force the task to look on the actual Hadoop node for the
same file it already loaded locally.
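
Concretely, every call site ends up with roughly this pattern (a sketch
reusing the LOG and imports from the Main class above; the method name is
mine, the real code is in the linked repo):

private static org.apache.hadoop.conf.Configuration loadHBaseConfig() {
  org.apache.hadoop.conf.Configuration hConf = HBaseConfiguration.create();
  // Inside the Yarn container this still prints the hbase-default.xml
  // fallback value 'localhost'.
  LOG.info("Before addResource: Zookeeper = {}", hConf.get("hbase.zookeeper.quorum"));
  // Force-load the node-local config so the real quorum wins.
  hConf.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
  // Now this prints the quorum from /etc/hbase/conf/hbase-site.xml.
  LOG.info("After addResource: Zookeeper = {}", hConf.get("hbase.zookeeper.quorum"));
  return hConf;
}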

The reason I'm suspecting Flink is because the client-side part of the Flink
application does have the right settings, while the task/job actually running
in the cluster does not.
So it seems that, in the transition into the cluster, the application for
some reason does not copy everything it has available locally.

There is a very high probability I did something wrong; I'm just not seeing
it at this moment.

Niels



On Fri, Oct 20, 2017 at 2:53 PM, Piotr Nowojski <pi...@data-artisans.com>
wrote:

> Hi,
>
> What do you mean by saying:
>
> When I open the logfiles on the Hadoop cluster I see this:
>
>
> The error doesn’t come from Flink? Where do you execute
>
> hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
>
> ?
>
> To me it seems like it is a problem with misconfigured HBase and not
> something related to Flink.
>
> Piotrek
>
> On 20 Oct 2017, at 13:44, Niels Basjes <Ni...@basjes.nl> wrote:
>
> To facilitate you guys helping me I put this test project on github:
> https://github.com/nielsbasjes/FlinkHBaseConnectProblem
>
> Niels Basjes
>
> On Fri, Oct 20, 2017 at 1:32 PM, Niels Basjes <Ni...@basjes.nl> wrote:
>
>> Hi,
>>
>> I have a Flink 1.3.2 application that I want to run on a Hadoop yarn
>> cluster where I need to connect to HBase.
>>
>> What I have:
>>
>> In my environment:
>> HADOOP_CONF_DIR=/etc/hadoop/conf/
>> HBASE_CONF_DIR=/etc/hbase/conf/
>> HIVE_CONF_DIR=/etc/hive/conf/
>> YARN_CONF_DIR=/etc/hadoop/conf/
>>
>> In /etc/hbase/conf/hbase-site.xml I have correctly defined the zookeeper
>> hosts for HBase.
>>
>> My test code is this:
>>
>> public class Main {
>>   private static final Logger LOG = LoggerFactory.getLogger(Main.class);
>>
>>   public static void main(String[] args) throws Exception {
>>     printZookeeperConfig();
>>     final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1);
>>     env.createInput(new HBaseSource()).print();
>>     env.execute("HBase config problem");
>>   }
>>
>>   public static void printZookeeperConfig() {
>>     String zookeeper = HBaseConfiguration.create().get("hbase.zookeeper.quorum");
>>     LOG.info("----> Loading HBaseConfiguration: Zookeeper = {}", zookeeper);
>>   }
>>
>>   public static class HBaseSource extends AbstractTableInputFormat<String> {
>>     @Override
>>     public void configure(org.apache.flink.configuration.Configuration parameters) {
>>       table = createTable();
>>       if (table != null) {
>>         scan = getScanner();
>>       }
>>     }
>>
>>     private HTable createTable() {
>>       LOG.info("Initializing HBaseConfiguration");
>>       // Uses files found in the classpath
>>       org.apache.hadoop.conf.Configuration hConf = HBaseConfiguration.create();
>>       printZookeeperConfig();
>>
>>       try {
>>         return new HTable(hConf, getTableName());
>>       } catch (Exception e) {
>>         LOG.error("Error instantiating a new HTable instance", e);
>>       }
>>       return null;
>>     }
>>
>>     @Override
>>     public String getTableName() {
>>       return "bugs:flink";
>>     }
>>
>>     @Override
>>     protected String mapResultToOutType(Result result) {
>>       return new String(result.getFamilyMap("v".getBytes(UTF_8)).get("column".getBytes(UTF_8)));
>>     }
>>
>>     @Override
>>     protected Scan getScanner() {
>>       return new Scan();
>>     }
>>   }
>>
>> }
>>
>>
>> I run this application with this command on my Yarn cluster (note: first
>> starting a yarn-cluster and then submitting the job yields the same result).
>>
>> flink \
>>     run \
>>     -m yarn-cluster \
>>     --yarncontainer 1 \
>>     --yarnname "Flink on Yarn HBase problem" \
>>     --yarnslots                     1     \
>>     --yarnjobManagerMemory          4000  \
>>     --yarntaskManagerMemory         4000  \
>>     --yarnstreaming                       \
>>     target/flink-hbase-connect-1.0-SNAPSHOT.jar
>>
>> Now in the client side logfile /usr/local/flink-1.3.2/log/flink--client-80d2d21b10e0.log I see
>>
>> 1) Classpath actually contains /etc/hbase/conf/ both near the start and at the end.
>>
>> 2) The zookeeper settings of my experimental environment have been picked up by the software.
>>
>> 2017-10-20 11:17:23,973 INFO  com.bol.bugreports.Main                                       - ----> Loading HBaseConfiguration: Zookeeper = node1.kluster.local.nl.bol.com:2181,node2.kluster.local.nl.bol.com:2181,node3.kluster.local.nl.bol.com:2181
>>
>>
>> When I open the logfiles on the Hadoop cluster I see this:
>>
>> 2017-10-20 13:17:33,250 INFO  com.bol.bugreports.Main                                       - ----> Loading HBaseConfiguration: Zookeeper = *localhost*
>>
>>
>> and as a consequence
>>
>> 2017-10-20 13:17:33,368 INFO  org.apache.zookeeper.ClientCnxn                               - Opening socket connection to server localhost.localdomain/127.0.0.1:2181
>>
>> 2017-10-20 13:17:33,369 WARN  org.apache.zookeeper.ClientCnxn                               - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
>>
>> java.net.ConnectException: Connection refused
>>
>> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>
>> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>>
>> 	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>
>> 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>
>> 2017-10-20 13:17:33,475 WARN  org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper        - Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
>>
>>
>>
>> The value 'localhost:2181' has been defined within the HBase jar in the
>> hbase-default.xml as the default value for the zookeeper nodes.
>>
>> As a workaround I currently put this extra line in my code which I know
>> is nasty but "works on my cluster"
>>
>> hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
>>
>>
>> What am I doing wrong?
>>
>> What is the right way to fix this?
>>
>> --
>> Best regards / Met vriendelijke groeten,
>>
>> Niels Basjes
>>
>
>
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>
>
>


-- 
Best regards / Met vriendelijke groeten,

Niels Basjes

Re: HBase config settings go missing within Yarn.

Posted by Piotr Nowojski <pi...@data-artisans.com>.
Hi,

What do you mean by saying:

> When I open the logfiles on the Hadoop cluster I see this:


The error doesn’t come from Flink? Where do you execute 

hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));

?

To me it seems like it is a problem with misconfigured HBase and not something related to Flink.

Piotrek

> On 20 Oct 2017, at 13:44, Niels Basjes <Ni...@basjes.nl> wrote:
> 
> To facilitate you guys helping me I put this test project on github:
> https://github.com/nielsbasjes/FlinkHBaseConnectProblem
> 
> Niels Basjes
> 
> On Fri, Oct 20, 2017 at 1:32 PM, Niels Basjes <Niels@basjes.nl> wrote:
> Hi,
> 
> I have a Flink 1.3.2 application that I want to run on a Hadoop yarn cluster where I need to connect to HBase.
> 
> What I have:
> 
> In my environment:
> HADOOP_CONF_DIR=/etc/hadoop/conf/
> HBASE_CONF_DIR=/etc/hbase/conf/
> HIVE_CONF_DIR=/etc/hive/conf/
> YARN_CONF_DIR=/etc/hadoop/conf/
> 
> In /etc/hbase/conf/hbase-site.xml I have correctly defined the zookeeper hosts for HBase.
> 
> My test code is this:
> public class Main {
>   private static final Logger LOG = LoggerFactory.getLogger(Main.class);
> 
>   public static void main(String[] args) throws Exception {
>     printZookeeperConfig();
>     final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1);
>     env.createInput(new HBaseSource()).print();
>     env.execute("HBase config problem");
>   }
> 
>   public static void printZookeeperConfig() {
>     String zookeeper = HBaseConfiguration.create().get("hbase.zookeeper.quorum");
>     LOG.info("----> Loading HBaseConfiguration: Zookeeper = {}", zookeeper);
>   }
> 
>   public static class HBaseSource extends AbstractTableInputFormat<String> {
>     @Override
>     public void configure(org.apache.flink.configuration.Configuration parameters) {
>       table = createTable();
>       if (table != null) {
>         scan = getScanner();
>       }
>     }
> 
>     private HTable createTable() {
>       LOG.info("Initializing HBaseConfiguration");
>       // Uses files found in the classpath
>       org.apache.hadoop.conf.Configuration hConf = HBaseConfiguration.create();
>       printZookeeperConfig();
> 
>       try {
>         return new HTable(hConf, getTableName());
>       } catch (Exception e) {
>         LOG.error("Error instantiating a new HTable instance", e);
>       }
>       return null;
>     }
> 
>     @Override
>     public String getTableName() {
>       return "bugs:flink";
>     }
> 
>     @Override
>     protected String mapResultToOutType(Result result) {
>       return new String(result.getFamilyMap("v".getBytes(UTF_8)).get("column".getBytes(UTF_8)));
>     }
> 
>     @Override
>     protected Scan getScanner() {
>       return new Scan();
>     }
>   }
> 
> }
> 
> I run this application with this command on my Yarn cluster (note: first starting a yarn-cluster and then submitting the job yields the same result).
> 
> flink \
>     run \
>     -m yarn-cluster \
>     --yarncontainer 1 \
>     --yarnname "Flink on Yarn HBase problem" \
>     --yarnslots                     1     \
>     --yarnjobManagerMemory          4000  \
>     --yarntaskManagerMemory         4000  \
>     --yarnstreaming                       \
>     target/flink-hbase-connect-1.0-SNAPSHOT.jar
> 
> Now in the client side logfile /usr/local/flink-1.3.2/log/flink--client-80d2d21b10e0.log I see 
> 1) Classpath actually contains /etc/hbase/conf/ both near the start and at the end.
> 2) The zookeeper settings of my experimental environment have been picked up by the software.
> 2017-10-20 11:17:23,973 INFO  com.bol.bugreports.Main                                       - ----> Loading HBaseConfiguration: Zookeeper = node1.kluster.local.nl.bol.com:2181,node2.kluster.local.nl.bol.com:2181,node3.kluster.local.nl.bol.com:2181
> 
> When I open the logfiles on the Hadoop cluster I see this:
> 
> 2017-10-20 13:17:33,250 INFO  com.bol.bugreports.Main                                       - ----> Loading HBaseConfiguration: Zookeeper = localhost
> 
> and as a consequence
> 
> 2017-10-20 13:17:33,368 INFO  org.apache.zookeeper.ClientCnxn                               - Opening socket connection to server localhost.localdomain/127.0.0.1:2181
> 2017-10-20 13:17:33,369 WARN  org.apache.zookeeper.ClientCnxn                               - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> 	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> 2017-10-20 13:17:33,475 WARN  org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper        - Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
> 
> 
> The value 'localhost:2181' has been defined within the HBase jar in the hbase-default.xml as the default value for the zookeeper nodes.
> 
> As a workaround I currently put this extra line in my code which I know is nasty but "works on my cluster"
> hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
> 
> What am I doing wrong?
> 
> What is the right way to fix this?
> 
> -- 
> Best regards / Met vriendelijke groeten,
> 
> Niels Basjes
> 
> 
> 
> -- 
> Best regards / Met vriendelijke groeten,
> 
> Niels Basjes


Re: HBase config settings go missing within Yarn.

Posted by Niels Basjes <Ni...@basjes.nl>.
To facilitate you guys helping me I put this test project on github:
https://github.com/nielsbasjes/FlinkHBaseConnectProblem

Niels Basjes

On Fri, Oct 20, 2017 at 1:32 PM, Niels Basjes <Ni...@basjes.nl> wrote:

> Hi,
>
> I have a Flink 1.3.2 application that I want to run on a Hadoop yarn
> cluster where I need to connect to HBase.
>
> What I have:
>
> In my environment:
> HADOOP_CONF_DIR=/etc/hadoop/conf/
> HBASE_CONF_DIR=/etc/hbase/conf/
> HIVE_CONF_DIR=/etc/hive/conf/
> YARN_CONF_DIR=/etc/hadoop/conf/
>
> In /etc/hbase/conf/hbase-site.xml I have correctly defined the zookeeper
> hosts for HBase.
>
> My test code is this:
>
> public class Main {
>   private static final Logger LOG = LoggerFactory.getLogger(Main.class);
>
>   public static void main(String[] args) throws Exception {
>     printZookeeperConfig();
>     final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1);
>     env.createInput(new HBaseSource()).print();
>     env.execute("HBase config problem");
>   }
>
>   public static void printZookeeperConfig() {
>     String zookeeper = HBaseConfiguration.create().get("hbase.zookeeper.quorum");
>     LOG.info("----> Loading HBaseConfiguration: Zookeeper = {}", zookeeper);
>   }
>
>   public static class HBaseSource extends AbstractTableInputFormat<String> {
>     @Override
>     public void configure(org.apache.flink.configuration.Configuration parameters) {
>       table = createTable();
>       if (table != null) {
>         scan = getScanner();
>       }
>     }
>
>     private HTable createTable() {
>       LOG.info("Initializing HBaseConfiguration");
>       // Uses files found in the classpath
>       org.apache.hadoop.conf.Configuration hConf = HBaseConfiguration.create();
>       printZookeeperConfig();
>
>       try {
>         return new HTable(hConf, getTableName());
>       } catch (Exception e) {
>         LOG.error("Error instantiating a new HTable instance", e);
>       }
>       return null;
>     }
>
>     @Override
>     public String getTableName() {
>       return "bugs:flink";
>     }
>
>     @Override
>     protected String mapResultToOutType(Result result) {
>       return new String(result.getFamilyMap("v".getBytes(UTF_8)).get("column".getBytes(UTF_8)));
>     }
>
>     @Override
>     protected Scan getScanner() {
>       return new Scan();
>     }
>   }
>
> }
>
>
> I run this application with this command on my Yarn cluster (note: first
> starting a yarn-cluster and then submitting the job yields the same result).
>
> flink \
>     run \
>     -m yarn-cluster \
>     --yarncontainer 1 \
>     --yarnname "Flink on Yarn HBase problem" \
>     --yarnslots                     1     \
>     --yarnjobManagerMemory          4000  \
>     --yarntaskManagerMemory         4000  \
>     --yarnstreaming                       \
>     target/flink-hbase-connect-1.0-SNAPSHOT.jar
>
> Now in the client side logfile /usr/local/flink-1.3.2/log/flink--client-80d2d21b10e0.log I see
>
> 1) Classpath actually contains /etc/hbase/conf/ both near the start and at the end.
>
> 2) The zookeeper settings of my experimental environment have been picked up by the software.
>
> 2017-10-20 11:17:23,973 INFO  com.bol.bugreports.Main                                       - ----> Loading HBaseConfiguration: Zookeeper = node1.kluster.local.nl.bol.com:2181,node2.kluster.local.nl.bol.com:2181,node3.kluster.local.nl.bol.com:2181
>
>
> When I open the logfiles on the Hadoop cluster I see this:
>
> 2017-10-20 13:17:33,250 INFO  com.bol.bugreports.Main                                       - ----> Loading HBaseConfiguration: Zookeeper = *localhost*
>
>
> and as a consequence
>
> 2017-10-20 13:17:33,368 INFO  org.apache.zookeeper.ClientCnxn                               - Opening socket connection to server localhost.localdomain/127.0.0.1:2181
>
> 2017-10-20 13:17:33,369 WARN  org.apache.zookeeper.ClientCnxn                               - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
>
> java.net.ConnectException: Connection refused
>
> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>
> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>
> 	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>
> 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>
> 2017-10-20 13:17:33,475 WARN  org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper        - Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
>
>
>
> The value 'localhost:2181' has been defined within the HBase jar in the
> hbase-default.xml as the default value for the zookeeper nodes.
>
> As a workaround I currently put this extra line in my code which I know is
> nasty but "works on my cluster"
>
> hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml"));
>
>
> What am I doing wrong?
>
> What is the right way to fix this?
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>



-- 
Best regards / Met vriendelijke groeten,

Niels Basjes