Posted to user@predictionio.apache.org by "Miller, Clifford" <cl...@phoenix-opsgroup.com> on 2018/05/25 16:43:46 UTC

PIO not using HBase cluster

I'm attempting to use a remote HBase cluster with PIO 0.12.1.  When I run
pio-start-all it starts HBase locally and does not use the remote cluster
as configured.  I've copied the HBase and Hadoop conf files from the
cluster and put them into the locally configured directories.  I set this
up in the past with a similar configuration, but on PIO 0.10.0; with that
version I could start pio with only the hbase and hadoop conf directories
present.  That no longer seems to be the case.

If I put only the cluster configs in place, it complains that it cannot
find start-hbase.sh.  If I put a full HBase installation with the cluster
configs in place, it starts a local HBase and does not use the remote
cluster.

Below is my PIO configuration

########

#!/usr/bin/env bash
#
# Safe config that will work if you expand your cluster later
SPARK_HOME=$PIO_HOME/vendors/spark
ES_CONF_DIR=$PIO_HOME/vendors/elasticsearch
HADOOP_CONF_DIR=$PIO_HOME/vendors/hadoop/conf
HBASE_CONF_DIR==$PIO_HOME/vendors/hbase/conf


# Filesystem paths that PredictionIO uses as block storage.
PIO_FS_BASEDIR=$HOME/.pio_store
PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp

# PredictionIO Storage Configuration
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH

PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE

# Need to use HDFS here instead of LOCALFS to enable deploying to
# machines without the local model
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=HDFS

# What store to use for what data
# Elasticsearch Example
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch
# The next line should match the ES cluster.name in ES config
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=dsp_es_cluster

# For clustered Elasticsearch (use one host/port if not clustered)
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=ip-10-0-1-136.us-gov-west-1.compute.internal,ip-10-0-1-126.us-gov-west-1.compute.internal,ip-10-0-1-126.us-gov-west-1.compute.internal
#PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300,9300,9300
#PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
# PIO 0.12.0+ uses the REST client for ES 5+ and this defaults to
# port 9200, change if appropriate but do not use the Transport Client port
# PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200,9200,9200

PIO_STORAGE_SOURCES_HDFS_TYPE=hdfs
PIO_STORAGE_SOURCES_HDFS_PATH=hdfs://ip-10-0-1-138.us-gov-west-1.compute.internal:8020/models

# HBase Source config
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase

# HBase clustered config (use one host/port if not clustered)
PIO_STORAGE_SOURCES_HBASE_HOSTS=ip-10-0-1-138.us-gov-west-1.compute.internal,ip-10-0-1-209.us-gov-west-1.compute.internal,ip-10-0-1-79.us-gov-west-1.compute.internal
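
Note that HBASE_CONF_DIR above must contain the cluster's hbase-site.xml;
without one on the client side, the HBase client defaults its ZooKeeper
quorum to localhost, which is exactly what the errors further down show.
A minimal sketch of that file, reusing the hosts above (in practice you
would copy the cluster's own file rather than writing it by hand):

####
cat > "$PIO_HOME/vendors/hbase/conf/hbase-site.xml" <<'EOF'
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>ip-10-0-1-138.us-gov-west-1.compute.internal,ip-10-0-1-209.us-gov-west-1.compute.internal,ip-10-0-1-79.us-gov-west-1.compute.internal</value>
  </property>
</configuration>
EOF
####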

Re: PIO not using HBase cluster

Posted by "Miller, Clifford" <cl...@phoenix-opsgroup.com>.
The problem I was having with HBase was a typo in my configuration.  After
correcting that and running 'pio eventserver &', I was able to submit
events and have them stored in my remote HBase.  I'm having issues with
Spark and will open a separate thread for that.

Thanks for the help.

--Cliff.
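
The typo is not named in the thread, but the doubled equals sign on the
HBASE_CONF_DIR line of the posted config is a likely culprit: in bash,
`HBASE_CONF_DIR==$PIO_HOME/...` assigns the literal string "=$PIO_HOME/...",
so the HBase client never finds the cluster's hbase-site.xml and falls back
to localhost, matching the quorum=localhost:2181 errors further down. The
corrected line would be:

####
# single "=", so the variable holds a real path
HBASE_CONF_DIR=$PIO_HOME/vendors/hbase/conf
####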



Re: PIO not using HBase cluster

Posted by Pat Ferrel <pa...@occamsmachete.com>.
How are you starting the EventServer? You should not use pio-start-all,
which assumes all services are local.

- Configure pio-env.sh with your remote HBase.
- Start the EventServer with `pio eventserver &`, or some method that
  won't kill it when you log off, like `nohup pio eventserver &`.
- This should not start a local HBase, so you should have your remote
  one running.
- The same goes for the remote Elasticsearch and HDFS: they should be in
  pio-env.sh and already started.
- `pio status` should then be fine with the remote HBase.
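
A sketch of that sequence, assuming the remote services are already up;
the access key and event fields are hypothetical placeholders (the
EventServer listens on port 7070 by default):

####
# start the EventServer so it survives logout
nohup pio eventserver &

# verify every configured backend is reachable
pio status

# smoke test: post one event; ACCESS_KEY is whatever
# `pio app new <name>` printed for your app
curl -i -X POST "http://localhost:7070/events.json?accessKey=$ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{"event":"buy","entityType":"user","entityId":"u1","targetEntityType":"item","targetEntityId":"i1"}'
####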



Re: PIO not using HBase cluster

Posted by "Miller, Clifford" <cl...@phoenix-opsgroup.com>.
I'll keep you informed.  However, I'm having trouble getting past this.
If I have HBase installed with the cluster's config files, it still does
not communicate with the cluster; it starts HBase, but on the local PIO
server.  If I have ONLY the hbase config (which worked in version 0.10.0),
then pio-start-all gives the following message.

####
 pio-start-all
Starting Elasticsearch...
Starting HBase...
/home/centos/PredictionIO-0.12.1/bin/pio-start-all: line 65: /home/centos/PredictionIO-0.12.1/vendors/hbase/bin/start-hbase.sh: No such file or directory
Waiting 10 seconds for Storage Repositories to fully initialize...
Starting PredictionIO Event Server...
########

"pio status" then returns:

####
 pio status
[INFO] [Management$] Inspecting PredictionIO...
[INFO] [Management$] PredictionIO 0.12.1 is installed at /home/centos/PredictionIO-0.12.1
[INFO] [Management$] Inspecting Apache Spark...
[INFO] [Management$] Apache Spark is installed at /home/centos/PredictionIO-0.12.1/vendors/spark
[INFO] [Management$] Apache Spark 2.1.1 detected (meets minimum requirement of 1.3.0)
[INFO] [Management$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
[INFO] [Storage$] Verifying Model Data Backend (Source: HDFS)...
[WARN] [DomainSocketFactory] The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
[INFO] [Storage$] Verifying Event Data Backend (Source: HBASE)...
[ERROR] [RecoverableZooKeeper] ZooKeeper exists failed after 1 attempts
[ERROR] [ZooKeeperWatcher] hconnection-0x558756be, quorum=localhost:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
[WARN] [ZooKeeperRegistry] Can't retrieve clusterId from Zookeeper
[ERROR] [StorageClient] Cannot connect to ZooKeeper (ZooKeeper ensemble: localhost). Please make sure that the configuration is pointing at the correct ZooKeeper ensemble. By default, HBase manages its own ZooKeeper, so if you have not configured HBase to use an external ZooKeeper, that means your HBase is not started or configured properly.
[ERROR] [Storage$] Error initializing storage client for source HBASE.
org.apache.hadoop.hbase.ZooKeeperConnectionException: Can't connect to ZooKeeper
        at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:2358)
        at org.apache.predictionio.data.storage.hbase.StorageClient.<init>(StorageClient.scala:53)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.predictionio.data.storage.Storage$.getClient(Storage.scala:252)
        at org.apache.predictionio.data.storage.Storage$.org$apache$predictionio$data$storage$Storage$$updateS2CM(Storage.scala:283)
        at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:244)
        at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:244)
        at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:194)
        at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80)
        at org.apache.predictionio.data.storage.Storage$.sourcesToClientMeta(Storage.scala:244)
        at org.apache.predictionio.data.storage.Storage$.getDataObject(Storage.scala:315)
        at org.apache.predictionio.data.storage.Storage$.getDataObjectFromRepo(Storage.scala:300)
        at org.apache.predictionio.data.storage.Storage$.getLEvents(Storage.scala:448)
        at org.apache.predictionio.data.storage.Storage$.verifyAllDataObjects(Storage.scala:384)
        at org.apache.predictionio.tools.commands.Management$.status(Management.scala:156)
        at org.apache.predictionio.tools.console.Pio$.status(Pio.scala:155)
        at org.apache.predictionio.tools.console.Console$$anonfun$main$1.apply(Console.scala:721)
        at org.apache.predictionio.tools.console.Console$$anonfun$main$1.apply(Console.scala:656)
        at scala.Option.map(Option.scala:146)
        at org.apache.predictionio.tools.console.Console$.main(Console.scala:656)
        at org.apache.predictionio.tools.console.Console.main(Console.scala)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1073)
        at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:2349)
        ... 23 more



[ERROR] [Management$] Unable to connect to all storage backends successfully.
The following shows the error message from the storage backend.

Data source HBASE was not properly initialized.
(org.apache.predictionio.data.storage.StorageClientException)

Dumping configuration of initialized storage backend sources.
Please make sure they are correct.

Source Name: ELASTICSEARCH; Type: elasticsearch; Configuration: HOSTS -> ip-10-0-1-136.us-gov-west-1.compute.internal,ip-10-0-1-126.us-gov-west-1.compute.internal,ip-10-0-1-126.us-gov-west-1.compute.internal, TYPE -> elasticsearch, CLUSTERNAME -> dsp_es_cluster, HOME -> /home/centos/PredictionIO-0.12.1/vendors/elasticsearch
Source Name: HBASE; Type: (error); Configuration: (error)
Source Name: HDFS; Type: hdfs; Configuration: TYPE -> hdfs, PATH -> hdfs://ip-10-0-1-138.us-gov-west-1.compute.internal:8020/models

####
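
The quorum=localhost:2181 in the log means the HBase client fell back to
its built-in defaults, i.e. it never loaded a client-side hbase-site.xml.
A quick sanity check, as a sketch (paths assume the layout above):

####
# load the same environment pio itself uses
. "$PIO_HOME/conf/pio-env.sh"

# must print a plain path; a leading "=" would mean the doubled
# equals sign in pio-env.sh is still in effect
echo "$HBASE_CONF_DIR"

# the client config must exist and name the remote quorum
grep -A1 hbase.zookeeper.quorum "$HBASE_CONF_DIR/hbase-site.xml"
####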





-- 
Clifford Miller
Mobile | 321.431.9089

Re: PIO not using HBase cluster

Posted by Pat Ferrel <pa...@occamsmachete.com>.
No, you need to have HBase installed, or at least its config installed, on
the PIO machine. The servers defined in pio-env.sh will be configured for
cluster operation and started separately from PIO; PIO will then only
communicate with HBase, not start it. But PIO still needs the config for
the client code that is in the pio assembly jar.

Some services were not cleanly separated between client, master, and
slave, so a complete installation is easiest, though you can work out the
minimum by experimentation; I think it is just the conf directory.

BTW, we have a similar setup and are having trouble with the Spark
training phase, which fails with a `classDefNotFound:
org.apache.hadoop.hbase.ProtobufUtil`, so can you let us know how it goes?
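
No fix is given in this thread; a common workaround for a missing
org.apache.hadoop.hbase.ProtobufUtil is to hand the HBase protocol jar to
Spark explicitly. A sketch with a hypothetical jar path and version
(arguments after `--` are passed through to spark-submit):

####
pio train -- \
  --jars "$PIO_HOME/vendors/hbase/lib/hbase-protocol-1.2.6.jar" \
  --driver-class-path "$PIO_HOME/vendors/hbase/lib/hbase-protocol-1.2.6.jar"
####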


