Posted to user@predictionio.apache.org by Selvaraju Sellamuthu <se...@gmail.com> on 2019/04/17 21:08:27 UTC

Source build of PredictionIO with Hadoop 3.x and Hbase 2.x

Hi Team,

I tried to rebuild the PredictionIO package with Spark 2.3.2, Elasticsearch
6.x, Hadoop 3.x, and HBase 2.x. It seems PredictionIO supports Hadoop 2.x and
HBase 1.x, and I am getting the errors below. Has anyone tried upgrading the
dependencies?

I need guidance on building the packages one by one in the PredictionIO source
build.


[error]
/Users/admin/pionew/PredictionIO-0.14.0/storage/hbase/src/main/scala/org/apache/predictionio/data/storage/hbase/HBEventsUtil.scala:166:14:
too many arguments for method add: (x$1:
org.apache.hadoop.hbase.Cell)org.apache.hadoop.hbase.client.Put
[error]       put.add(eBytes, col, Bytes.toBytes(v))
[error]              ^
[error]
/Users/admin/pionew/PredictionIO-0.14.0/storage/hbase/src/main/scala/org/apache/predictionio/data/storage/hbase/HBEventsUtil.scala:170:14:
too many arguments for method add: (x$1:
org.apache.hadoop.hbase.Cell)org.apache.hadoop.hbase.client.Put
[error]       put.add(eBytes, col, Bytes.toBytes(v))
[error]              ^
[error]
/Users/admin/pionew/PredictionIO-0.14.0/storage/hbase/src/main/scala/org/apache/predictionio/data/storage/hbase/HBLEvents.scala:45:60:
not found: type HTableInterface
[error]   def getTable(appId: Int, channelId: Option[Int] = None):
HTableInterface =
[error]                                                            ^
[error]
/Users/admin/pionew/PredictionIO-0.14.0/storage/hbase/src/main/scala/org/apache/predictionio/data/storage/hbase/StorageClient.scala:29:8:
object HConnection is not a member of package org.apache.hadoop.hbase.client
[error] import org.apache.hadoop.hbase.client.HConnection
[error]        ^
[error]
/Users/admin/pionew/PredictionIO-0.14.0/storage/hbase/src/main/scala/org/apache/predictionio/data/storage/hbase/StorageClient.scala:36:19:
not found: type HConnection
[error]   val connection: HConnection,
[error]                   ^
[error]
/Users/admin/pionew/PredictionIO-0.14.0/storage/hbase/src/main/scala/org/apache/predictionio/data/storage/hbase/HBPEvents.scala:23:8:
object TableInputFormat is not a member of package
org.apache.hadoop.hbase.mapreduce
[error] import org.apache.hadoop.hbase.mapreduce.{TableInputFormat,
TableOutputFormat}
[error]        ^
[error]
/Users/admin/pionew/PredictionIO-0.14.0/storage/hbase/src/main/scala/org/apache/predictionio/data/storage/hbase/HBPEvents.scala:34:57:
type mismatch;
[error]  found   : String
[error]  required: org.apache.hadoop.hbase.TableName
[error]     if (!client.admin.tableExists(HBEventsUtil.tableName(namespace,
appId, channelId))) {
[error]                                                         ^
[error]
/Users/admin/pionew/PredictionIO-0.14.0/storage/hbase/src/main/scala/org/apache/predictionio/data/storage/hbase/HBPEvents.scala:63:14:
not found: value TableInputFormat
[error]     conf.set(TableInputFormat.INPUT_TABLE,
[error]              ^
[error]
/Users/admin/pionew/PredictionIO-0.14.0/storage/hbase/src/main/scala/org/apache/predictionio/data/storage/hbase/HBPEvents.scala:78:14:
not found: value TableInputFormat
[error]     conf.set(TableInputFormat.SCAN,
PIOHBaseUtil.convertScanToString(scan))
[error]              ^
[error]
/Users/admin/pionew/PredictionIO-0.14.0/storage/hbase/src/main/scala/org/apache/predictionio/data/storage/hbase/HBPEvents.scala:81:48:
not found: type TableInputFormat
[error]     val rdd = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
[error]                                                ^
[error]
/Users/admin/pionew/PredictionIO-0.14.0/storage/hbase/src/main/scala/org/apache/predictionio/data/storage/hbase/HBPEvents.scala:84:55:
type mismatch;
[error]  found   : Any
[error]  required: org.apache.hadoop.hbase.client.Result
[error]         case (key, row) => HBEventsUtil.resultToEvent(row, appId)
[error]                                                       ^
[error]
/Users/admin/pionew/PredictionIO-0.14.0/storage/hbase/src/main/scala/org/apache/predictionio/data/storage/hbase/HBPEvents.scala:97:14:
not found: value TableOutputFormat
[error]     conf.set(TableOutputFormat.OUTPUT_TABLE,
[error]              ^
[error]
/Users/admin/pionew/PredictionIO-0.14.0/storage/hbase/src/main/scala/org/apache/predictionio/data/storage/hbase/HBPEvents.scala:100:15:
not found: type TableOutputFormat
[error]       classOf[TableOutputFormat[Object]],
[error]               ^
[error]
/Users/admin/pionew/PredictionIO-0.14.0/storage/hbase/src/main/scala/org/apache/predictionio/data/storage/hbase/HBPEvents.scala:119:16:
not found: value TableOutputFormat
[error]       conf.set(TableOutputFormat.OUTPUT_TABLE,
[error]                ^
[error]
/Users/admin/pionew/PredictionIO-0.14.0/storage/hbase/src/main/scala/org/apache/predictionio/data/storage/hbase/HBPEvents.scala:122:19:
constructor HTable in class HTable cannot be accessed in class HBPEvents
[error]  Access to protected constructor HTable not permitted because
[error]  enclosing class HBPEvents in package hbase is not a subclass of
[error]  class HTable in package client where target is defined
[error]       val table = new HTable(conf, tableName)
[error]                   ^
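
For reference, the errors above map to HBase client APIs that were removed in
HBase 2.x. A minimal sketch of the 2.x replacements is below (illustrative
table, row, and column names only, not PredictionIO's actual code):

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory, Put, Table}
import org.apache.hadoop.hbase.util.Bytes

object HBase2ApiSketch {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()

    // HConnection, HTableInterface and `new HTable(conf, name)` are gone in 2.x;
    // obtain a Connection from ConnectionFactory and get a Table from it.
    val connection: Connection = ConnectionFactory.createConnection(conf)
    val table: Table = connection.getTable(TableName.valueOf("pio_event"))

    // Put.add(family, qualifier, value) was removed; addColumn is the replacement.
    val put = new Put(Bytes.toBytes("some-row-key"))
    put.addColumn(Bytes.toBytes("e"), Bytes.toBytes("event"), Bytes.toBytes("view"))
    table.put(put)

    // Admin.tableExists now takes a TableName rather than a String.
    val admin = connection.getAdmin
    println(admin.tableExists(TableName.valueOf("pio_event")))

    admin.close()
    table.close()
    connection.close()
  }
}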


Regards
Selva

Re: Source build of PredictionIO with Hadoop 3.x and Hbase 2.x

Posted by Selvaraju Sellamuthu <se...@gmail.com>.
Hi Pat,

Thank you for your time and your explanation. You are absolutely right.
I am using the Hortonworks platform, in which Spark and HBase are both built
with Hadoop 3.x, which removed a few dependencies that existed in Hadoop 2.7.
PIO import is not working because those dependency libraries were removed. I
tried adding the Hadoop 2.7 client libraries to the PIO import spark-submit,
but it doesn't work and raises metadata configuration issues.

PIO import error:
Caused by: java.lang.IllegalAccessError: class
org.apache.hadoop.hdfs.web.HftpFileSystem cannot access its superinterface
org.apache.hadoop.hdfs.web.TokenAspect$TokenManagementDelegator

To resolve the dependency, I added hadoop-hdfs-client-2.9.1.jar to the
spark-submit command.

Modified PIO import command:
/usr/hdp/3.1.0.0-78/spark2/bin/spark-submit \
  --class org.apache.predictionio.tools.imprt.FileToEvents \
  --jars file:/data/PredictionIO-0.14.0/lib/spark/pio-data-localfs-assembly-0.14.0.jar,file:/data/PredictionIO-0.14.0/lib/spark/pio-data-hdfs-assembly-0.14.0.jar,file:/data/PredictionIO-0.14.0/lib/spark/pio-data-jdbc-assembly-0.14.0.jar,file:/data/PredictionIO-0.14.0/lib/spark/pio-data-elasticsearch-assembly-0.14.0.jar,file:/data/PredictionIO-0.14.0/lib/spark/pio-data-hbase-assembly-0.14.0.jar,file:/data/PredictionIO-0.14.0/lib/spark/pio-data-s3-assembly-0.14.0.jar,file:/data/corelib/hadoop-hdfs-client-2.9.1.jar \
  --files file:/data/PredictionIO-0.14.0/conf/log4j.properties,file:/usr/hdp/3.1.0.0-78/hadoop/conf/core-site.xml,file:/usr/hdp/3.1.0.0-78/hbase/conf/hbase-site.xml \
  --driver-class-path /data/PredictionIO-0.14.0/conf:/usr/hdp/3.1.0.0-78/hadoop/conf:/usr/hdp/3.1.0.0-78/hbase/conf \
  --driver-java-options -Dpio.log.dir=/home/bluedata \
  file:/data/PredictionIO-0.14.0/lib/pio-assembly-0.14.0.jar \
  --appid 3 \
  --input file:/data/SparkTest/productattribute.json \
  --env PIO_STORAGE_SOURCES_HBASE_TYPE=hbase,PIO_ENV_LOADED=1,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/home/bluedata/.pio_store,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=<>,PIO_STORAGE_SOURCES_HBASE_HOME=/usr/hdp/3.1.0.0-78/hbase,PIO_HOME=/data/PredictionIO-0.14.0,PIO_FS_ENGINESDIR=/home/bluedata/.pio_store/engines,PIO_STORAGE_SOURCES_LOCALFS_PATH=/home/bluedata/.pio_store/models,PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_FS_TMPDIR=/home/bluedata/.pio_store/tmp,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE,PIO_CONF_DIR=/data/PredictionIO-0.14.0/conf,PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200,PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs

Error:

[WARN] [Storage$] There is no properly configured data source.

[WARN] [Storage$] There is no properly configured repository.

[ERROR] [Storage$] Required repository (METADATA) configuration is missing.

[ERROR] [Storage$] There were 1 configuration errors. Exiting.


Official Apache component versions for HDP 3.1.0:

   - Apache Hadoop 3.1.1
   - Apache HBase 2.0.2
   - Apache Hive 3.1.0
   - Apache Spark 2.3.2

Because of the reasons above, I tried to build from source.
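
For the source build itself, a rough sketch of the HBase dependency bump that
the storage/hbase module would need is below (versions chosen to match HDP
3.1.0; the exact contents of storage/hbase/build.sbt may differ, so this is
only an assumption). Notably, in HBase 2.x the MapReduce integration classes
(TableInputFormat, TableOutputFormat) moved out of hbase-server into the
separate hbase-mapreduce artifact, which would explain the "not a member of
package org.apache.hadoop.hbase.mapreduce" errors:

libraryDependencies ++= Seq(
  "org.apache.hbase" % "hbase-client"    % "2.0.2",
  "org.apache.hbase" % "hbase-common"    % "2.0.2",
  "org.apache.hbase" % "hbase-server"    % "2.0.2",
  // New in HBase 2.x: TableInputFormat/TableOutputFormat live here now.
  "org.apache.hbase" % "hbase-mapreduce" % "2.0.2"
)

The API-level changes sketched earlier in the thread would still be needed on
top of this dependency change.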


Thanks & Regards
Selva

On Wed, Apr 17, 2019 at 2:40 PM Pat Ferrel <pa...@actionml.com> wrote:

> Upgrading to the latest is usually not needed and can cause problems. Code
> built for Spark 2.1 will almost 100% run on Spark 2.4, so why build for 2.4?
>
> The magic combination in the default build is very likely to support your
> newer versions of installed services, since they are all backwards
> compatible to some extent. Don’t make trouble for yourself.
>
> Also, some things will not work on newer versions. For instance, ES 6 made
> significant query changes vs. ES 5, so the UR template does not yet work
> with ES 6.
>
> My advice is to always use the default build and plan to run on the latest
> stable services. So install Spark 2.4, but build for whatever is in the
> build.sbt. This is especially true since all templates also have a build.sbt
> and will need to be upgraded as you did for PIO.
>
> Avoid this time sink and use defaults unless absolutely required, but
> deploy newer, more stable versions when they don't cross a major version
> number.

Re: Source build of PredictionIO with Hadoop 3.x and Hbase 2.x

Posted by Pat Ferrel <pa...@actionml.com>.
Upgrading to the latest is usually not needed and can cause problems. Code
built for Spark 2.1 will almost 100% run on Spark 2.4, so why build for 2.4?

The magic combination in the default build is very likely to support your
newer versions of installed services, since they are all backwards
compatible to some extent. Don’t make trouble for yourself.

Also, some things will not work on newer versions. For instance, ES 6 made
significant query changes vs. ES 5, so the UR template does not yet work
with ES 6.

My advice is to always use the default build and plan to run on the latest
stable services. So install Spark 2.4, but build for whatever is in the
build.sbt. This is especially true since all templates also have a build.sbt
and will need to be upgraded as you did for PIO.

Avoid this time sink and use defaults unless absolutely required, but
deploy newer, more stable versions when they don't cross a major version
number.

