Posted to user@hadoop.apache.org by luca paganotti <lu...@gmail.com> on 2019/02/03 19:15:40 UTC

New to hadoop

Hi all, I'm completely new to Hadoop and trying to learn about it. I'm
reading and following the book "Big Data Analytics With Hadoop 3", and I'm
at the very beginning.
I'm able to start and stop DFS and YARN via the shell scripts
(start-dfs.sh/stop-dfs.sh and start-yarn.sh/stop-yarn.sh).
The book takes
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
as a reference and advises using hadoop-3.1.0 from
http://apache.spinellicreations.com/hadoop/common/hadoop-3.1.0/.
Unfortunately, this version is no longer available, so I downloaded 3.2.0.
While trying to set up YARN Timeline Service v2.0 correctly, I've managed to
install and start an HBase cluster, as suggested, by downloading HBase 1.2.10
from http://mirror.cogentco.com/pub/apache/hbase/1.2.10/
HBase is up and running.
The next step should be "Enabling the co-processor", which involves these
substeps:

   1. Set up a co-processor location in HDFS:
      1. hadoop fs -mkdir /hbase/coprocessor
      2. hadoop fs -put hadoop-yarn-server-timelineservice-hbase-3.0.0-alpha1-SNAPSHOT.jar /hbase/coprocessor/hadoop-yarn-server-timelineservice.jar

But this command fails: I can't locate the right jar, because in
$HADOOP_HOME/share/hadoop/yarn/timelineservice I find these files instead:
hadoop-yarn-server-timelineservice-3.2.0.jar
hadoop-yarn-server-timelineservice-hbase-client-3.2.0.jar
hadoop-yarn-server-timelineservice-hbase-common-3.2.0.jar
hadoop-yarn-server-timelineservice-hbase-coprocessor-3.2.0.jar

Which one should I use?

The Apache Hadoop online documentation at
https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html
says something similar, mentioning the
hadoop-yarn-server-timelineservice-hbase-coprocessor-3.2.0-SNAPSHOT.jar
file, but again I can't find it. Why the SNAPSHOT suffix? It's not
present in my Hadoop distribution.
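A minimal shell sketch of the two co-processor substeps, adapted to the file names in the 3.2.0 binary distribution. It assumes, without confirmation from the book or the documentation, that hadoop-yarn-server-timelineservice-hbase-coprocessor-3.2.0.jar is the release counterpart of the -SNAPSHOT jar the 3.2.0 documentation names. pick_coprocessor_jar, run_or_print and install_coprocessor are hypothetical helper names, and with DRY_RUN=1 the hadoop fs commands are only printed.

```shell
# POSIX sh sketch with hypothetical helper names.
# Assumption: the "...-hbase-coprocessor-<version>.jar" file shipped in the
# timelineservice folder is the jar the documentation refers to.

# Print the co-processor jar found in directory $1; fail if there is none.
pick_coprocessor_jar() {
  for jar in "$1"/hadoop-yarn-server-timelineservice-hbase-coprocessor-*.jar; do
    if [ -f "$jar" ]; then   # an unmatched glob stays literal and fails here
      printf '%s\n' "$jar"
      return 0
    fi
  done
  echo "no co-processor jar found in $1" >&2
  return 1
}

# Execute the command, or only print it when DRY_RUN=1.
run_or_print() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# The two substeps from the book, using the 3.2.0 jar instead of the SNAPSHOT one.
install_coprocessor() {
  jar=$(pick_coprocessor_jar "$HADOOP_HOME/share/hadoop/yarn/timelineservice") || return 1
  # -p: like Unix mkdir -p, also creates missing parent directories
  run_or_print hadoop fs -mkdir -p /hbase/coprocessor
  run_or_print hadoop fs -put "$jar" /hbase/coprocessor/hadoop-yarn-server-timelineservice.jar
}
```

Usage, after sourcing the functions: run "DRY_RUN=1 install_coprocessor" to review the commands, then "install_coprocessor" to execute them.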

In addition, I need to set up the HADOOP_CLASSPATH environment variable
correctly. The book tells me to set it to the path of the lib folder in the
HBase distribution together with the $HADOOP_HOME/share/hadoop/yarn/timelineservice
folder.
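A sketch of the classpath the book describes, assuming HBASE_HOME (a variable name chosen here for illustration, not taken from the book) points at the extracted HBase 1.2.10 folder. Each entry ends in /* because in a Java classpath a plain directory entry only exposes loose .class files, while dir/* stands for every .jar file in that directory.

```shell
# POSIX sh sketch. Assumption: HBASE_HOME (illustrative name) points at the
# extracted HBase 1.2.10 folder; HADOOP_HOME at the extracted Hadoop 3.2.0.

# Print a classpath made of every jar in the HBase lib folder plus every jar
# in Hadoop's timelineservice folder.
timeline_classpath() {
  # The * stays inside quotes so the shell does not expand it; Java itself
  # expands "dir/*" to all .jar files in dir when it reads the classpath.
  printf '%s\n' "$HBASE_HOME/lib/*:$HADOOP_HOME/share/hadoop/yarn/timelineservice/*"
}
```

Usage: export HADOOP_CLASSPATH="$(timeline_classpath)"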

I'm not sure what causes the error I get when I issue this command:

hadoop org.apache.hadoop.yarn.server.timelineservice.storage.TimelineSchemaCreator -create -skipExistingTable

I get:
$HADOOP_HOME/libexec/hadoop-functions.sh: line 2364: HADOOP_ORG.APACHE.HADOOP.YARN.SERVER.TIMELINESERVICE.STORAGE.TIMELINESCHEMACREATOR_USER: bad substitution
$HADOOP_HOME/libexec/hadoop-functions.sh: line 2459: HADOOP_ORG.APACHE.HADOOP.YARN.SERVER.TIMELINESERVICE.STORAGE.TIMELINESCHEMACREATOR_OPTS: bad substitution
Error: Could not find or load main class org.apache.hadoop.yarn.server.timelineservice.storage.TimelineSchemaCreator

Is it because my classpath is incomplete, because I don't have the right
jars, or both?
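The two possible causes can be checked separately: whether the class is packaged in any jar of a folder, and whether that folder appears on the classpath at all. A minimal sketch, assuming the unzip tool is installed; class_entry, find_class_jar and classpath_entries_matching are hypothetical helper names.

```shell
# POSIX sh helpers (hypothetical names) for the two questions above.

# Map a fully qualified class name to its entry path inside a jar:
# a.b.C -> a/b/C.class
class_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr . /)"
}

# "Do I have the right jars?": print each jar in directory $1 that
# contains class $2 (unzip -l lists the archive entries).
find_class_jar() {
  entry=$(class_entry "$2")
  for jar in "$1"/*.jar; do
    [ -f "$jar" ] || continue   # the glob matched nothing
    if unzip -l "$jar" 2>/dev/null | grep -qF "$entry"; then
      printf '%s\n' "$jar"
    fi
  done
  return 0
}

# "Is my classpath complete?": print the entries of the colon-separated
# classpath $1 that contain the text $2.
classpath_entries_matching() {
  printf '%s\n' "$1" | tr : '\n' | grep -F -- "$2"
}
```

Usage, assuming (worth double-checking) that "hadoop classpath" prints the classpath including HADOOP_CLASSPATH:

   find_class_jar "$HADOOP_HOME/share/hadoop/yarn/timelineservice" org.apache.hadoop.yarn.server.timelineservice.storage.TimelineSchemaCreator
   classpath_entries_matching "$(hadoop classpath)" timelineservice

If the first command prints a jar and the second prints nothing, the class exists but the folder holding it is probably missing from the classpath.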

My HADOOP_HOME environment variable is set to the folder where I extracted
all the Hadoop files.
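A quick sanity check of that setting, as a sketch: it only verifies that three paths mentioned in this message exist under $HADOOP_HOME, and check_hadoop_home is a hypothetical helper name.

```shell
# POSIX sh sketch (hypothetical helper): report whether $HADOOP_HOME looks
# like the extracted Hadoop folder, using paths mentioned in this message.
check_hadoop_home() {
  if [ -z "$HADOOP_HOME" ]; then echo "HADOOP_HOME is not set"; return 1; fi
  for p in bin/hadoop libexec/hadoop-functions.sh share/hadoop/yarn/timelineservice; do
    if [ ! -e "$HADOOP_HOME/$p" ]; then
      echo "missing: $HADOOP_HOME/$p"   # stop at the first missing path
      return 1
    fi
  done
  echo "ok: $HADOOP_HOME"
}
```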

Sorry for the long message; I'm trying to be as clear as possible while
writing in a language I don't know very well.

Thanks for any answer.

-- lp