Posted to user@gobblin.apache.org by Rohit Kalhans <ro...@gmail.com> on 2018/02/07 18:57:05 UTC

PriviledgedActionException while submitting a gobblin job to mapreduce.

Hello,
I am integrating Gobblin in embedded mode with an existing application.
While submitting the job, it seems there is an unresolved
dependency/requirement in the MapReduce launcher.

I have checked that mapreduce.framework.name is set to yarn, and the other
YARN applications are running fine. Somehow I keep hitting this issue with
the Gobblin MR job launcher.
I was hoping you could help me set up Gobblin in embedded mode
for my application.

Here is the stack trace. Do let me know if any other info is needed.


Launching Hadoop MR job Gobblin-test9
WARN  [2018-02-07 11:43:22,990]
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:<userName> (auth:SIMPLE) cause:java.io.IOException: Cannot initialize
Cluster. Please check your
configuration for mapreduce.framework.name and the correspond server
addresses.
INFO  [2018-02-07 11:43:22,991]
org.apache.gobblin.runtime.TaskStateCollectorService: Stopping the
TaskStateCollectorService
INFO  [2018-02-07 11:43:23,033]
org.apache.gobblin.runtime.mapreduce.MRJobLauncher: Deleted working
directory /tmp/_test9_1518003781707/test9/job_test9_1518003782322
ERROR [2018-02-07 11:43:23,033]
org.apache.gobblin.runtime.AbstractJobLauncher: Failed to launch and run
job job_test9_1518003782322: java.io.IOException: Cannot initialize
Cluster. Please check your configuration for mapreduce.framework.name
and the correspond server addresses.
! java.io.IOException: Cannot initialize Cluster. Please check your
configuration for mapreduce.framework.name and the correspond server
addresses.
! at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
! at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
! at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
! at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1277)
! at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1273)
! at java.security.AccessController.doPrivileged(Native Method)
! at javax.security.auth.Subject.doAs(Subject.java:422)
! at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
! at org.apache.hadoop.mapreduce.Job.connect(Job.java:1272)
! at org.apache.hadoop.mapreduce.Job.submit(Job.java:1301)
! at
org.apache.gobblin.runtime.mapreduce.MRJobLauncher.runWorkUnits(MRJobLauncher.java:244)
! at
org.apache.gobblin.runtime.AbstractJobLauncher.runWorkUnitStream(AbstractJobLauncher.java:596)
! at
org.apache.gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:443)
! at
org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$DriverRunnable.call(JobLauncherExecutionDriver.java:159)
! at
org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$DriverRunnable.call(JobLauncherExecutionDriver.java:147)
! at java.util.concurrent.FutureTask.run(FutureTask.java:266)
! at java.lang.Thread.run(Thread.java:745)

-- 
Cheerio!

*Rohit*

Re: PriviledgedActionException while submitting a gobblin job to mapreduce.

Posted by Issac Buenrostro <is...@gmail.com>.
Hi Rohit,

Are the environment variables HADOOP_HOME and HADOOP_CONF_DIR set
correctly? It seems to me the runtime is somehow loading the wrong Hadoop
configurations.
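A quick way to see what the embedded runtime will inherit (illustrative sketch; the fallback path is an assumption, adjust it to your installation):

```shell
# Print the Hadoop settings the embedded launcher will pick up.
# HADOOP_CONF_DIR must point at the directory whose mapred-site.xml
# sets mapreduce.framework.name=yarn; otherwise Hadoop falls back to
# the local job runner.
hadoop_home="${HADOOP_HOME:-<unset>}"
hadoop_conf="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"
echo "HADOOP_HOME=$hadoop_home"
echo "HADOOP_CONF_DIR=$hadoop_conf"
# Show the framework property if the config file is readable.
grep -s -A1 'mapreduce.framework.name' "$hadoop_conf/mapred-site.xml" || \
  echo "mapred-site.xml not found or property missing under $hadoop_conf"
```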

On Wed, Feb 7, 2018 at 10:57 AM Rohit Kalhans <ro...@gmail.com>
wrote:


Re: PriviledgedActionException while submitting a gobblin job to mapreduce.

Posted by Sudarshan Vasudevan <su...@linkedin.com>.
Hi Rohit,
I was able to successfully run the distcp job in my local environment with pretty much the config you provided, with only one difference (writer.output.format set to “txt” instead of “AVRO”). The job config I used is listed below. I ran the distcp job from the command line (instead of the embedded mode that you are using). You can try running your job from the command line and see if you can reproduce the issue. Otherwise, the config you have looks good. I have included a snippet of the execution logs for your reference.

Thanks,
Sudarshan

Command-line:

bin/gobblin-mapreduce.sh --conf ~/gobblin/conf/distcp.pull --workdir /tmp --jars lib/reactive-streams-1.0.0.jar

After the run, contents of data.publisher.final.dir:

hadoop fs -ls /tmp/rk_bak
18/02/13 14:44:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   1 <username> supergroup         21 2018-02-13 14:08 /tmp/rk_bak/1.txt
-rw-r--r--   1 <username> supergroup         21 2018-02-13 14:08 /tmp/rk_bak/2.txt


=======My distcp.pull file========

job.name=distcp20
job.group=GobblinDistCp
job.description=Gobblin quick start job for DistCp
job.lock.enabled=false
job.commit.parallelize=true

from=hdfs://localhost:9000/tmp/distcptest
to=hdfs://localhost:9000/tmp/rk_bak

fs.uri=hdfs://localhost:9000

writer.builder.class=org.apache.gobblin.data.management.copy.writer.FileAwareInputStreamDataWriterBuilder
writer.destination.type=HDFS
writer.fs.uri=hdfs://localhost:9000
writer.output.dir=/tmp/task-output
writer.output.format=txt
writer.staging.dir=/tmp/task-staging

converter.classes=org.apache.gobblin.converter.IdentityConverter

data.publisher.appendExtractToFinalDir=false
data.publisher.final.dir=${to}
data.publisher.metadata.output.dir=hdfs://localhost:9000/tmp/
data.publisher.type=org.apache.gobblin.data.management.copy.publisher.CopyDataPublisher

distcp.persist.dir=/tmp/distcp-persist-dir
extract.namespace=org.apache.gobblin.copy

gobblin.copy.recursive.delete=true
gobblin.copy.recursive.deleteEmptyDirectories=true
gobblin.copy.recursive.update=true
gobblin.dataset.pattern=${from}
gobblin.dataset.profile.class=org.apache.gobblin.data.management.copy.CopyableGlobDatasetFinder
gobblin.runtime.commit.sequence.store.dir=/tmp/commit-sequence-store
gobblin.template.required_attributes=from,to
gobblin.trash.skip.trash=true
gobblin.workDir=/tmp

qualitychecker.row.err.file=/tmp/err
source.class=org.apache.gobblin.data.management.copy.CopySource
source.filebased.fs.uri=hdfs://localhost:9000

state.store.dir=/tmp/state-store
state.store.enabled=false
state.store.fs.uri=${fs.uri}

task.maxretries=0
task.status.reportintervalinms=5000
taskexecutor.threadpool.size=2
taskretry.threadpool.coresize=1
taskretry.threadpool.maxsize=2

workunit.retry.enabled=false

metrics.log.dir=/tmp/metrics
mr.jars.dir=/Users/suvasude/incubator-gobblin/gobblin-dist/lib
mr.job.root.dir=/tmp/distcp20

job.history.store.enabled=true
job.history.store.jdbc.driver=com.mysql.jdbc.Driver
job.history.store.password=abc123
job.history.store.url=jdbc:mysql://localhost:3306/gobblindb?zeroDateTimeBehavior=convertToNull
job.history.store.user=dbuser

===Logs from my job run:

2018-02-13 14:08:48 PST INFO  [TaskStateCollectorService STOPPING] org.apache.gobblin.runtime.TaskStateCollectorService  198 - Collected task state of 2 completed tasks
2018-02-13 14:08:48 PST INFO  [TaskStateCollectorService STOPPING] org.apache.gobblin.runtime.JobContext  414 - 2 more tasks of job job_distcp20_1518559706480 have completed
2018-02-13 14:08:48 PST INFO  [TaskStateCollectorService STOPPING] org.apache.gobblin.runtime.JobContext  404 - Writing job execution information to the job history store
2018-02-13 14:08:48 PST INFO  [main] org.apache.gobblin.runtime.mapreduce.MRJobLauncher  547 - Deleted working directory /tmp/distcp20/distcp20/job_distcp20_1518559706480
2018-02-13 14:08:48 PST INFO  [main] org.apache.gobblin.runtime.JobContext  457 - Persisting dataset urns.
2018-02-13 14:08:48 PST INFO  [Commit-thread-0] org.apache.gobblin.runtime.SafeDatasetCommit  123 - Committing dataset CopyEntity.DatasetAndPartition(dataset=CopyableDatasetMetadata(datasetURN=/tmp/distcptest), partition=/tmp/distcptest) of job job_distcp20_1518559706480 with commit policy COMMIT_ON_FULL_SUCCESS and state SUCCESSFUL
2018-02-13 14:08:48 PST INFO  [Commit-thread-0] org.apache.gobblin.data.management.copy.recovery.RecoveryHelper  146 - No persist directory to clean.
2018-02-13 14:08:48 PST INFO  [Commit-thread-0] org.apache.gobblin.data.management.copy.publisher.CopyDataPublisher  188 - [b8756ce8f6302843697ab0baab0fe2c7af876988] Publishing fileSet from /tmp/task-output/distcp20/job_distcp20_1518559706480/b8756ce8f6302843697ab0baab0fe2c7af876988 for dataset /tmp/distcptest
2018-02-13 14:08:48 PST INFO  [Commit-thread-0] org.apache.gobblin.data.management.copy.publisher.CopyDataPublisher  193 - [b8756ce8f6302843697ab0baab0fe2c7af876988] Found 0 prePublish steps and 0 postPublish steps.
2018-02-13 14:08:48 PST INFO  [Commit-thread-0] org.apache.gobblin.util.HadoopUtils  489 - Recursively renaming /tmp/task-output/distcp20/job_distcp20_1518559706480/b8756ce8f6302843697ab0baab0fe2c7af876988 in hdfs://localhost:9000 to /.
2018-02-13 14:08:48 PST INFO  [Commit-thread-0] org.apache.gobblin.util.HadoopUtils  514 - Recursive renaming of /tmp/task-output/distcp20/job_distcp20_1518559706480/b8756ce8f6302843697ab0baab0fe2c7af876988 to /. (details: used 3 futures)
2018-02-13 14:08:48 PST INFO  [Commit-thread-0] org.apache.gobblin.util.ExecutorsUtils  186 - Attempting to shutdown ExecutorService: org.apache.gobblin.util.executors.ScalingThreadPoolExecutor@671dbfb3[Shutting down, pool size = 2, active threads = 0, queued tasks = 0, completed tasks = 3]
2018-02-13 14:08:48 PST INFO  [Commit-thread-0] org.apache.gobblin.util.ExecutorsUtils  205 - Successfully shutdown ExecutorService: org.apache.gobblin.util.executors.ScalingThreadPoolExecutor@671dbfb3[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 3]
2018-02-13 14:08:48 PST INFO  [Commit-thread-0] org.apache.gobblin.runtime.SafeDatasetCommit  248 - Submitted 1 lineage events for dataset CopyEntity.DatasetAndPartition(dataset=CopyableDatasetMetadata(datasetURN=/tmp/distcptest), partition=/tmp/distcptest)
2018-02-13 14:08:48 PST INFO  [Commit-thread-0] org.apache.gobblin.runtime.SafeDatasetCommit  417 - Persisting dataset state for dataset CopyEntity.DatasetAndPartition(dataset=CopyableDatasetMetadata(datasetURN=/tmp/distcptest), partition=/tmp/distcptest)
2018-02-13 14:08:48 PST INFO  [main] org.apache.gobblin.util.ExecutorsUtils  186 - Attempting to shutdown ExecutorService: com.google.common.util.concurrent.MoreExecutors$ListeningDecorator@792e8181
2018-02-13 14:08:48 PST INFO  [main] org.apache.gobblin.util.ExecutorsUtils  205 - Successfully shutdown ExecutorService: com.google.common.util.concurrent.MoreExecutors$ListeningDecorator@792e8181
2018-02-13 14:08:48 PST INFO  [main] org.apache.gobblin.util.JobLauncherUtils  229 - Cleaning up staging directory /tmp/task-staging/distcp20/job_distcp20_1518559706480/task_distcp20_1518559706480_0_0
2018-02-13 14:08:48 PST INFO  [main] org.apache.gobblin.util.JobLauncherUtils  229 - Cleaning up staging directory /tmp/task-staging/distcp20/job_distcp20_1518559706480/task_distcp20_1518559706480_1_0
2018-02-13 14:08:48 PST INFO  [main] org.apache.gobblin.util.ExecutorsUtils  186 - Attempting to shutdown ExecutorService: com.google.common.util.concurrent.MoreExecutors$ListeningDecorator@4119346d
2018-02-13 14:08:48 PST INFO  [main] org.apache.gobblin.util.ExecutorsUtils  205 - Successfully shutdown ExecutorService: com.google.common.util.concurrent.MoreExecutors$ListeningDecorator@4119346d
2018-02-13 14:08:48 PST INFO  [main] org.apache.gobblin.runtime.JobContext  404 - Writing job execution information to the job history store
2018-02-13 14:08:49 PST INFO  [main] org.apache.gobblin.util.ExecutorsUtils  186 - Attempting to shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@476c137b[Shutting down, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 1]
2018-02-13 14:08:49 PST INFO  [main] org.apache.gobblin.util.ExecutorsUtils  205 - Successfully shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@476c137b[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
2018-02-13 14:08:49 PST INFO  [main] org.apache.gobblin.runtime.app.ServiceBasedAppLauncher  186 - Shutting down the application
2018-02-13 14:08:49 PST INFO  [MetricsReportingService STOPPING] org.apache.gobblin.metrics.GobblinMetrics  436 - Metrics reporting will be stopped: GobblinMetrics org.apache.gobblin.metrics.GobblinMetrics@33bb7c3d
2018-02-13 14:08:49 PST INFO  [MetricsReportingService STOPPING] org.apache.gobblin.metrics.GobblinMetrics  469 - Metrics reporting stopped successfully
2018-02-13 14:08:49 PST WARN  [Thread-5] org.apache.gobblin.runtime.app.ServiceBasedAppLauncher  181 - ApplicationLauncher has already stopped

From: Rohit Kalhans <ro...@gmail.com>
Reply-To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>
Date: Monday, February 12, 2018 at 12:45 AM
To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>
Subject: Re: PriviledgedActionException while submitting a gobblin job to mapreduce.

Hello,

We don't have a conf file per se, since we are building it on the fly (we are using embedded mode).
Here is the final configuration that is passed to the driver.

{
  "GOBBLIN_WORK_DIR": "/tmp/${USER}/gobblin/work_dir",
  "cleanup.staging.data.per.task": false,
  "converter.classes": "org.apache.gobblin.converter.IdentityConverter",
  "data.publisher.appendExtractToFinalDir": false,
  "data.publisher.final.dir": "${to}",
  "data.publisher.metadata.output.dir": "hdfs://nodenameha/tmp/",
  "data.publisher.type": "org.apache.gobblin.data.management.copy.publisher.CopyDataPublisher",
  "distcp.persist.dir": "/tmp/distcp-persist-dir",
  "extract.namespace": "org.apache.gobblin.copy",
  "from": "hdfs://nodenameha/tmp/distcptest",
  "fs.uri": "hdfs://nodenameha",
  "gobblin.copy.recursive.delete": "true",
  "gobblin.copy.recursive.deleteEmptyDirectories": "true",
  "gobblin.copy.recursive.update": "true",
  "gobblin.dataset.pattern": "${from}",
  "gobblin.dataset.profile.class": "org.apache.gobblin.data.management.copy.CopyableGlobDatasetFinder",
  "gobblin.runtime.commit.sequence.store.dir": "${GOBBLIN_WORK_DIR}/commit-sequence-store",
  "gobblin.template.required_attributes": "from,to",
  "gobblin.trash.skip.trash": true,
  "gobblin.workDir": "${GOBBLIN_WORK_DIR}",
  "job.commit.parallelize": true,
  "job.description": "Some descriprion hh ",
  "job.history.store.enabled": "true",
  "job.history.store.jdbc.driver": "com.mysql.jdbc.Driver",
  "job.history.store.password": "appuser",
  "job.history.store.url": "jdbc:mysql://mysqlserver:3306/gobblindb?zeroDateTimeBehavior=convertToNull",
  "job.history.store.user": "appuser",
  "job.lock.enabled": false,
  "job.name": "distcp20",
  "metrics.log.dir": "${GOBBLIN_WORK_DIR}/metrics",
  "mr.jars.dir": "/tmp/${USER}/gobblin/_jars",
  "mr.job.root.dir": "/tmp/_distcp20_1518422272543",
  "qualitychecker.row.err.file": "${GOBBLIN_WORK_DIR}/err",
  "source.class": "org.apache.gobblin.data.management.copy.CopySource",
  "source.filebased.fs.uri": "hdfs://nodenameha",
  "state.store.dir": "${GOBBLIN_WORK_DIR}/state-store",
  "state.store.enabled": false,
  "state.store.fs.uri": "${fs.uri}",
  "task.maxretries": 0,
  "task.status.reportintervalinms": 5000,
  "taskexecutor.threadpool.size": 2,
  "taskretry.threadpool.coresize": 1,
  "taskretry.threadpool.maxsize": 2,
  "to": "hdfs://nodenameha/tmp/rk_bak",
  "workunit.retry.enabled": false,
  "writer.builder.class": "org.apache.gobblin.data.management.copy.writer.FileAwareInputStreamDataWriterBuilder",
  "writer.destination.type": "HDFS",
  "writer.fs.uri": "hdfs://nodenameha",
  "writer.output.dir": "${GOBBLIN_WORK_DIR}/task-output",
  "writer.output.format": "AVRO",
  "writer.staging.dir": "${GOBBLIN_WORK_DIR}/task-staging"
}

best regards
Rohit.

On Mon, Feb 12, 2018 at 1:09 AM, Sudarshan Vasudevan <su...@linkedin.com> wrote:
Hi Rohit,
Can you share the job config file for your distcp job?

Thanks,
Sudarshan

From: Rohit Kalhans <ro...@gmail.com>
Reply-To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>
Date: Sunday, February 11, 2018 at 4:13 AM
To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>

Subject: Re: PriviledgedActionException while submitting a gobblin job to mapreduce.

Hello Sudarshan et al.,

Thanks for the help. Based on your response, we were able to figure out the problem and move past it after adding the lib directory to the classpath.
Now the YARN job succeeds, as per the counters/logs below.

INFO  [2018-02-11 11:50:57,267] org.apache.gobblin.runtime.TaskStateCollectorService: Starting the TaskStateCollectorService
INFO  [2018-02-11 11:50:57,268] org.apache.gobblin.runtime.mapreduce.MRJobLauncher: Launching Hadoop MR job Gobblin-distcp20
WARN  [2018-02-11 11:50:57,607] org.apache.hadoop.mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
INFO  [2018-02-11 11:50:57,734] org.apache.gobblin.runtime.mapreduce.GobblinWorkUnitsInputFormat: Found 1 input files at hdfs://namenodeha/tmp/_distcp20_1518349854235/distcp20/job_distcp20_1518349854763/input: [FileStatus{path=hdfs://namenodeha/tmp/_distcp20_1518349854235/distcp20/job_distcp20_1518349854763/input/task_distcp20_1518349854763_0.wu; isDirectory=false; length=9201; replication=3; blocksize=134217728; modification_time=1518349857234; access_time=1518349857214; owner=applicationetl; group=supergroup; permission=rw-r--r--; isSymlink=false}]
INFO  [2018-02-11 11:50:57,799] org.apache.hadoop.mapreduce.JobSubmitter: number of splits:1
INFO  [2018-02-11 11:50:57,891] org.apache.hadoop.mapreduce.JobSubmitter: Submitting tokens for job: job_1518179003398_40028
INFO  [2018-02-11 11:50:58,130] org.apache.hadoop.yarn.client.api.impl.YarnClientImpl: Submitted application application_1518179003398_40028
INFO  [2018-02-11 11:50:58,158] org.apache.hadoop.mapreduce.Job: The url to track the job: http://jobtracker.application.example.com:8088/proxy/application_1518179003398_40028/
INFO  [2018-02-11 11:50:58,158] org.apache.gobblin.runtime.mapreduce.MRJobLauncher: Waiting for Hadoop MR job job_1518179003398_40028 to complete
INFO  [2018-02-11 11:50:58,158] org.apache.hadoop.mapreduce.Job: Running job: job_1518179003398_40028
INFO  [2018-02-11 11:51:04,362] org.apache.hadoop.mapreduce.Job: Job job_1518179003398_40028 running in uber mode : false
INFO  [2018-02-11 11:51:04,363] org.apache.hadoop.mapreduce.Job:  map 0% reduce 0%
INFO  [2018-02-11 11:51:11,421] org.apache.hadoop.mapreduce.Job:  map 100% reduce 0%
INFO  [2018-02-11 11:51:12,433] org.apache.hadoop.mapreduce.Job: Job job_1518179003398_40028 completed successfully
INFO  [2018-02-11 11:51:12,563] org.apache.hadoop.mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=152940
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=504209
HDFS: Number of bytes written=498190
HDFS: Number of read operations=15
HDFS: Number of large read operations=0
HDFS: Number of write operations=9
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=9704
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=4852
Total vcore-seconds taken by all map tasks=4852
Total megabyte-seconds taken by all map tasks=19873792
Map-Reduce Framework
Map input records=1
Map output records=0
Input split bytes=206
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=61
CPU time spent (ms)=6290
Physical memory (bytes) snapshot=515375104
Virtual memory (bytes) snapshot=5540597760
Total committed heap usage (bytes)=1500512256
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0

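For reference, the classpath fix mentioned above can be sketched as follows (illustrative; the parcel path matches the CDH layout shown later in this thread, but verify it against your installation):

```shell
# Expose the MapReduce client jars (which contain YarnClientProtocolProvider)
# to the embedded application before launching it. The path is an example
# from a CDH parcel install; adjust to your environment.
mr_home="${HADOOP_MAPRED_HOME:-/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce}"
# Java accepts dir/* wildcard entries on the classpath, so the asterisks
# are passed through literally rather than glob-expanded.
export CLASSPATH="${CLASSPATH:+$CLASSPATH:}$mr_home/*:$mr_home/lib/*"
echo "$CLASSPATH"
```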

However, it seems that the publisher did not produce any output. I am not able to see any data in the sink folder, although the job completed successfully.

WARN  [2018-02-11 11:51:12,703] org.apache.gobblin.publisher.BaseDataPublisher: Branch 0 of WorkUnit task_distcp20_1518349854763_0 produced no data

Also, I can see a warning that points to an issue during merging of metadata:
WARN  [2018-02-11 11:51:12,708] org.apache.gobblin.publisher.BaseDataPublisher: Metadata merger for branch 0 returned null - bug in merger?
INFO  [2018-02-11 11:51:12,708] org.apache.gobblin.publisher.BaseDataPublisher: Metadata output path not set for branch 0, not publishing.
But this seems to be harmless.

INFO  [2018-02-11 11:51:12,659] org.apache.gobblin.runtime.TaskStateCollectorService: Collected task state of 1 completed tasks
INFO  [2018-02-11 11:51:12,660] org.apache.gobblin.runtime.JobContext: 1 more tasks of job job_distcp20_1518349854763 have completed
INFO  [2018-02-11 11:51:12,665] org.apache.gobblin.runtime.mapreduce.MRJobLauncher: Deleted working directory /tmp/_distcp20_1518349854235/distcp20/job_distcp20_1518349854763
INFO  [2018-02-11 11:51:12,670] org.apache.gobblin.runtime.AbstractJobLauncher: Persisting dataset urns.
INFO  [2018-02-11 11:51:12,680] org.apache.gobblin.runtime.SafeDatasetCommit: Committing dataset CopyEntity.DatasetAndPartition(dataset=CopyableDatasetMetadata(datasetURN=/tmp/distcptest), partition=/tmp/distcptest) of job job_distcp20_1518349854763 with commit policy COMMIT_ON_FULL_SUCCESS and state SUCCESSFUL
INFO  [2018-02-11 11:51:12,701] org.apache.gobblin.publisher.BaseDataPublisher: Retry disabled for publish.
WARN  [2018-02-11 11:51:12,701] org.apache.gobblin.runtime.SafeDatasetCommit: Gobblin is set up to parallelize publishing, however the publisher org.apache.gobblin.publisher.BaseDataPublisher is not thread-safe. Falling back to serial publishing.
WARN  [2018-02-11 11:51:12,703] org.apache.gobblin.publisher.BaseDataPublisher: Branch 0 of WorkUnit task_distcp20_1518349854763_0 produced no data
INFO  [2018-02-11 11:51:12,703] org.apache.gobblin.util.ParallelRunner: Attempting to shutdown ExecutorService: com.google.common.util.concurrent.MoreExecutors$ListeningDecorator@a195448
INFO  [2018-02-11 11:51:12,703] org.apache.gobblin.util.ParallelRunner: Successfully shutdown ExecutorService: com.google.common.util.concurrent.MoreExecutors$ListeningDecorator@a195448
WARN  [2018-02-11 11:51:12,708] org.apache.gobblin.publisher.BaseDataPublisher: Metadata merger for branch 0 returned null - bug in merger?
INFO  [2018-02-11 11:51:12,708] org.apache.gobblin.publisher.BaseDataPublisher: Metadata output path not set for branch 0, not publishing.
INFO  [2018-02-11 11:51:12,711] org.apache.gobblin.runtime.SafeDatasetCommit: Submitted 1 lineage events for dataset CopyEntity.DatasetAndPartition(dataset=CopyableDatasetMetadata(datasetURN=/tmp/distcptest), partition=/tmp/distcptest)
INFO  [2018-02-11 11:51:12,711] org.apache.gobblin.runtime.SafeDatasetCommit: Persisting dataset state for dataset CopyEntity.DatasetAndPartition(dataset=CopyableDatasetMetadata(datasetURN=/tmp/distcptest), partition=/tmp/distcptest)
INFO  [2018-02-11 11:51:12,711] org.apache.gobblin.util.executors.IteratorExecutor: Attempting to shutdown ExecutorService: com.google.common.util.concurrent.MoreExecutors$ListeningDecorator@b4864a4
INFO  [2018-02-11 11:51:12,711] org.apache.gobblin.util.executors.IteratorExecutor: Successfully shutdown ExecutorService: com.google.common.util.concurrent.MoreExecutors$ListeningDecorator@b4864a4
INFO  [2018-02-11 11:51:12,738] org.apache.gobblin.runtime.AbstractJobLauncher: Cleaning up staging directory /gobblin/task-staging/distcp20/job_distcp20_1518349854763
INFO  [2018-02-11 11:51:12,743] org.apache.gobblin.runtime.AbstractJobLauncher: Deleting directory /gobblin/task-staging/distcp20
INFO  [2018-02-11 11:51:12,746] org.apache.gobblin.runtime.AbstractJobLauncher: Cleaning up output directory /gobblin/task-output/distcp20/job_distcp20_1518349854763
INFO  [2018-02-11 11:51:12,751] org.apache.gobblin.runtime.AbstractJobLauncher: Deleting directory /gobblin/task-output/distcp20
INFO  [2018-02-11 11:51:12,757] com.example.applications.test.executor.jobs.testGobblinRunner.distcp20/1: jobCompletion: JobContext{jobName=distcp20, jobId=job_distcp20_1518349854763, jobState={
"job name": "distcp20",
"job id": "job_distcp20_1518349854763",
"job state": "COMMITTED",
"start time": 1518349855793,
"end time": 1518349872716,
"duration": 16923,
"tasks": 1,
"completed tasks": 1,
"task states": [
{
"task id": "task_distcp20_1518349854763_0",
"task state": "COMMITTED",
"start time": 1518349869446,
"end time": 1518349869981,
"duration": 535,
"retry count": 0
}
]
}}

Thanks for all the help.

Best regards
Rohit.


---------- Forwarded message ----------
From: Sudarshan Vasudevan <su...@linkedin.com>
Date: Thu, Feb 8, 2018 at 3:08 AM
Subject: Re: PriviledgedActionException while submitting a gobblin job to mapreduce.
To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>

Hi Rohit,
Your yarn.application.classpath is missing the following:
$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*

This is just a hunch, but I think the JobClient inside the YARN application is not finding hadoop-mapreduce-client-jobclient-2.3.0.jar, which contains the YarnClientProtocolProvider class; it is therefore defaulting to LocalClientProtocolProvider and is hence unable to initiate a connection to your YARN cluster. That jar is typically located under $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/.
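One way to check this hunch on the gateway box (illustrative; the fallback path is an example from a tarball install):

```shell
# Hadoop's Cluster.initialize() resolves ClientProtocolProvider
# implementations via ServiceLoader, so the jobclient jar must be on the
# classpath; otherwise only LocalClientProtocolProvider is found and the
# job fails with "Cannot initialize Cluster".
mr_lib="${HADOOP_MAPRED_HOME:-/usr/local/hadoop-2.3.0}/share/hadoop/mapreduce"
found="$(ls "$mr_lib" 2>/dev/null | grep jobclient || true)"
echo "${found:-no jobclient jar found under $mr_lib}"
```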

Can you add the above to your yarn-site.xml, restart YARN, and give it a go?

Thanks,
Sudarshan

From: Rohit Kalhans <ro...@gmail.com>
Reply-To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>
Date: Wednesday, February 7, 2018 at 1:02 PM
To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>
Subject: Re: PriviledgedActionException while submitting a gobblin job to mapreduce.


Hello all,

First of all, thanks for the quick turnaround. Really appreciate the help.

The environment variables have been set correctly (at least, that's what I think). I am running this on a feeder box (gateway) of a CDH 5.7 cluster managed by Cloudera Manager.

The yarn-site.xml contains the following:

  <property>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*</value>
  </property>

Before executing my application, I call the following:

export HADOOP_PREFIX="/opt/cloudera/parcels/CDH/"
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop/
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_CLIENT_CONF_DIR="/etc/hadoop/conf"
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export HADOOP_BIN_DIR=$HADOOP_PREFIX/bin
source /etc/hadoop/conf/hadoop-env.sh

The hadoop-env.sh sets a few variables as well.


$ cat /etc/hadoop/conf/hadoop-env.sh

# Prepend/Append plugin parcel classpaths

if [ "$HADOOP_USER_CLASSPATH_FIRST" = 'true' ]; then
  # HADOOP_CLASSPATH={{HADOOP_CLASSPATH_APPEND}}
  :
else
  # HADOOP_CLASSPATH={{HADOOP_CLASSPATH}}
  :
fi
# JAVA_LIBRARY_PATH={{JAVA_LIBRARY_PATH}}

export HADOOP_MAPRED_HOME=$( ([[ ! '/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce' =~ CDH_MR2_HOME ]] && echo /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce ) || echo ${CDH_MR2_HOME:-/usr/lib/hadoop-mapreduce/}  )
export HADOOP_CLIENT_OPTS="-Xmx268435456 $HADOOP_CLIENT_OPTS"
export HADOOP_CLIENT_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_CLIENT_OPTS"
export YARN_OPTS="-Xmx825955249 -Djava.net.preferIPv4Stack=true $YARN_OPTS"


On Thu, Feb 8, 2018 at 12:48 AM, Sudarshan Vasudevan <su...@linkedin.com> wrote:
Hi Rohit,
Can you share the properties in your yarn-site.xml file?

The following is an example config that worked for me:
I set the yarn.application.classpath in yarn-site.xml to the following:
<property>
<description>Classpath for typical applications.</description>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,
$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
</value>
</property>
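To see what actually reaches the client side, one can split the colon-separated classpath and look for the MR client jars. A minimal sketch; the classpath value here is a hard-coded stand-in (install path and version are illustrative), and on a real node you would feed in the output of `hadoop classpath` instead:

```shell
# Stand-in for `hadoop classpath` output; paths and jar version are illustrative.
cp_value='/usr/local/hadoop-2.3.0/etc/hadoop:/usr/local/hadoop-2.3.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.3.0.jar:/usr/local/hadoop-2.3.0/share/hadoop/yarn/*'
# Split on ':' and check for the jar that provides the YARN protocol provider.
echo "$cp_value" | tr ':' '\n' | grep 'mapreduce-client-jobclient'
```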

In my local Hadoop installation, I set the HADOOP_* environment variables as follows:
export HADOOP_PREFIX="/usr/local/hadoop-2.3.0"
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export HADOOP_BIN_DIR=$HADOOP_PREFIX/bin


Hope this helps,
Sudarshan

From: Rohit Kalhans <ro...@gmail.com>
Reply-To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>
Date: Wednesday, February 7, 2018 at 10:57 AM
To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>
Subject: PriviledgedActionException while submitting a gobblin job to mapreduce.

Hello
I am integrating Gobblin in embedded mode with an existing application. While submitting the job, it seems there is an unresolved dependency/requirement for the MapReduce launcher.

I have checked that mapreduce.framework.name is set to yarn, and other YARN applications are running fine. Somehow I keep hitting this issue with the Gobblin MR job launcher.
I was hoping you could help me set up Gobblin in embedded mode for my application.

Here is the stack trace. Do let me know if other info is needed.


Launching Hadoop MR job Gobblin-test9
WARN  [2018-02-07 11:43:22,990] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:<userName> (auth:SIMPLE) cause:java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
INFO  [2018-02-07 11:43:22,991] org.apache.gobblin.runtime.TaskStateCollectorService: Stopping the TaskStateCollectorService
INFO  [2018-02-07 11:43:23,033] org.apache.gobblin.runtime.mapreduce.MRJobLauncher: Deleted working directory /tmp/_test9_1518003781707/test9/job_test9_1518003782322
ERROR [2018-02-07 11:43:23,033] org.apache.gobblin.runtime.AbstractJobLauncher: Failed to launch and run job job_test9_1518003782322: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
! java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
! at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
! at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
! at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
! at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1277)
! at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1273)
! at java.security.AccessController.doPrivileged(Native Method)
! at javax.security.auth.Subject.doAs(Subject.java:422)
! at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
! at org.apache.hadoop.mapreduce.Job.connect(Job.java:1272)
! at org.apache.hadoop.mapreduce.Job.submit(Job.java:1301)
! at org.apache.gobblin.runtime.mapreduce.MRJobLauncher.runWorkUnits(MRJobLauncher.java:244)
! at org.apache.gobblin.runtime.AbstractJobLauncher.runWorkUnitStream(AbstractJobLauncher.java:596)
! at org.apache.gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:443)
! at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$DriverRunnable.call(JobLauncherExecutionDriver.java:159)
! at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$DriverRunnable.call(JobLauncherExecutionDriver.java:147)
! at java.util.concurrent.FutureTask.run(FutureTask.java:266)
! at java.lang.Thread.run(Thread.java:745)

--
Cheerio!

Rohit

Re: PriviledgedActionException while submitting a gobblin job to mapreduce.

Posted by Sudarshan Vasudevan <su...@linkedin.com>.
Hi Rohit,

Likely, your code is setting the “data.publisher.type” to “org.apache.gobblin.publisher.BaseDataPublisher” instead of “org.apache.gobblin.data.management.copy.publisher.CopyDataPublisher” as indicated in your configuration. The FileAwareInputStreamDataWriter is not compatible with BaseDataPublisher in the sense that the task output directory that this writer writes to is different from the one that the BaseDataPublisher expects. Hence, the data is not published to the intended directory.
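For reference, a sketch of the two properties that must agree for a distcp-style job (the key/value names are taken from the configuration quoted later in this thread):

```properties
# Sketch: the writer and publisher must come from the same copy framework.
# FileAwareInputStreamDataWriter writes task output to a layout that only
# CopyDataPublisher knows how to publish; BaseDataPublisher looks elsewhere.
writer.builder.class=org.apache.gobblin.data.management.copy.writer.FileAwareInputStreamDataWriterBuilder
data.publisher.type=org.apache.gobblin.data.management.copy.publisher.CopyDataPublisher
```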



Hope this helps,

Sudarshan

From: Rohit Kalhans <ro...@gmail.com>
Reply-To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>
Date: Monday, February 12, 2018 at 12:45 AM
To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>
Subject: Re: PriviledgedActionException while submitting a gobblin job to mapreduce.

Hello,

We don't have a conf file per se, since we build the configuration on the fly (we are using embedded mode).
Here is the final configuration passed to the driver.

{
  "GOBBLIN_WORK_DIR": "/tmp/${USER}/gobblin/work_dir",
  "cleanup.staging.data.per.task": false,
  "converter.classes": "org.apache.gobblin.converter.IdentityConverter",
  "data.publisher.appendExtractToFinalDir": false,
  "data.publisher.final.dir": "${to}",
  "data.publisher.metadata.output.dir": "hdfs://nodenameha/tmp/",
  "data.publisher.type": "org.apache.gobblin.data.management.copy.publisher.CopyDataPublisher",
  "distcp.persist.dir": "/tmp/distcp-persist-dir",
  "extract.namespace": "org.apache.gobblin.copy",
  "from": "hdfs://nodenameha/tmp/distcptest",
  "fs.uri": "hdfs://nodenameha",
  "gobblin.copy.recursive.delete": "true",
  "gobblin.copy.recursive.deleteEmptyDirectories": "true",
  "gobblin.copy.recursive.update": "true",
  "gobblin.dataset.pattern": "${from}",
  "gobblin.dataset.profile.class": "org.apache.gobblin.data.management.copy.CopyableGlobDatasetFinder",
  "gobblin.runtime.commit.sequence.store.dir": "${GOBBLIN_WORK_DIR}/commit-sequence-store",
  "gobblin.template.required_attributes": "from,to",
  "gobblin.trash.skip.trash": true,
  "gobblin.workDir": "${GOBBLIN_WORK_DIR}",
  "job.commit.parallelize": true,
  "job.description": "Some descriprion hh ",
  "job.history.store.enabled": "true",
  "job.history.store.jdbc.driver": "com.mysql.jdbc.Driver",
  "job.history.store.password": "appuser",
  "job.history.store.url": "jdbc:mysql://mysqlserver:3306/gobblindb?zeroDateTimeBehavior=convertToNull",
  "job.history.store.user": "appuser",
  "job.lock.enabled": false,
  "job.name": "distcp20",
  "metrics.log.dir": "${GOBBLIN_WORK_DIR}/metrics",
  "mr.jars.dir": "/tmp/${USER}/gobblin/_jars",
  "mr.job.root.dir": "/tmp/_distcp20_1518422272543",
  "qualitychecker.row.err.file": "${GOBBLIN_WORK_DIR}/err",
  "source.class": "org.apache.gobblin.data.management.copy.CopySource",
  "source.filebased.fs.uri": "hdfs://nodenameha",
  "state.store.dir": "${GOBBLIN_WORK_DIR}/state-store",
  "state.store.enabled": false,
  "state.store.fs.uri": "${fs.uri}",
  "task.maxretries": 0,
  "task.status.reportintervalinms": 5000,
  "taskexecutor.threadpool.size": 2,
  "taskretry.threadpool.coresize": 1,
  "taskretry.threadpool.maxsize": 2,
  "to": "hdfs://nodenameha/tmp/rk_bak",
  "workunit.retry.enabled": false,
  "writer.builder.class": "org.apache.gobblin.data.management.copy.writer.FileAwareInputStreamDataWriterBuilder",
  "writer.destination.type": "HDFS",
  "writer.fs.uri": "hdfs://nodenameha",
  "writer.output.dir": "${GOBBLIN_WORK_DIR}/task-output",
  "writer.output.format": "AVRO",
  "writer.staging.dir": "${GOBBLIN_WORK_DIR}/task-staging"
}

best regards
Rohit.

On Mon, Feb 12, 2018 at 1:09 AM, Sudarshan Vasudevan <su...@linkedin.com> wrote:
Hi Rohit,
Can you share the job config file for your distcp job?

Thanks,
Sudarshan

From: Rohit Kalhans <ro...@gmail.com>
Reply-To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>
Date: Sunday, February 11, 2018 at 4:13 AM
To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>

Subject: Re: PriviledgedActionException while submitting a gobblin job to mapreduce.

Hello Sudarshan et al.,

Thanks for the help. Based on your response we were able to figure out the problem and move past it after adding the missing lib entries to the classpath.
Now the YARN job succeeds, as the counters/log below show.

INFO  [2018-02-11 11:50:57,267] org.apache.gobblin.runtime.TaskStateCollectorService: Starting the TaskStateCollectorService
INFO  [2018-02-11 11:50:57,268] org.apache.gobblin.runtime.mapreduce.MRJobLauncher: Launching Hadoop MR job Gobblin-distcp20
WARN  [2018-02-11 11:50:57,607] org.apache.hadoop.mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
INFO  [2018-02-11 11:50:57,734] org.apache.gobblin.runtime.mapreduce.GobblinWorkUnitsInputFormat: Found 1 input files at hdfs://namenodeha/tmp/_distcp20_1518349854235/distcp20/job_distcp20_1518349854763/input: [FileStatus{path=hdfs://namenodeha/tmp/_distcp20_1518349854235/distcp20/job_distcp20_1518349854763/input/task_distcp20_1518349854763_0.wu; isDirectory=false; length=9201; replication=3; blocksize=134217728; modification_time=1518349857234; access_time=1518349857214; owner=applicationetl; group=supergroup; permission=rw-r--r--; isSymlink=false}]
INFO  [2018-02-11 11:50:57,799] org.apache.hadoop.mapreduce.JobSubmitter: number of splits:1
INFO  [2018-02-11 11:50:57,891] org.apache.hadoop.mapreduce.JobSubmitter: Submitting tokens for job: job_1518179003398_40028
INFO  [2018-02-11 11:50:58,130] org.apache.hadoop.yarn.client.api.impl.YarnClientImpl: Submitted application application_1518179003398_40028
INFO  [2018-02-11 11:50:58,158] org.apache.hadoop.mapreduce.Job: The url to track the job: http://jobtracker.application.example.com:8088/proxy/application_1518179003398_40028/
INFO  [2018-02-11 11:50:58,158] org.apache.gobblin.runtime.mapreduce.MRJobLauncher: Waiting for Hadoop MR job job_1518179003398_40028 to complete
INFO  [2018-02-11 11:50:58,158] org.apache.hadoop.mapreduce.Job: Running job: job_1518179003398_40028
INFO  [2018-02-11 11:51:04,362] org.apache.hadoop.mapreduce.Job: Job job_1518179003398_40028 running in uber mode : false
INFO  [2018-02-11 11:51:04,363] org.apache.hadoop.mapreduce.Job:  map 0% reduce 0%
INFO  [2018-02-11 11:51:11,421] org.apache.hadoop.mapreduce.Job:  map 100% reduce 0%
INFO  [2018-02-11 11:51:12,433] org.apache.hadoop.mapreduce.Job: Job job_1518179003398_40028 completed successfully
INFO  [2018-02-11 11:51:12,563] org.apache.hadoop.mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=152940
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=504209
HDFS: Number of bytes written=498190
HDFS: Number of read operations=15
HDFS: Number of large read operations=0
HDFS: Number of write operations=9
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=9704
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=4852
Total vcore-seconds taken by all map tasks=4852
Total megabyte-seconds taken by all map tasks=19873792
Map-Reduce Framework
Map input records=1
Map output records=0
Input split bytes=206
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=61
CPU time spent (ms)=6290
Physical memory (bytes) snapshot=515375104
Virtual memory (bytes) snapshot=5540597760
Total committed heap usage (bytes)=1500512256
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0


However, it seems that the publisher does not produce any output. I am not able to see any data in the sink folder although the job has successfully completed.

WARN  [2018-02-11 11:51:12,703] org.apache.gobblin.publisher.BaseDataPublisher: Branch 0 of WorkUnit task_distcp20_1518349854763_0 produced no data

Also, I can see a warning which points to an issue during merging of metadata:
WARN  [2018-02-11 11:51:12,708] org.apache.gobblin.publisher.BaseDataPublisher: Metadata merger for branch 0 returned null - bug in merger?
INFO  [2018-02-11 11:51:12,708] org.apache.gobblin.publisher.BaseDataPublisher: Metadata output path not set for branch 0, not publishing.
But this seems to be harmless.

INFO  [2018-02-11 11:51:12,659] org.apache.gobblin.runtime.TaskStateCollectorService: Collected task state of 1 completed tasks
INFO  [2018-02-11 11:51:12,660] org.apache.gobblin.runtime.JobContext: 1 more tasks of job job_distcp20_1518349854763 have completed
INFO  [2018-02-11 11:51:12,665] org.apache.gobblin.runtime.mapreduce.MRJobLauncher: Deleted working directory /tmp/_distcp20_1518349854235/distcp20/job_distcp20_1518349854763
INFO  [2018-02-11 11:51:12,670] org.apache.gobblin.runtime.AbstractJobLauncher: Persisting dataset urns.
INFO  [2018-02-11 11:51:12,680] org.apache.gobblin.runtime.SafeDatasetCommit: Committing dataset CopyEntity.DatasetAndPartition(dataset=CopyableDatasetMetadata(datasetURN=/tmp/distcptest), partition=/tmp/distcptest) of job job_distcp20_1518349854763 with commit policy COMMIT_ON_FULL_SUCCESS and state SUCCESSFUL
INFO  [2018-02-11 11:51:12,701] org.apache.gobblin.publisher.BaseDataPublisher: Retry disabled for publish.
WARN  [2018-02-11 11:51:12,701] org.apache.gobblin.runtime.SafeDatasetCommit: Gobblin is set up to parallelize publishing, however the publisher org.apache.gobblin.publisher.BaseDataPublisher is not thread-safe. Falling back to serial publishing.
WARN  [2018-02-11 11:51:12,703] org.apache.gobblin.publisher.BaseDataPublisher: Branch 0 of WorkUnit task_distcp20_1518349854763_0 produced no data
INFO  [2018-02-11 11:51:12,703] org.apache.gobblin.util.ParallelRunner: Attempting to shutdown ExecutorService: com.google.common.util.concurrent.MoreExecutors$ListeningDecorator@a195448
INFO  [2018-02-11 11:51:12,703] org.apache.gobblin.util.ParallelRunner: Successfully shutdown ExecutorService: com.google.common.util.concurrent.MoreExecutors$ListeningDecorator@a195448
WARN  [2018-02-11 11:51:12,708] org.apache.gobblin.publisher.BaseDataPublisher: Metadata merger for branch 0 returned null - bug in merger?
INFO  [2018-02-11 11:51:12,708] org.apache.gobblin.publisher.BaseDataPublisher: Metadata output path not set for branch 0, not publishing.
INFO  [2018-02-11 11:51:12,711] org.apache.gobblin.runtime.SafeDatasetCommit: Submitted 1 lineage events for dataset CopyEntity.DatasetAndPartition(dataset=CopyableDatasetMetadata(datasetURN=/tmp/distcptest), partition=/tmp/distcptest)
INFO  [2018-02-11 11:51:12,711] org.apache.gobblin.runtime.SafeDatasetCommit: Persisting dataset state for dataset CopyEntity.DatasetAndPartition(dataset=CopyableDatasetMetadata(datasetURN=/tmp/distcptest), partition=/tmp/distcptest)
INFO  [2018-02-11 11:51:12,711] org.apache.gobblin.util.executors.IteratorExecutor: Attempting to shutdown ExecutorService: com.google.common.util.concurrent.MoreExecutors$ListeningDecorator@b4864a4
INFO  [2018-02-11 11:51:12,711] org.apache.gobblin.util.executors.IteratorExecutor: Successfully shutdown ExecutorService: com.google.common.util.concurrent.MoreExecutors$ListeningDecorator@b4864a4
INFO  [2018-02-11 11:51:12,738] org.apache.gobblin.runtime.AbstractJobLauncher: Cleaning up staging directory /gobblin/task-staging/distcp20/job_distcp20_1518349854763
INFO  [2018-02-11 11:51:12,743] org.apache.gobblin.runtime.AbstractJobLauncher: Deleting directory /gobblin/task-staging/distcp20
INFO  [2018-02-11 11:51:12,746] org.apache.gobblin.runtime.AbstractJobLauncher: Cleaning up output directory /gobblin/task-output/distcp20/job_distcp20_1518349854763
INFO  [2018-02-11 11:51:12,751] org.apache.gobblin.runtime.AbstractJobLauncher: Deleting directory /gobblin/task-output/distcp20
INFO  [2018-02-11 11:51:12,757] com.example.applications.test.executor.jobs.testGobblinRunner.distcp20/1: jobCompletion: JobContext{jobName=distcp20, jobId=job_distcp20_1518349854763, jobState={
  "job name": "distcp20",
  "job id": "job_distcp20_1518349854763",
  "job state": "COMMITTED",
  "start time": 1518349855793,
  "end time": 1518349872716,
  "duration": 16923,
  "tasks": 1,
  "completed tasks": 1,
  "task states": [
    {
      "task id": "task_distcp20_1518349854763_0",
      "task state": "COMMITTED",
      "start time": 1518349869446,
      "end time": 1518349869981,
      "duration": 535,
      "retry count": 0
    }
  ]
}}

Thanks for all the help.

Best regards
Rohit.


---------- Forwarded message ----------
From: Sudarshan Vasudevan <su...@linkedin.com>
Date: Thu, Feb 8, 2018 at 3:08 AM
Subject: Re: PriviledgedActionException while submitting a gobblin job to mapreduce.
To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>
Hi Rohit,
Your yarn.application.classpath is missing the following:
$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*

This is just a hunch, but the JobClient inside the YARN application is probably not finding hadoop-mapreduce-client-jobclient-2.3.0.jar, which contains the YarnClientProtocolProvider class, and so it defaults to the LocalClientProtocolProvider and hence is unable to initiate a connection to your YARN cluster. The above jar is typically located under $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*.
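One way to test this hunch from the gateway box is to check whether the jobclient jar is actually present under the mapreduce directory. The default path below is an assumption based on the CDH parcel layout; adjust it for your install:

```shell
# Hadoop's Cluster picks its ClientProtocolProvider via Java's ServiceLoader;
# the YARN provider ships in hadoop-mapreduce-client-jobclient-*.jar.
# Fall back to the CDH parcel path if HADOOP_MAPRED_HOME is unset (assumption).
mapred_dir="${HADOOP_MAPRED_HOME:-/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce}"
if ls "$mapred_dir"/hadoop-mapreduce-client-jobclient-*.jar >/dev/null 2>&1; then
  echo "jobclient jar present"
else
  echo "jobclient jar missing: client will fall back to LocalClientProtocolProvider"
fi
```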

Can you add the above to your yarn-site.xml, restart yarn and give it a go?

Thanks,
Sudarshan

From: Rohit Kalhans <ro...@gmail.com>
Reply-To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>
Date: Wednesday, February 7, 2018 at 1:02 PM
To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>
Subject: Re: PriviledgedActionException while submitting a gobblin job to mapreduce.


Hello all,

First of all, thanks for the quick turnaround; I really appreciate the help.

The environment variables have been set correctly (at least that's what I think). I am running this on a feeder box (gateway) of a CDH 5.7 cluster managed by Cloudera Manager.

The yarn-site.xml contains the following:

  <property>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*</value>
  </property>

Before the execution of my application I  call the following.

export HADOOP_PREFIX="/opt/cloudera/parcels/CDH/"
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=HADOOP_PREFIX/etc/hadoop/
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_CLIENT_CONF_DIR="/etc/hadoop/conf"
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export HADOOP_BIN_DIR=$HADOOP_PREFIX/bin
source /etc/hadoop/conf/hadoop-env.sh

The hadoop-env.sh sets a few variables as well.


$>_ cat /etc/hadoop/conf/hadoop-env.sh

# Prepend/Append plugin parcel classpaths

if [ "$HADOOP_USER_CLASSPATH_FIRST" = 'true' ]; then
  # HADOOP_CLASSPATH={{HADOOP_CLASSPATH_APPEND}}
  :
else
  # HADOOP_CLASSPATH={{HADOOP_CLASSPATH}}
  :
fi
# JAVA_LIBRARY_PATH={{JAVA_LIBRARY_PATH}}

export HADOOP_MAPRED_HOME=$( ([[ ! '/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce' =~ CDH_MR2_HOME ]] && echo /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce ) || echo ${CDH_MR2_HOME:-/usr/lib/hadoop-mapreduce/}  )
export HADOOP_CLIENT_OPTS="-Xmx268435456 $HADOOP_CLIENT_OPTS"
export HADOOP_CLIENT_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_CLIENT_OPTS"
export YARN_OPTS="-Xmx825955249 -Djava.net.preferIPv4Stack=true $YARN_OPTS"


On Thu, Feb 8, 2018 at 12:48 AM, Sudarshan Vasudevan <su...@linkedin.com>> wrote:
Hi Rohit,
Can you share the properties in your yarn-site.xml file?

The following is an example config that worked for me:
I set the yarn.application.classpath in yarn-site.xml to the following:
<property>
<description>Classpath for typical applications.</description>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,
$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
</value>
</property>

In my local Hadoop installation, I set the HADOOP_* environment variables as follows:
export HADOOP_PREFIX="/usr/local/hadoop-2.3.0"
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export HADOOP_BIN_DIR=$HADOOP_PREFIX/bin


Hope this helps,
Sudarshan

From: Rohit Kalhans <ro...@gmail.com>>
Reply-To: "user@gobblin.incubator.apache.org<ma...@gobblin.incubator.apache.org>" <us...@gobblin.incubator.apache.org>>
Date: Wednesday, February 7, 2018 at 10:57 AM
To: "user@gobblin.incubator.apache.org<ma...@gobblin.incubator.apache.org>" <us...@gobblin.incubator.apache.org>>
Subject: PriviledgedActionException while submitting a gobblin job to mapreduce.

Hello
I am integrating gobblin in embedded mode with an existing application.  While submitting the job it seems like there is a unresolved dependency/requirement to mapreduce launcher.

I have checked that  mapreduce.framework.name<http://mapreduce.framework.name> is set to yarn and the other yarn application are running fine. Somehow I keep hitting the issue with the gobblin mr job launcher.
I was hoping that you guys can help me setting up Gobblin in embedded mode for my application.

Here is the stack. Do let me know if some other info is needed.


Launching Hadoop MR job Gobblin-test9
WARN  [2018-02-07 11:43:22,990] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:<userName> (auth:SIMPLE) cause:java.io.IOException: Cannot initialize Cluster. Please check your
configuration for mapreduce.framework.name<http://mapreduce.framework.name> and the correspond server addresses.
INFO  [2018-02-07 11:43:22,991] org.apache.gobblin.runtime.TaskStateCollectorService: Stopping the TaskStateCollectorService
INFO  [2018-02-07 11:43:23,033] org.apache.gobblin.runtime.mapreduce.MRJobLauncher: Deleted working directory /tmp/_test9_1518003781707/test9/job_test9_1518003782322
ERROR [2018-02-07 11:43:23,033] org.apache.gobblin.runtime.AbstractJobLauncher: Failed to launch and run job job_test9_1518003782322: java.io.IOException: Cannot initialize Cluster. Please check your conf
iguration for mapreduce.framework.name<http://mapreduce.framework.name> and the correspond server addresses.
! java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name<http://mapreduce.framework.name> and the correspond server addresses.
! at org.apache.hadoop.mapreduce.Cl<http://org.apache.hadoop.mapreduce.Cl>uster.initialize(Cluster.java:120)
! at org.apache.hadoop.mapreduce.Cl<http://org.apache.hadoop.mapreduce.Cl>uster.<init>(Cluster.java:82)
! at org.apache.hadoop.mapreduce.Cl<http://org.apache.hadoop.mapreduce.Cl>uster.<init>(Cluster.java:75)
! at org.apache.hadoop.mapreduce.Jo<http://org.apache.hadoop.mapreduce.Jo>b$9.run(Job.java:1277)
! at org.apache.hadoop.mapreduce.Jo<http://org.apache.hadoop.mapreduce.Jo>b$9.run(Job.java:1273)
! at java.security.AccessController.doPrivileged(Native Method)
! at javax.security.auth.Subject.do<http://javax.security.auth.Subject.do>As(Subject.java:422)
! at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
! at org.apache.hadoop.mapreduce.Jo<http://org.apache.hadoop.mapreduce.Jo>b.connect(Job.java:1272)
! at org.apache.hadoop.mapreduce.Jo<http://org.apache.hadoop.mapreduce.Jo>b.submit(Job.java:1301)
! at org.apache.gobblin.runtime.mapreduce.MRJobLauncher.runWorkUnits(MRJobLauncher.java:244)
! at org.apache.gobblin.runtime.AbstractJobLauncher.runWorkUnitStream(AbstractJobLauncher.java:596)
! at org.apache.gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:443)
! at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$DriverRunnable.call(JobLauncherExecutionDriver.java:159)
! at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$DriverRunnable.call(JobLauncherExecutionDriver.java:147)
! at java.util.concurrent.FutureTask.run(FutureTask.java:266)
! at java.lang.Thread.run(Thread.java:745)

--
Cheerio!

Rohit



--
Cheerio!

Rohit



--
Cheerio!

Rohit



--
Cheerio!

Rohit

Re: PriviledgedActionException while submitting a gobblin job to mapreduce.

Posted by Rohit Kalhans <ro...@gmail.com>.
Hello,

We don't have a conf file per say since we are building it on the fly
(Since we are using embedded mode).
Here is the final Configuration which is passed to the driver.

{
  "GOBBLIN_WORK_DIR": "/tmp/${USER}/gobblin/work_dir",
  "cleanup.staging.data.per.task": false,
  "converter.classes": "org.apache.gobblin.converter.IdentityConverter",
  "data.publisher.appendExtractToFinalDir": false,
  "data.publisher.final.dir": "${to}",
  "data.publisher.metadata.output.dir": "hdfs://nodenameha/tmp/",
  "data.publisher.type":
"org.apache.gobblin.data.management.copy.publisher.CopyDataPublisher",
  "distcp.persist.dir": "/tmp/distcp-persist-dir",
  "extract.namespace": "org.apache.gobblin.copy",
  "from": "hdfs://nodenameha/tmp/distcptest",
  "fs.uri": "hdfs://nodenameha",
  "gobblin.copy.recursive.delete": "true",
  "gobblin.copy.recursive.deleteEmptyDirectories": "true",
  "gobblin.copy.recursive.update": "true",
  "gobblin.dataset.pattern": "${from}",
  "gobblin.dataset.profile.class":
"org.apache.gobblin.data.management.copy.CopyableGlobDatasetFinder",
  "gobblin.runtime.commit.sequence.store.dir":
"${GOBBLIN_WORK_DIR}/commit-sequence-store",
  "gobblin.template.required_attributes": "from,to",
  "gobblin.trash.skip.trash": true,
  "gobblin.workDir": "${GOBBLIN_WORK_DIR}",
  "job.commit.parallelize": true,
  "job.description": "Some descriprion hh ",
  "job.history.store.enabled": "true",
  "job.history.store.jdbc.driver": "com.mysql.jdbc.Driver",
  "job.history.store.password": "appuser",
  "job.history.store.url":
"jdbc:mysql://mysqlserver:3306/gobblindb?zeroDateTimeBehavior=convertToNull",
  "job.history.store.user": "appuser",
  "job.lock.enabled": false,
  "job.name": "distcp20",
  "metrics.log.dir": "${GOBBLIN_WORK_DIR}/metrics",
  "mr.jars.dir": "/tmp/${USER}/gobblin/_jars",
  "mr.job.root.dir": "/tmp/_distcp20_1518422272543",
  "qualitychecker.row.err.file": "${GOBBLIN_WORK_DIR}/err",
  "source.class": "org.apache.gobblin.data.management.copy.CopySource",
  "source.filebased.fs.uri": "hdfs://nodenameha",
  "state.store.dir": "${GOBBLIN_WORK_DIR}/state-store",
  "state.store.enabled": false,
  "state.store.fs.uri": "${fs.uri}",
  "task.maxretries": 0,
  "task.status.reportintervalinms": 5000,
  "taskexecutor.threadpool.size": 2,
  "taskretry.threadpool.coresize": 1,
  "taskretry.threadpool.maxsize": 2,
  "to": "hdfs://nodenameha/tmp/rk_bak",
  "workunit.retry.enabled": false,
  "writer.builder.class":
"org.apache.gobblin.data.management.copy.writer.FileAwareInputStreamDataWriterBuilder",
  "writer.destination.type": "HDFS",
  "writer.fs.uri": "hdfs://nodenameha",
  "writer.output.dir": "${GOBBLIN_WORK_DIR}/task-output",
  "writer.output.format": "AVRO",
  "writer.staging.dir": "${GOBBLIN_WORK_DIR}/task-staging"
}

best regards
Rohit.

On Mon, Feb 12, 2018 at 1:09 AM, Sudarshan Vasudevan <
suvasudevan@linkedin.com> wrote:

> Hi Rohit,
>
> Can you share the job config file for your distcp job?
>
>
>
> Thanks,
>
> Sudarshan
>
>
>
> *From: *Rohit Kalhans <ro...@gmail.com>
> *Reply-To: *"user@gobblin.incubator.apache.org" <user@gobblin.incubator.
> apache.org>
> *Date: *Sunday, February 11, 2018 at 4:13 AM
> *To: *"user@gobblin.incubator.apache.org" <user@gobblin.incubator.
> apache.org>
>
> *Subject: *Re: PriviledgedActionException while submitting a gobblin job
> to mapreduce.
>
>
>
> Hello Sudarshan, et. al,
>
>
>
> Thanks for the help. Based on your response we were able to figure out the
> problem and were able to move past it after adding lib to the classpath.
>
> Now the yarn job succeeds as per the counter/log as follows.
>
>
>
> INFO  [2018-02-11 11:50:57,267] org.apache.gobblin.runtime.TaskStateCollectorService:
> Starting the TaskStateCollectorService
>
> INFO  [2018-02-11 11:50:57,268] org.apache.gobblin.runtime.mapreduce.MRJobLauncher:
> Launching Hadoop MR job Gobblin-distcp20
>
> WARN  [2018-02-11 11:50:57,607] org.apache.hadoop.mapreduce.JobResourceUploader:
> Hadoop command-line option parsing not performed. Implement the Tool
> interface and execute your application with ToolRunner to remedy this.
>
> INFO  [2018-02-11 11:50:57,734] org.apache.gobblin.runtime.mapreduce.GobblinWorkUnitsInputFormat:
> Found 1 input files at hdfs://namenodeha/tmp/_distcp20_1518349854235/
> distcp20/job_distcp20_1518349854763/input: [FileStatus{path=hdfs://
> namenodeha/tmp/_distcp20_1518349854235/distcp20/job_
> distcp20_1518349854763/input/task_distcp20_1518349854763_0.wu;
> isDirectory=false; length=9201; replication=3; blocksize=134217728;
> modification_time=1518349857234; access_time=1518349857214;
> owner=applicationetl; group=supergroup; permission=rw-r--r--;
> isSymlink=false}]
>
> INFO  [2018-02-11 11:50:57,799] org.apache.hadoop.mapreduce.JobSubmitter:
> number of splits:1
>
> INFO  [2018-02-11 11:50:57,891] org.apache.hadoop.mapreduce.JobSubmitter:
> Submitting tokens for job: job_1518179003398_40028
>
> INFO  [2018-02-11 11:50:58,130] org.apache.hadoop.yarn.client.api.impl.YarnClientImpl:
> Submitted application application_1518179003398_40028
>
> INFO  [2018-02-11 11:50:58,158] org.apache.hadoop.mapreduce.Job: The url
> to track the job: http://jobtracker.application.example.com:8088/proxy/
> application_1518179003398_40028/
>
> INFO  [2018-02-11 11:50:58,158] org.apache.gobblin.runtime.mapreduce.MRJobLauncher:
> Waiting for Hadoop MR job job_1518179003398_40028 to complete
>
> INFO  [2018-02-11 11:50:58,158] org.apache.hadoop.mapreduce.Job: Running
> job: job_1518179003398_40028
>
> INFO  [2018-02-11 11:51:04,362] org.apache.hadoop.mapreduce.Job: Job
> job_1518179003398_40028 running in uber mode : false
>
> INFO  [2018-02-11 11:51:04,363] org.apache.hadoop.mapreduce.Job:  map 0%
> reduce 0%
>
> INFO  [2018-02-11 11:51:11,421] org.apache.hadoop.mapreduce.Job:  map
> 100% reduce 0%
>
> INFO  [2018-02-11 11:51:12,433] org.apache.hadoop.mapreduce.Job: Job
> job_1518179003398_40028 completed successfully
>
> INFO  [2018-02-11 11:51:12,563] org.apache.hadoop.mapreduce.Job:
> Counters: 30
>
> File System Counters
>
> FILE: Number of bytes read=0
>
> FILE: Number of bytes written=152940
>
> FILE: Number of read operations=0
>
> FILE: Number of large read operations=0
>
> FILE: Number of write operations=0
>
> HDFS: Number of bytes read=504209
>
> HDFS: Number of bytes written=498190
>
> HDFS: Number of read operations=15
>
> HDFS: Number of large read operations=0
>
> HDFS: Number of write operations=9
>
> Job Counters
>
> Launched map tasks=1
>
> Other local map tasks=1
>
> Total time spent by all maps in occupied slots (ms)=9704
>
> Total time spent by all reduces in occupied slots (ms)=0
>
> Total time spent by all map tasks (ms)=4852
>
> Total vcore-seconds taken by all map tasks=4852
>
> Total megabyte-seconds taken by all map tasks=19873792
>
> Map-Reduce Framework
>
> Map input records=1
>
> Map output records=0
>
> Input split bytes=206
>
> Spilled Records=0
>
> Failed Shuffles=0
>
> Merged Map outputs=0
>
> GC time elapsed (ms)=61
>
> CPU time spent (ms)=6290
>
> Physical memory (bytes) snapshot=515375104
>
> Virtual memory (bytes) snapshot=5540597760
>
> Total committed heap usage (bytes)=1500512256
>
> File Input Format Counters
>
> Bytes Read=0
>
> File Output Format Counters
>
> Bytes Written=0
>
>
>
>
>
> However, it seems that the publisher does not produce any output. I am not
> able to see any data in the sink folder although the job has successfully
> completed.
>
>
>
> WARN  [2018-02-11 11:51:12,703] org.apache.gobblin.publisher.BaseDataPublisher:
> Branch 0 of WorkUnit task_distcp20_1518349854763_0 produced no data
>
>
>
> Also, I can see a warning that points to an issue during merging of
> metadata.
>
> WARN  [2018-02-11 11:51:12,708] org.apache.gobblin.publisher.BaseDataPublisher:
> Metadata merger for branch 0 returned null - bug in merger?
>
> INFO  [2018-02-11 11:51:12,708] org.apache.gobblin.publisher.BaseDataPublisher:
> Metadata output path not set for branch 0, not publishing.
>
> But this seems to be harmless.
>
>
>
> INFO  [2018-02-11 11:51:12,659] org.apache.gobblin.runtime.TaskStateCollectorService:
> Collected task state of 1 completed tasks
>
> INFO  [2018-02-11 11:51:12,660] org.apache.gobblin.runtime.JobContext: 1
> more tasks of job job_distcp20_1518349854763 have completed
>
> INFO  [2018-02-11 11:51:12,665] org.apache.gobblin.runtime.mapreduce.MRJobLauncher:
> Deleted working directory /tmp/_distcp20_1518349854235/
> distcp20/job_distcp20_1518349854763
>
> INFO  [2018-02-11 11:51:12,670] org.apache.gobblin.runtime.AbstractJobLauncher:
> Persisting dataset urns.
>
> INFO  [2018-02-11 11:51:12,680] org.apache.gobblin.runtime.SafeDatasetCommit:
> Committing dataset CopyEntity.DatasetAndPartition(dataset=
> CopyableDatasetMetadata(datasetURN=/tmp/distcptest),
> partition=/tmp/distcptest) of job job_distcp20_1518349854763 with commit
> policy COMMIT_ON_FULL_SUCCESS and state SUCCESSFUL
>
> INFO  [2018-02-11 11:51:12,701] org.apache.gobblin.publisher.BaseDataPublisher:
> Retry disabled for publish.
>
> WARN  [2018-02-11 11:51:12,701] org.apache.gobblin.runtime.SafeDatasetCommit:
> Gobblin is set up to parallelize publishing, however the publisher
> org.apache.gobblin.publisher.BaseDataPublisher is not thread-safe.
> Falling back to serial publishing.
>
> WARN  [2018-02-11 11:51:12,703] org.apache.gobblin.publisher.BaseDataPublisher:
> Branch 0 of WorkUnit task_distcp20_1518349854763_0 produced no data
>
> INFO  [2018-02-11 11:51:12,703] org.apache.gobblin.util.ParallelRunner:
> Attempting to shutdown ExecutorService: com.google.common.util.
> concurrent.MoreExecutors$ListeningDecorator@a195448
>
> INFO  [2018-02-11 11:51:12,703] org.apache.gobblin.util.ParallelRunner:
> Successfully shutdown ExecutorService: com.google.common.util.
> concurrent.MoreExecutors$ListeningDecorator@a195448
>
> WARN  [2018-02-11 11:51:12,708] org.apache.gobblin.publisher.BaseDataPublisher:
> Metadata merger for branch 0 returned null - bug in merger?
>
> INFO  [2018-02-11 11:51:12,708] org.apache.gobblin.publisher.BaseDataPublisher:
> Metadata output path not set for branch 0, not publishing.
>
> INFO  [2018-02-11 11:51:12,711] org.apache.gobblin.runtime.SafeDatasetCommit:
> Submitted 1 lineage events for dataset CopyEntity.
> DatasetAndPartition(dataset=CopyableDatasetMetadata(datasetURN=/tmp/distcptest),
> partition=/tmp/distcptest)
>
> INFO  [2018-02-11 11:51:12,711] org.apache.gobblin.runtime.SafeDatasetCommit:
> Persisting dataset state for dataset CopyEntity.
> DatasetAndPartition(dataset=CopyableDatasetMetadata(datasetURN=/tmp/distcptest),
> partition=/tmp/distcptest)
>
> INFO  [2018-02-11 11:51:12,711] org.apache.gobblin.util.executors.IteratorExecutor:
> Attempting to shutdown ExecutorService: com.google.common.util.
> concurrent.MoreExecutors$ListeningDecorator@b4864a4
>
> INFO  [2018-02-11 11:51:12,711] org.apache.gobblin.util.executors.IteratorExecutor:
> Successfully shutdown ExecutorService: com.google.common.util.
> concurrent.MoreExecutors$ListeningDecorator@b4864a4
>
> INFO  [2018-02-11 11:51:12,738] org.apache.gobblin.runtime.AbstractJobLauncher:
> Cleaning up staging directory /gobblin/task-staging/distcp20/job_distcp20_
> 1518349854763
>
> INFO  [2018-02-11 11:51:12,743] org.apache.gobblin.runtime.AbstractJobLauncher:
> Deleting directory /gobblin/task-staging/distcp20
>
> INFO  [2018-02-11 11:51:12,746] org.apache.gobblin.runtime.AbstractJobLauncher:
> Cleaning up output directory /gobblin/task-output/distcp20/
> job_distcp20_1518349854763
>
> INFO  [2018-02-11 11:51:12,751] org.apache.gobblin.runtime.AbstractJobLauncher:
> Deleting directory /gobblin/task-output/distcp20
>
> INFO  [2018-02-11 11:51:12,757] com.example.applications.test.
> executor.jobs.testGobblinRunner.distcp20/1: jobCompletion:
> JobContext{jobName=distcp20, jobId=job_distcp20_1518349854763, jobState={
>
> "job name": "distcp20",
>
> "job id": "job_distcp20_1518349854763",
>
> "job state": "COMMITTED",
>
> "start time": 1518349855793,
>
> "end time": 1518349872716,
>
> "duration": 16923,
>
> "tasks": 1,
>
> "completed tasks": 1,
>
> "task states": [
>
> {
>
> "task id": "task_distcp20_1518349854763_0",
>
> "task state": "COMMITTED",
>
> "start time": 1518349869446,
>
> "end time": 1518349869981,
>
> "duration": 535,
>
> "retry count": 0
>
> }
>
> ]
>
> }}
>
>
>
> Thanks for all the help.
>
>
>
> Best regards
>
> Rohit.
>
>
>
>
>
> ---------- Forwarded message ----------
> From: *Sudarshan Vasudevan* <su...@linkedin.com>
> Date: Thu, Feb 8, 2018 at 3:08 AM
> Subject: Re: PriviledgedActionException while submitting a gobblin job to
> mapreduce.
> To: "user@gobblin.incubator.apache.org" <user@gobblin.incubator.apache.org
> >
>
> Hi Rohit,
>
> Your yarn.application.classpath is missing the following:
>
> $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_
> MAPRED_HOME/share/hadoop/mapreduce/lib/*
>
>
>
> This is just a hunch, but the JobClient inside the yarn application is
> not finding the hadoop-mapreduce-client-jobclient-2.3.0.jar, which has
> the YarnClientProtocolProvider class and is defaulting to
> LocalClientProtocolProvider and hence unable to initiate a connection to
> your YARN cluster. The above jar is typically located under
> $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*.
>
>
>
> Can you add the above to your yarn-site.xml, restart yarn and give it a go?
>
>
>
> Thanks,
>
> Sudarshan
>
>
>
> *From: *Rohit Kalhans <ro...@gmail.com>
> *Reply-To: *"user@gobblin.incubator.apache.org" <user@gobblin.incubator.
> apache.org>
> *Date: *Wednesday, February 7, 2018 at 1:02 PM
> *To: *"user@gobblin.incubator.apache.org" <user@gobblin.incubator.
> apache.org>
> *Subject: *Re: PriviledgedActionException while submitting a gobblin job
> to mapreduce.
>
>
>
>
>
> Hello all,
>
>
>
> First of all, thanks for the quick turnaround. I really appreciate the help.
>
>
>
> The environment variables have been set correctly (at least that's what I
> think). I am running this on a feeder box (gateway) of a CDH 5.7 cluster
> managed by Cloudera Manager.
>
>
>
> The yarn-site.xml contains the following:
>
>
>
>   <property>
>
>     <name>yarn.application.classpath</name>
>
>       <value>$HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_
> COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*
> ,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
> </value>
>
> </property>
>
>
>
> Before executing my application, I call the following:
>
>
>
> export HADOOP_PREFIX="/opt/cloudera/parcels/CDH/"
>
> export HADOOP_HOME=$HADOOP_PREFIX
>
> export HADOOP_COMMON_HOME=$HADOOP_PREFIX
>
> export HADOOP_CONF_DIR=HADOOP_PREFIX/etc/hadoop/
>
> export HADOOP_HDFS_HOME=$HADOOP_PREFIX
>
> export HADOOP_CLIENT_CONF_DIR="/etc/hadoop/conf"
>
> export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
>
> export HADOOP_YARN_HOME=$HADOOP_PREFIX
>
> export HADOOP_BIN_DIR=$HADOOP_PREFIX/bin
>
> source /etc/hadoop/conf/hadoop-env.sh
>
>
>
> The hadoop-env.sh sets a few variables as well.
>
>
>
>
>
> $>_ cat /etc/hadoop/conf/hadoop-env.sh
>
>
>
> # Prepend/Append plugin parcel classpaths
>
>
>
> if [ "$HADOOP_USER_CLASSPATH_FIRST" = 'true' ]; then
>
>   # HADOOP_CLASSPATH={{HADOOP_CLASSPATH_APPEND}}
>
>   :
>
> else
>
>   # HADOOP_CLASSPATH={{HADOOP_CLASSPATH}}
>
>   :
>
> fi
>
> # JAVA_LIBRARY_PATH={{JAVA_LIBRARY_PATH}}
>
>
>
> export HADOOP_MAPRED_HOME=$( ([[ ! '/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce'
> =~ CDH_MR2_HOME ]] && echo /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
> ) || echo ${CDH_MR2_HOME:-/usr/lib/hadoop-mapreduce/}  )
>
> export HADOOP_CLIENT_OPTS="-Xmx268435456 $HADOOP_CLIENT_OPTS"
>
> export HADOOP_CLIENT_OPTS="-Djava.net.preferIPv4Stack=true
> $HADOOP_CLIENT_OPTS"
>
> export YARN_OPTS="-Xmx825955249 -Djava.net.preferIPv4Stack=true
> $YARN_OPTS"
>
>
>
>
>
> On Thu, Feb 8, 2018 at 12:48 AM, Sudarshan Vasudevan <
> suvasudevan@linkedin.com> wrote:
>
> Hi Rohit,
>
> Can you share the properties in your yarn-site.xml file?
>
>
>
> The following is an example config that worked for me:
>
> I set the yarn.application.classpath in yarn-site.xml to the following:
>
> <property>
>
> <description>Classpath for typical applications.</description>
>
> <name>yarn.application.classpath</name>
>
> <value>
>
> $HADOOP_CONF_DIR,
>
> $HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_
> COMMON_HOME/share/hadoop/common/lib/*,
>
> $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_
> HOME/share/hadoop/hdfs/lib/*,
>
> $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_
> MAPRED_HOME/share/hadoop/mapreduce/lib/*,
>
> $HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_
> HOME/share/hadoop/yarn/lib/*
>
> </value>
>
> </property>
>
>
>
> In my local Hadoop installation, I set the HADOOP_* environment variables
> as follows:
>
> export HADOOP_PREFIX="/usr/local/hadoop-2.3.0"
>
> export HADOOP_HOME=$HADOOP_PREFIX
>
> export HADOOP_COMMON_HOME=$HADOOP_PREFIX
>
> export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
>
> export HADOOP_HDFS_HOME=$HADOOP_PREFIX
>
> export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
>
> export HADOOP_YARN_HOME=$HADOOP_PREFIX
>
> export HADOOP_BIN_DIR=$HADOOP_PREFIX/bin
>
>
>
>
>
> Hope this helps,
>
> Sudarshan
>
>
>
> *From: *Rohit Kalhans <ro...@gmail.com>
> *Reply-To: *"user@gobblin.incubator.apache.org" <user@gobblin.incubator.
> apache.org>
> *Date: *Wednesday, February 7, 2018 at 10:57 AM
> *To: *"user@gobblin.incubator.apache.org" <user@gobblin.incubator.
> apache.org>
> *Subject: *PriviledgedActionException while submitting a gobblin job to
> mapreduce.
>
>
>
> Hello
>
> I am integrating gobblin in embedded mode with an existing application.
> While submitting the job, it seems there is an unresolved
> dependency/requirement for the mapreduce launcher.
>
>
>
> I have checked that mapreduce.framework.name is set to yarn and the
> other YARN applications are running fine. Somehow I keep hitting the issue
> with the Gobblin MR job launcher.
>
> I was hoping that you could help me set up Gobblin in embedded mode
> for my application.
>
>
>
> Here is the stack. Do let me know if some other info is needed.
>
>
>
>
>
> Launching Hadoop MR job Gobblin-test9
> WARN  [2018-02-07 11:43:22,990] org.apache.hadoop.security.UserGroupInformation:
> PriviledgedActionException as:<userName> (auth:SIMPLE)
> cause:java.io.IOException: Cannot initialize Cluster. Please check your
> configuration for mapreduce.framework.name and the correspond server
> addresses.
> INFO  [2018-02-07 11:43:22,991] org.apache.gobblin.runtime.TaskStateCollectorService:
> Stopping the TaskStateCollectorService
> INFO  [2018-02-07 11:43:23,033] org.apache.gobblin.runtime.mapreduce.MRJobLauncher:
> Deleted working directory /tmp/_test9_1518003781707/
> test9/job_test9_1518003782322
> ERROR [2018-02-07 11:43:23,033] org.apache.gobblin.runtime.AbstractJobLauncher:
> Failed to launch and run job job_test9_1518003782322: java.io.IOException:
> Cannot initialize Cluster. Please check your
> configuration for mapreduce.framework.name and the correspond server
> addresses.
> ! java.io.IOException: Cannot initialize Cluster. Please check your
> configuration for mapreduce.framework.name and the correspond server
> addresses.
> ! at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
> ! at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
> ! at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
> ! at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1277)
> ! at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1273)
> ! at java.security.AccessController.doPrivileged(Native Method)
> ! at javax.security.auth.Subject.doAs(Subject.java:422)
> ! at org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1693)
> ! at org.apache.hadoop.mapreduce.Job.connect(Job.java:1272)
> ! at org.apache.hadoop.mapreduce.Job.submit(Job.java:1301)
> ! at org.apache.gobblin.runtime.mapreduce.MRJobLauncher.
> runWorkUnits(MRJobLauncher.java:244)
> ! at org.apache.gobblin.runtime.AbstractJobLauncher.runWorkUnitStream(
> AbstractJobLauncher.java:596)
> ! at org.apache.gobblin.runtime.AbstractJobLauncher.launchJob(
> AbstractJobLauncher.java:443)
> ! at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$
> DriverRunnable.call(JobLauncherExecutionDriver.java:159)
> ! at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$
> DriverRunnable.call(JobLauncherExecutionDriver.java:147)
> ! at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ! at java.lang.Thread.run(Thread.java:745)
>
>
>
> --
>
> Cheerio!
>
> *Rohit*
>
>
>
>
>
> --
>
> Cheerio!
>
> *Rohit*
>
>
>
>
>
> --
>
> Cheerio!
>
> *Rohit*
>



-- 
Cheerio!

*Rohit*

Re: PriviledgedActionException while submitting a gobblin job to mapreduce.

Posted by Sudarshan Vasudevan <su...@linkedin.com>.
Hi Rohit,
Can you share the job config file for your distcp job?

Thanks,
Sudarshan

From: Rohit Kalhans <ro...@gmail.com>
Reply-To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>
Date: Sunday, February 11, 2018 at 4:13 AM
To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>
Subject: Re: PriviledgedActionException while submitting a gobblin job to mapreduce.

Hello Sudarshan, et. al,

Thanks for the help. Based on your response we were able to figure out the problem and move past it after adding the lib directory to the classpath.
Now the YARN job succeeds, as the counters/log below show.

INFO  [2018-02-11 11:50:57,267] org.apache.gobblin.runtime.TaskStateCollectorService: Starting the TaskStateCollectorService
INFO  [2018-02-11 11:50:57,268] org.apache.gobblin.runtime.mapreduce.MRJobLauncher: Launching Hadoop MR job Gobblin-distcp20
WARN  [2018-02-11 11:50:57,607] org.apache.hadoop.mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
INFO  [2018-02-11 11:50:57,734] org.apache.gobblin.runtime.mapreduce.GobblinWorkUnitsInputFormat: Found 1 input files at hdfs://namenodeha/tmp/_distcp20_1518349854235/distcp20/job_distcp20_1518349854763/input: [FileStatus{path=hdfs://namenodeha/tmp/_distcp20_1518349854235/distcp20/job_distcp20_1518349854763/input/task_distcp20_1518349854763_0.wu; isDirectory=false; length=9201; replication=3; blocksize=134217728; modification_time=1518349857234; access_time=1518349857214; owner=applicationetl; group=supergroup; permission=rw-r--r--; isSymlink=false}]
INFO  [2018-02-11 11:50:57,799] org.apache.hadoop.mapreduce.JobSubmitter: number of splits:1
INFO  [2018-02-11 11:50:57,891] org.apache.hadoop.mapreduce.JobSubmitter: Submitting tokens for job: job_1518179003398_40028
INFO  [2018-02-11 11:50:58,130] org.apache.hadoop.yarn.client.api.impl.YarnClientImpl: Submitted application application_1518179003398_40028
INFO  [2018-02-11 11:50:58,158] org.apache.hadoop.mapreduce.Job: The url to track the job: http://jobtracker.application.example.com:8088/proxy/application_1518179003398_40028/
INFO  [2018-02-11 11:50:58,158] org.apache.gobblin.runtime.mapreduce.MRJobLauncher: Waiting for Hadoop MR job job_1518179003398_40028 to complete
INFO  [2018-02-11 11:50:58,158] org.apache.hadoop.mapreduce.Job: Running job: job_1518179003398_40028
INFO  [2018-02-11 11:51:04,362] org.apache.hadoop.mapreduce.Job: Job job_1518179003398_40028 running in uber mode : false
INFO  [2018-02-11 11:51:04,363] org.apache.hadoop.mapreduce.Job:  map 0% reduce 0%
INFO  [2018-02-11 11:51:11,421] org.apache.hadoop.mapreduce.Job:  map 100% reduce 0%
INFO  [2018-02-11 11:51:12,433] org.apache.hadoop.mapreduce.Job: Job job_1518179003398_40028 completed successfully
INFO  [2018-02-11 11:51:12,563] org.apache.hadoop.mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=152940
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=504209
HDFS: Number of bytes written=498190
HDFS: Number of read operations=15
HDFS: Number of large read operations=0
HDFS: Number of write operations=9
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=9704
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=4852
Total vcore-seconds taken by all map tasks=4852
Total megabyte-seconds taken by all map tasks=19873792
Map-Reduce Framework
Map input records=1
Map output records=0
Input split bytes=206
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=61
CPU time spent (ms)=6290
Physical memory (bytes) snapshot=515375104
Virtual memory (bytes) snapshot=5540597760
Total committed heap usage (bytes)=1500512256
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0


However, it seems that the publisher does not produce any output. I am not able to see any data in the sink folder although the job has successfully completed.

WARN  [2018-02-11 11:51:12,703] org.apache.gobblin.publisher.BaseDataPublisher: Branch 0 of WorkUnit task_distcp20_1518349854763_0 produced no data
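On the `produced no data` warning: one plausible explanation (an assumption here, not something the logs confirm) is that the distcp work unit found every source file already present and up to date at the target, so there was nothing to copy and the publisher had nothing to move into the sink. Comparing the source and target listings is a quick way to test that; below is a minimal local sketch of the comparison, where temp directories stand in for `hdfs dfs -ls` output of /tmp/distcptest and the sink.

```shell
#!/bin/sh
# Minimal stand-in for diffing a distcp source listing against the sink:
# temp dirs simulate the two filesystems, with the one file already copied.
src=$(mktemp -d); dst=$(mktemp -d)
echo data > "$src/a.txt"
cp "$src/a.txt" "$dst/a.txt"       # simulate: file already exists at target

missing=$( (cd "$src" && ls) | while read -r f; do
  [ -e "$dst/$f" ] || echo "$f"
done )

if [ -z "$missing" ]; then
  echo "target already has every source file - nothing to copy"
else
  echo "missing at target: $missing"
fi
rm -rf "$src" "$dst"
```

The real check would run the same comparison over the actual HDFS source and sink paths.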

Also, I can see a warning that points to an issue during merging of metadata.
WARN  [2018-02-11 11:51:12,708] org.apache.gobblin.publisher.BaseDataPublisher: Metadata merger for branch 0 returned null - bug in merger?
INFO  [2018-02-11 11:51:12,708] org.apache.gobblin.publisher.BaseDataPublisher: Metadata output path not set for branch 0, not publishing.
But this seems to be harmless.

INFO  [2018-02-11 11:51:12,659] org.apache.gobblin.runtime.TaskStateCollectorService: Collected task state of 1 completed tasks
INFO  [2018-02-11 11:51:12,660] org.apache.gobblin.runtime.JobContext: 1 more tasks of job job_distcp20_1518349854763 have completed
INFO  [2018-02-11 11:51:12,665] org.apache.gobblin.runtime.mapreduce.MRJobLauncher: Deleted working directory /tmp/_distcp20_1518349854235/distcp20/job_distcp20_1518349854763
INFO  [2018-02-11 11:51:12,670] org.apache.gobblin.runtime.AbstractJobLauncher: Persisting dataset urns.
INFO  [2018-02-11 11:51:12,680] org.apache.gobblin.runtime.SafeDatasetCommit: Committing dataset CopyEntity.DatasetAndPartition(dataset=CopyableDatasetMetadata(datasetURN=/tmp/distcptest), partition=/tmp/distcptest) of job job_distcp20_1518349854763 with commit policy COMMIT_ON_FULL_SUCCESS and state SUCCESSFUL
INFO  [2018-02-11 11:51:12,701] org.apache.gobblin.publisher.BaseDataPublisher: Retry disabled for publish.
WARN  [2018-02-11 11:51:12,701] org.apache.gobblin.runtime.SafeDatasetCommit: Gobblin is set up to parallelize publishing, however the publisher org.apache.gobblin.publisher.BaseDataPublisher is not thread-safe. Falling back to serial publishing.
WARN  [2018-02-11 11:51:12,703] org.apache.gobblin.publisher.BaseDataPublisher: Branch 0 of WorkUnit task_distcp20_1518349854763_0 produced no data
INFO  [2018-02-11 11:51:12,703] org.apache.gobblin.util.ParallelRunner: Attempting to shutdown ExecutorService: com.google.common.util.concurrent.MoreExecutors$ListeningDecorator@a195448
INFO  [2018-02-11 11:51:12,703] org.apache.gobblin.util.ParallelRunner: Successfully shutdown ExecutorService: com.google.common.util.concurrent.MoreExecutors$ListeningDecorator@a195448
WARN  [2018-02-11 11:51:12,708] org.apache.gobblin.publisher.BaseDataPublisher: Metadata merger for branch 0 returned null - bug in merger?
INFO  [2018-02-11 11:51:12,708] org.apache.gobblin.publisher.BaseDataPublisher: Metadata output path not set for branch 0, not publishing.
INFO  [2018-02-11 11:51:12,711] org.apache.gobblin.runtime.SafeDatasetCommit: Submitted 1 lineage events for dataset CopyEntity.DatasetAndPartition(dataset=CopyableDatasetMetadata(datasetURN=/tmp/distcptest), partition=/tmp/distcptest)
INFO  [2018-02-11 11:51:12,711] org.apache.gobblin.runtime.SafeDatasetCommit: Persisting dataset state for dataset CopyEntity.DatasetAndPartition(dataset=CopyableDatasetMetadata(datasetURN=/tmp/distcptest), partition=/tmp/distcptest)
INFO  [2018-02-11 11:51:12,711] org.apache.gobblin.util.executors.IteratorExecutor: Attempting to shutdown ExecutorService: com.google.common.util.concurrent.MoreExecutors$ListeningDecorator@b4864a4
INFO  [2018-02-11 11:51:12,711] org.apache.gobblin.util.executors.IteratorExecutor: Successfully shutdown ExecutorService: com.google.common.util.concurrent.MoreExecutors$ListeningDecorator@b4864a4
INFO  [2018-02-11 11:51:12,738] org.apache.gobblin.runtime.AbstractJobLauncher: Cleaning up staging directory /gobblin/task-staging/distcp20/job_distcp20_1518349854763
INFO  [2018-02-11 11:51:12,743] org.apache.gobblin.runtime.AbstractJobLauncher: Deleting directory /gobblin/task-staging/distcp20
INFO  [2018-02-11 11:51:12,746] org.apache.gobblin.runtime.AbstractJobLauncher: Cleaning up output directory /gobblin/task-output/distcp20/job_distcp20_1518349854763
INFO  [2018-02-11 11:51:12,751] org.apache.gobblin.runtime.AbstractJobLauncher: Deleting directory /gobblin/task-output/distcp20
INFO  [2018-02-11 11:51:12,757] com.example.applications.test.executor.jobs.testGobblinRunner.distcp20/1: jobCompletion: JobContext{jobName=distcp20, jobId=job_distcp20_1518349854763, jobState={
"job name": "distcp20",
"job id": "job_distcp20_1518349854763",
"job state": "COMMITTED",
"start time": 1518349855793,
"end time": 1518349872716,
"duration": 16923,
"tasks": 1,
"completed tasks": 1,
"task states": [
{
"task id": "task_distcp20_1518349854763_0",
"task state": "COMMITTED",
"start time": 1518349869446,
"end time": 1518349869981,
"duration": 535,
"retry count": 0
}
]
}}

Thanks for all the help.

Best regards
Rohit.


---------- Forwarded message ----------
From: Sudarshan Vasudevan <su...@linkedin.com>
Date: Thu, Feb 8, 2018 at 3:08 AM
Subject: Re: PriviledgedActionException while submitting a gobblin job to mapreduce.
To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>

Hi Rohit,
Your yarn.application.classpath is missing the following:
$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*

This is just a hunch, but the JobClient inside the yarn application is not finding the hadoop-mapreduce-client-jobclient-2.3.0.jar, which has the YarnClientProtocolProvider class, so it is defaulting to LocalClientProtocolProvider and hence unable to initiate a connection to your YARN cluster. The above jar is typically located under $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*.

Can you add the above to your yarn-site.xml, restart yarn and give it a go?
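To make the hunch concrete: Hadoop's `Cluster` discovers `ClientProtocolProvider` implementations via `ServiceLoader`, and `YarnClientProtocolProvider` ships in the jobclient jar, so a classpath entry that never expands to that jar leaves only `LocalClientProtocolProvider` visible. A rough shell sketch of the kind of check involved (the install root below is a made-up placeholder, not a real path):

```shell
#!/bin/sh
# Illustrative only: show what a yarn.application.classpath entry resolves to,
# and whether any jobclient jar is actually there. /opt/example/hadoop is an
# assumed placeholder install root.
HADOOP_MAPRED_HOME=/opt/example/hadoop

entry='$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*'  # as written in yarn-site.xml
expanded=$(eval echo "$entry")                        # what YARN resolves it to
echo "entry expands to: $expanded"

# An empty listing here would explain the fallback to LocalClientProtocolProvider.
if ls $expanded 2>/dev/null | grep -q jobclient; then
  echo "jobclient jar found"
else
  echo "no jobclient jar on this entry"
fi
```

Running the equivalent check on a NodeManager host, with the real `HADOOP_MAPRED_HOME`, is what verifies the suggestion above.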

Thanks,
Sudarshan

From: Rohit Kalhans <ro...@gmail.com>
Reply-To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>
Date: Wednesday, February 7, 2018 at 1:02 PM
To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>
Subject: Re: PriviledgedActionException while submitting a gobblin job to mapreduce.


Hello all,

First of all, thanks for the quick turnaround. I really appreciate the help.

The environment variables have been set correctly (at least that's what I think). I am running this on a feeder box (gateway) of a CDH 5.7 cluster managed by Cloudera Manager.

The yarn-site.xml contains the following:

  <property>
    <name>yarn.application.classpath</name>
      <value>$HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*  </value>
</property>

Before executing my application, I call the following:

export HADOOP_PREFIX="/opt/cloudera/parcels/CDH/"
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=HADOOP_PREFIX/etc/hadoop/
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_CLIENT_CONF_DIR="/etc/hadoop/conf"
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export HADOOP_BIN_DIR=$HADOOP_PREFIX/bin
source /etc/hadoop/conf/hadoop-env.sh
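One easy-to-miss pitfall in an export block like the above: `export HADOOP_CONF_DIR=HADOOP_PREFIX/etc/hadoop/` is missing the `$`, so the variable holds the literal string `HADOOP_PREFIX/etc/hadoop/` rather than a path under the parcel root. A small illustrative check that catches that class of mistake (the values below are demo assumptions, deliberately reproducing the dropped `$`):

```shell
#!/bin/sh
# Demo: flag variables whose value is empty or not an existing directory.
# HADOOP_PREFIX is a stand-in path; HADOOP_CONF_DIR reproduces the missing "$".
HADOOP_PREFIX=/tmp
HADOOP_HOME=$HADOOP_PREFIX
HADOOP_CONF_DIR=HADOOP_PREFIX/etc/hadoop   # bug: "$" dropped on purpose

bad=0
for name in HADOOP_HOME HADOOP_CONF_DIR; do
  eval "value=\$$name"                      # indirect lookup of $name's value
  if [ -z "$value" ]; then
    echo "$name is not set"; bad=1
  elif [ ! -d "$value" ]; then
    echo "$name=$value is not an existing directory"; bad=1
  fi
done
if [ "$bad" -eq 0 ]; then echo "environment looks sane"; else echo "fix the flagged variables"; fi
```

Run against the real environment (with the full HADOOP_* list), this would flag the literal-string assignment immediately.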

The hadoop-env.sh sets a few variables as well.


$>_ cat /etc/hadoop/conf/hadoop-env.sh

# Prepend/Append plugin parcel classpaths

if [ "$HADOOP_USER_CLASSPATH_FIRST" = 'true' ]; then
  # HADOOP_CLASSPATH={{HADOOP_CLASSPATH_APPEND}}
  :
else
  # HADOOP_CLASSPATH={{HADOOP_CLASSPATH}}
  :
fi
# JAVA_LIBRARY_PATH={{JAVA_LIBRARY_PATH}}

export HADOOP_MAPRED_HOME=$( ([[ ! '/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce' =~ CDH_MR2_HOME ]] && echo /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce ) || echo ${CDH_MR2_HOME:-/usr/lib/hadoop-mapreduce/}  )
export HADOOP_CLIENT_OPTS="-Xmx268435456 $HADOOP_CLIENT_OPTS"
export HADOOP_CLIENT_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_CLIENT_OPTS"
export YARN_OPTS="-Xmx825955249 -Djava.net.preferIPv4Stack=true $YARN_OPTS"


On Thu, Feb 8, 2018 at 12:48 AM, Sudarshan Vasudevan <su...@linkedin.com> wrote:
Hi Rohit,
Can you share the properties in your yarn-site.xml file?

The following is an example config that worked for me:
I set the yarn.application.classpath in yarn-site.xml to the following:
<property>
<description>Classpath for typical applications.</description>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,
$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
</value>
</property>

In my local Hadoop installation, I set the HADOOP_* environment variables as follows:
export HADOOP_PREFIX="/usr/local/hadoop-2.3.0"
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export HADOOP_BIN_DIR=$HADOOP_PREFIX/bin


Hope this helps,
Sudarshan

From: Rohit Kalhans <ro...@gmail.com>
Reply-To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>
Date: Wednesday, February 7, 2018 at 10:57 AM
To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>
Subject: PriviledgedActionException while submitting a gobblin job to mapreduce.

Hello
I am integrating gobblin in embedded mode with an existing application. While submitting the job, it seems there is an unresolved dependency/requirement for the mapreduce launcher.

I have checked that mapreduce.framework.name is set to yarn and the other YARN applications are running fine. Somehow I keep hitting the issue with the Gobblin MR job launcher.
I was hoping that you could help me set up Gobblin in embedded mode for my application.

Here is the stack. Do let me know if some other info is needed.


Launching Hadoop MR job Gobblin-test9
WARN  [2018-02-07 11:43:22,990] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:<userName> (auth:SIMPLE) cause:java.io.IOException: Cannot initialize Cluster. Please check your
configuration for mapreduce.framework.name and the correspond server addresses.
INFO  [2018-02-07 11:43:22,991] org.apache.gobblin.runtime.TaskStateCollectorService: Stopping the TaskStateCollectorService
INFO  [2018-02-07 11:43:23,033] org.apache.gobblin.runtime.mapreduce.MRJobLauncher: Deleted working directory /tmp/_test9_1518003781707/test9/job_test9_1518003782322
ERROR [2018-02-07 11:43:23,033] org.apache.gobblin.runtime.AbstractJobLauncher: Failed to launch and run job job_test9_1518003782322: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
! java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
! at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
! at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
! at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
! at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1277)
! at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1273)
! at java.security.AccessController.doPrivileged(Native Method)
! at javax.security.auth.Subject.doAs(Subject.java:422)
! at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
! at org.apache.hadoop.mapreduce.Job.connect(Job.java:1272)
! at org.apache.hadoop.mapreduce.Job.submit(Job.java:1301)
! at org.apache.gobblin.runtime.mapreduce.MRJobLauncher.runWorkUnits(MRJobLauncher.java:244)
! at org.apache.gobblin.runtime.AbstractJobLauncher.runWorkUnitStream(AbstractJobLauncher.java:596)
! at org.apache.gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:443)
! at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$DriverRunnable.call(JobLauncherExecutionDriver.java:159)
! at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$DriverRunnable.call(JobLauncherExecutionDriver.java:147)
! at java.util.concurrent.FutureTask.run(FutureTask.java:266)
! at java.lang.Thread.run(Thread.java:745)

--
Cheerio!

Rohit



--
Cheerio!

Rohit



--
Cheerio!

Rohit

Re: PriviledgedActionException while submitting a gobblin job to mapreduce.

Posted by Rohit Kalhans <ro...@gmail.com>.
Hello Sudarshan, et. al,

Thanks for the help. Based on your response we were able to figure out the
problem and move past it after adding the lib directory to the classpath.
Now the YARN job succeeds, as the counters/log below show.

INFO  [2018-02-11 11:50:57,267]
org.apache.gobblin.runtime.TaskStateCollectorService:
Starting the TaskStateCollectorService
INFO  [2018-02-11 11:50:57,268]
org.apache.gobblin.runtime.mapreduce.MRJobLauncher:
Launching Hadoop MR job Gobblin-distcp20
WARN  [2018-02-11 11:50:57,607]
org.apache.hadoop.mapreduce.JobResourceUploader:
Hadoop command-line option parsing not performed. Implement the Tool
interface and execute your application with ToolRunner to remedy this.
INFO  [2018-02-11 11:50:57,734]
org.apache.gobblin.runtime.mapreduce.GobblinWorkUnitsInputFormat:
Found 1 input files at hdfs://namenodeha/tmp/_distcp20_1518349854235/distcp20/job_distcp20_1518349854763/input: [FileStatus{path=hdfs://namenodeha/tmp/_distcp20_1518349854235/distcp20/job_distcp20_1518349854763/input/task_distcp20_1518349854763_0.wu; isDirectory=false; length=9201; replication=3; blocksize=134217728; modification_time=1518349857234; access_time=1518349857214; owner=applicationetl; group=supergroup; permission=rw-r--r--; isSymlink=false}]
INFO  [2018-02-11 11:50:57,799] org.apache.hadoop.mapreduce.JobSubmitter:
number of splits:1
INFO  [2018-02-11 11:50:57,891] org.apache.hadoop.mapreduce.JobSubmitter:
Submitting tokens for job: job_1518179003398_40028
INFO  [2018-02-11 11:50:58,130]
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl:
Submitted application application_1518179003398_40028
INFO  [2018-02-11 11:50:58,158] org.apache.hadoop.mapreduce.Job: The url to
track the job: http://jobtracker.application.example.com:8088/proxy/application_1518179003398_40028/
INFO  [2018-02-11 11:50:58,158]
org.apache.gobblin.runtime.mapreduce.MRJobLauncher:
Waiting for Hadoop MR job job_1518179003398_40028 to complete
INFO  [2018-02-11 11:50:58,158] org.apache.hadoop.mapreduce.Job: Running
job: job_1518179003398_40028
INFO  [2018-02-11 11:51:04,362] org.apache.hadoop.mapreduce.Job: Job
job_1518179003398_40028 running in uber mode : false
INFO  [2018-02-11 11:51:04,363] org.apache.hadoop.mapreduce.Job:  map 0%
reduce 0%
INFO  [2018-02-11 11:51:11,421] org.apache.hadoop.mapreduce.Job:  map 100%
reduce 0%
INFO  [2018-02-11 11:51:12,433] org.apache.hadoop.mapreduce.Job: Job
job_1518179003398_40028 completed successfully
INFO  [2018-02-11 11:51:12,563] org.apache.hadoop.mapreduce.Job: Counters:
30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=152940
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=504209
HDFS: Number of bytes written=498190
HDFS: Number of read operations=15
HDFS: Number of large read operations=0
HDFS: Number of write operations=9
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=9704
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=4852
Total vcore-seconds taken by all map tasks=4852
Total megabyte-seconds taken by all map tasks=19873792
Map-Reduce Framework
Map input records=1
Map output records=0
Input split bytes=206
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=61
CPU time spent (ms)=6290
Physical memory (bytes) snapshot=515375104
Virtual memory (bytes) snapshot=5540597760
Total committed heap usage (bytes)=1500512256
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0


However, it seems that the publisher did not produce any output. I cannot
see any data in the sink folder even though the job completed successfully.

WARN  [2018-02-11 11:51:12,703] org.apache.gobblin.publisher.BaseDataPublisher:
Branch 0 of WorkUnit task_distcp20_1518349854763_0 produced no data

I can also see a warning that points to an issue during merging of the
metadata:
WARN  [2018-02-11 11:51:12,708] org.apache.gobblin.publisher.BaseDataPublisher:
Metadata merger for branch 0 returned null - bug in merger?
INFO  [2018-02-11 11:51:12,708] org.apache.gobblin.publisher.BaseDataPublisher:
Metadata output path not set for branch 0, not publishing.
But this seems to be harmless.

INFO  [2018-02-11 11:51:12,659]
org.apache.gobblin.runtime.TaskStateCollectorService:
Collected task state of 1 completed tasks
INFO  [2018-02-11 11:51:12,660] org.apache.gobblin.runtime.JobContext: 1
more tasks of job job_distcp20_1518349854763 have completed
INFO  [2018-02-11 11:51:12,665]
org.apache.gobblin.runtime.mapreduce.MRJobLauncher:
Deleted working directory /tmp/_distcp20_1518349854235/distcp20/job_distcp20_1518349854763
INFO  [2018-02-11 11:51:12,670] org.apache.gobblin.runtime.AbstractJobLauncher:
Persisting dataset urns.
INFO  [2018-02-11 11:51:12,680] org.apache.gobblin.runtime.SafeDatasetCommit:
Committing dataset CopyEntity.DatasetAndPartition(dataset=
CopyableDatasetMetadata(datasetURN=/tmp/distcptest),
partition=/tmp/distcptest) of job job_distcp20_1518349854763 with commit
policy COMMIT_ON_FULL_SUCCESS and state SUCCESSFUL
INFO  [2018-02-11 11:51:12,701] org.apache.gobblin.publisher.BaseDataPublisher:
Retry disabled for publish.
WARN  [2018-02-11 11:51:12,701] org.apache.gobblin.runtime.SafeDatasetCommit:
Gobblin is set up to parallelize publishing, however the publisher
org.apache.gobblin.publisher.BaseDataPublisher is not thread-safe. Falling
back to serial publishing.
WARN  [2018-02-11 11:51:12,703] org.apache.gobblin.publisher.BaseDataPublisher:
Branch 0 of WorkUnit task_distcp20_1518349854763_0 produced no data
INFO  [2018-02-11 11:51:12,703] org.apache.gobblin.util.ParallelRunner:
Attempting to shutdown ExecutorService: com.google.common.util.
concurrent.MoreExecutors$ListeningDecorator@a195448
INFO  [2018-02-11 11:51:12,703] org.apache.gobblin.util.ParallelRunner:
Successfully shutdown ExecutorService: com.google.common.util.
concurrent.MoreExecutors$ListeningDecorator@a195448
WARN  [2018-02-11 11:51:12,708] org.apache.gobblin.publisher.BaseDataPublisher:
Metadata merger for branch 0 returned null - bug in merger?
INFO  [2018-02-11 11:51:12,708] org.apache.gobblin.publisher.BaseDataPublisher:
Metadata output path not set for branch 0, not publishing.
INFO  [2018-02-11 11:51:12,711] org.apache.gobblin.runtime.SafeDatasetCommit:
Submitted 1 lineage events for dataset CopyEntity.
DatasetAndPartition(dataset=CopyableDatasetMetadata(datasetURN=/tmp/distcptest),
partition=/tmp/distcptest)
INFO  [2018-02-11 11:51:12,711] org.apache.gobblin.runtime.SafeDatasetCommit:
Persisting dataset state for dataset CopyEntity.DatasetAndPartition(dataset=
CopyableDatasetMetadata(datasetURN=/tmp/distcptest),
partition=/tmp/distcptest)
INFO  [2018-02-11 11:51:12,711]
org.apache.gobblin.util.executors.IteratorExecutor:
Attempting to shutdown ExecutorService: com.google.common.util.
concurrent.MoreExecutors$ListeningDecorator@b4864a4
INFO  [2018-02-11 11:51:12,711]
org.apache.gobblin.util.executors.IteratorExecutor:
Successfully shutdown ExecutorService: com.google.common.util.
concurrent.MoreExecutors$ListeningDecorator@b4864a4
INFO  [2018-02-11 11:51:12,738] org.apache.gobblin.runtime.AbstractJobLauncher:
Cleaning up staging directory /gobblin/task-staging/distcp20/job_distcp20_1518349854763
INFO  [2018-02-11 11:51:12,743] org.apache.gobblin.runtime.AbstractJobLauncher:
Deleting directory /gobblin/task-staging/distcp20
INFO  [2018-02-11 11:51:12,746] org.apache.gobblin.runtime.AbstractJobLauncher:
Cleaning up output directory /gobblin/task-output/distcp20/
job_distcp20_1518349854763
INFO  [2018-02-11 11:51:12,751] org.apache.gobblin.runtime.AbstractJobLauncher:
Deleting directory /gobblin/task-output/distcp20
INFO  [2018-02-11 11:51:12,757]
com.example.applications.test.executor.jobs.testGobblinRunner.distcp20/1:
jobCompletion: JobContext{jobName=distcp20, jobId=job_distcp20_1518349854763,
jobState={
"job name": "distcp20",
"job id": "job_distcp20_1518349854763",
"job state": "COMMITTED",
"start time": 1518349855793,
"end time": 1518349872716,
"duration": 16923,
"tasks": 1,
"completed tasks": 1,
"task states": [
{
"task id": "task_distcp20_1518349854763_0",
"task state": "COMMITTED",
"start time": 1518349869446,
"end time": 1518349869981,
"duration": 535,
"retry count": 0
}
]
}}

Thanks for all the help.

Best regards
Rohit.


---------- Forwarded message ----------
From: Sudarshan Vasudevan <su...@linkedin.com>
Date: Thu, Feb 8, 2018 at 3:08 AM
Subject: Re: PriviledgedActionException while submitting a gobblin job to
mapreduce.
To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>


Hi Rohit,

Your yarn.application.classpath is missing the following:

$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*



This is just a hunch, but I think the JobClient inside the YARN application
is not finding hadoop-mapreduce-client-jobclient-2.3.0.jar, which contains
the YarnClientProtocolProvider class, so it is defaulting to
LocalClientProtocolProvider and is hence unable to initiate a connection to
your YARN cluster. The above jar is typically located under
$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/.
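A quick way to verify this hunch from the gateway box is a small diagnostic like the sketch below (the jar name pattern and default directory are assumptions based on the message above; adjust for your parcel layout). The point is that YarnClientProtocolProvider is discovered via Java's ServiceLoader from the jobclient jar, so if that jar is absent from the classpath the client silently falls back to LocalClientProtocolProvider.

```shell
# Sketch: check whether a hadoop-mapreduce-client-jobclient jar is present in
# a directory that ends up on yarn.application.classpath. Without it, the MR
# Cluster class cannot discover YarnClientProtocolProvider.
check_jobclient() {
  # $1: a mapreduce lib directory, e.g. $HADOOP_MAPRED_HOME/share/hadoop/mapreduce
  if ls "$1"/hadoop-mapreduce-client-jobclient-*.jar >/dev/null 2>&1; then
    echo "jobclient jar found in $1"
  else
    echo "jobclient jar MISSING in $1"
  fi
}

check_jobclient "${HADOOP_MAPRED_HOME:-/usr/lib/hadoop-mapreduce}/share/hadoop/mapreduce"
```

If the jar is reported missing from every directory on the classpath, that matches the "Cannot initialize Cluster" failure above.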



Can you add the above to your yarn-site.xml, restart yarn and give it a go?



Thanks,

Sudarshan



*From: *Rohit Kalhans <ro...@gmail.com>
*Reply-To: *"user@gobblin.incubator.apache.org" <
user@gobblin.incubator.apache.org>
*Date: *Wednesday, February 7, 2018 at 1:02 PM
*To: *"user@gobblin.incubator.apache.org" <user@gobblin.incubator.apache.org
>
*Subject: *Re: PriviledgedActionException while submitting a gobblin job to
mapreduce.





Hello all,

First of all, thanks for the quick turnaround; really appreciate the help.

The environment variables have been set correctly (at least that's what I
think). I am running this on a feeder box (gateway) of a CDH 5.7 cluster
managed by Cloudera Manager.



The yarn-site.xml contains the following:

  <property>

    <name>yarn.application.classpath</name>

      <value>$HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*</value>

</property>



Before executing my application I call the following.



export HADOOP_PREFIX="/opt/cloudera/parcels/CDH/"

export HADOOP_HOME=$HADOOP_PREFIX

export HADOOP_COMMON_HOME=$HADOOP_PREFIX

export HADOOP_CONF_DIR=HADOOP_PREFIX/etc/hadoop/

export HADOOP_HDFS_HOME=$HADOOP_PREFIX

export HADOOP_CLIENT_CONF_DIR="/etc/hadoop/conf"

export HADOOP_MAPRED_HOME=$HADOOP_PREFIX

export HADOOP_YARN_HOME=$HADOOP_PREFIX

export HADOOP_BIN_DIR=$HADOOP_PREFIX/bin

source /etc/hadoop/conf/hadoop-env.sh
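After running exports like the ones above, a small sanity check (a sketch, assuming bash; the `check_hadoop_env` helper is hypothetical) can confirm that each HADOOP_* variable actually resolves to an existing directory. A value that accidentally contains a literal variable name, for instance from a missing `$`, shows up immediately as BAD.

```shell
# Sketch: verify that each named environment variable points at an existing
# directory. Useful for catching typos such as a value containing a literal,
# unexpanded variable name.
check_hadoop_env() {
  local var val
  for var in "$@"; do
    eval "val=\$$var"              # indirect lookup of the variable's value
    if [ -d "$val" ]; then
      echo "$var OK -> $val"
    else
      echo "$var BAD -> $val"
    fi
  done
}

check_hadoop_env HADOOP_PREFIX HADOOP_CONF_DIR HADOOP_MAPRED_HOME HADOOP_YARN_HOME
```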



The hadoop-env.sh sets a few variables as well.





$>_ cat /etc/hadoop/conf/hadoop-env.sh



# Prepend/Append plugin parcel classpaths



if [ "$HADOOP_USER_CLASSPATH_FIRST" = 'true' ]; then

  # HADOOP_CLASSPATH={{HADOOP_CLASSPATH_APPEND}}

  :

else

  # HADOOP_CLASSPATH={{HADOOP_CLASSPATH}}

  :

fi

# JAVA_LIBRARY_PATH={{JAVA_LIBRARY_PATH}}



export HADOOP_MAPRED_HOME=$( ([[ ! '/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce' =~ CDH_MR2_HOME ]] && echo /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce ) || echo ${CDH_MR2_HOME:-/usr/lib/hadoop-mapreduce/}  )

export HADOOP_CLIENT_OPTS="-Xmx268435456 $HADOOP_CLIENT_OPTS"

export HADOOP_CLIENT_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_CLIENT_OPTS"

export YARN_OPTS="-Xmx825955249 -Djava.net.preferIPv4Stack=true $YARN_OPTS"





On Thu, Feb 8, 2018 at 12:48 AM, Sudarshan Vasudevan <
suvasudevan@linkedin.com> wrote:

Hi Rohit,

Can you share the properties in your yarn-site.xml file?



The following is an example config that worked for me:

I set the yarn.application.classpath in yarn-site.xml to the following:

<property>

<description>Classpath for typical applications.</description>

<name>yarn.application.classpath</name>

<value>

$HADOOP_CONF_DIR,

$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,

$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,

$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,

$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*

</value>

</property>
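The `$HADOOP_*` references in a classpath value like the one above are expanded against environment variables on the NodeManager side when the container launches. A rough local sketch of that expansion (the `expand_classpath` helper is hypothetical, assuming bash) makes it easy to see what each entry will resolve to with your current exports:

```shell
# Sketch: expand a yarn.application.classpath-style value (comma-separated
# entries that may reference $HADOOP_* variables) to see what each entry
# resolves to locally.
expand_classpath() {
  local entry entries
  set -f                               # keep '*' wildcards literal, not globbed
  IFS=',' read -ra entries <<< "$1"
  for entry in "${entries[@]}"; do
    entry="${entry//[[:space:]]/}"     # drop whitespace left over from the XML
    eval "echo \"$entry\""             # expand the $HADOOP_* references
  done
  set +f
}

# Illustrative values only:
HADOOP_CONF_DIR=/etc/hadoop/conf HADOOP_MAPRED_HOME=/usr/local/hadoop \
  expand_classpath '$HADOOP_CONF_DIR,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*'
```

If the mapreduce entries are absent from the value, no amount of environment setup on the gateway will put the jobclient jar on the container classpath.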



In my local Hadoop installation, I set the HADOOP_* environment variables
as follows:

export HADOOP_PREFIX="/usr/local/hadoop-2.3.0"

export HADOOP_HOME=$HADOOP_PREFIX

export HADOOP_COMMON_HOME=$HADOOP_PREFIX

export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop

export HADOOP_HDFS_HOME=$HADOOP_PREFIX

export HADOOP_MAPRED_HOME=$HADOOP_PREFIX

export HADOOP_YARN_HOME=$HADOOP_PREFIX

export HADOOP_BIN_DIR=$HADOOP_PREFIX/bin





Hope this helps,

Sudarshan



*From: *Rohit Kalhans <ro...@gmail.com>
*Reply-To: *"user@gobblin.incubator.apache.org" <
user@gobblin.incubator.apache.org>
*Date: *Wednesday, February 7, 2018 at 10:57 AM
*To: *"user@gobblin.incubator.apache.org" <user@gobblin.incubator.apache.org
>
*Subject: *PriviledgedActionException while submitting a gobblin job to
mapreduce.



Hello

I am integrating Gobblin in embedded mode with an existing application.
While submitting the job, it seems there is an unresolved
dependency/requirement for the mapreduce launcher.

I have checked that mapreduce.framework.name is set to yarn and the other
YARN applications are running fine. Somehow I keep hitting this issue with
the Gobblin MR job launcher.

I was hoping that you could help me set up Gobblin in embedded mode for my
application.

Here is the stack. Do let me know if some other info is needed.





Launching Hadoop MR job Gobblin-test9
WARN  [2018-02-07 11:43:22,990]
org.apache.hadoop.security.UserGroupInformation:
PriviledgedActionException as:<userName> (auth:SIMPLE)
cause:java.io.IOException: Cannot initialize Cluster. Please check your
configuration for mapreduce.framework.name and the correspond server
addresses.
INFO  [2018-02-07 11:43:22,991]
org.apache.gobblin.runtime.TaskStateCollectorService:
Stopping the TaskStateCollectorService
INFO  [2018-02-07 11:43:23,033]
org.apache.gobblin.runtime.mapreduce.MRJobLauncher:
Deleted working directory /tmp/_test9_1518003781707/test9/job_test9_1518003782322
ERROR [2018-02-07 11:43:23,033] org.apache.gobblin.runtime.AbstractJobLauncher:
Failed to launch and run job job_test9_1518003782322: java.io.IOException:
Cannot initialize Cluster. Please check your configuration for
mapreduce.framework.name and the correspond server addresses.
! java.io.IOException: Cannot initialize Cluster. Please check your
configuration for mapreduce.framework.name and the correspond server
addresses.
! at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
! at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
! at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
! at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1277)
! at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1273)
! at java.security.AccessController.doPrivileged(Native Method)
! at javax.security.auth.Subject.doAs(Subject.java:422)
! at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
! at org.apache.hadoop.mapreduce.Job.connect(Job.java:1272)
! at org.apache.hadoop.mapreduce.Job.submit(Job.java:1301)
! at org.apache.gobblin.runtime.mapreduce.MRJobLauncher.runWorkUnits(MRJobLauncher.java:244)
! at org.apache.gobblin.runtime.AbstractJobLauncher.runWorkUnitStream(AbstractJobLauncher.java:596)
! at org.apache.gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:443)
! at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$DriverRunnable.call(JobLauncherExecutionDriver.java:159)
! at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$DriverRunnable.call(JobLauncherExecutionDriver.java:147)
! at java.util.concurrent.FutureTask.run(FutureTask.java:266)
! at java.lang.Thread.run(Thread.java:745)



-- 

Cheerio!

*Rohit*






Re: PriviledgedActionException while submitting a gobblin job to mapreduce.

Posted by Sudarshan Vasudevan <su...@linkedin.com>.
Hi Rohit,
Your yarn.application.classpath is missing the following:
$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*

This is just a hunch, but I think the JobClient inside the YARN application is not finding hadoop-mapreduce-client-jobclient-2.3.0.jar, which contains the YarnClientProtocolProvider class, so it is defaulting to LocalClientProtocolProvider and is hence unable to initiate a connection to your YARN cluster. The above jar is typically located under $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/.

Can you add the above to your yarn-site.xml, restart yarn and give it a go?

Thanks,
Sudarshan

From: Rohit Kalhans <ro...@gmail.com>
Reply-To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>
Date: Wednesday, February 7, 2018 at 1:02 PM
To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>
Subject: Re: PriviledgedActionException while submitting a gobblin job to mapreduce.


Hello all,

First of all, thanks for the quick turnaround; really appreciate the help.

The environment variables have been set correctly (at least that's what I think). I am running this on a feeder box (gateway) of a CDH 5.7 cluster managed by Cloudera Manager.

The yarn-site.xml contains the following:

  <property>
    <name>yarn.application.classpath</name>
      <value>$HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*  </value>
</property>

Before executing my application I call the following.

export HADOOP_PREFIX="/opt/cloudera/parcels/CDH/"
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=HADOOP_PREFIX/etc/hadoop/
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_CLIENT_CONF_DIR="/etc/hadoop/conf"
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export HADOOP_BIN_DIR=$HADOOP_PREFIX/bin
source /etc/hadoop/conf/hadoop-env.sh

The hadoop-env.sh sets a few variables as well.


$>_ cat /etc/hadoop/conf/hadoop-env.sh

# Prepend/Append plugin parcel classpaths

if [ "$HADOOP_USER_CLASSPATH_FIRST" = 'true' ]; then
  # HADOOP_CLASSPATH={{HADOOP_CLASSPATH_APPEND}}
  :
else
  # HADOOP_CLASSPATH={{HADOOP_CLASSPATH}}
  :
fi
# JAVA_LIBRARY_PATH={{JAVA_LIBRARY_PATH}}

export HADOOP_MAPRED_HOME=$( ([[ ! '/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce' =~ CDH_MR2_HOME ]] && echo /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce ) || echo ${CDH_MR2_HOME:-/usr/lib/hadoop-mapreduce/}  )
export HADOOP_CLIENT_OPTS="-Xmx268435456 $HADOOP_CLIENT_OPTS"
export HADOOP_CLIENT_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_CLIENT_OPTS"
export YARN_OPTS="-Xmx825955249 -Djava.net.preferIPv4Stack=true $YARN_OPTS"


On Thu, Feb 8, 2018 at 12:48 AM, Sudarshan Vasudevan <su...@linkedin.com> wrote:
Hi Rohit,
Can you share the properties in your yarn-site.xml file?

The following is an example config that worked for me:
I set the yarn.application.classpath in yarn-site.xml to the following:
<property>
<description>Classpath for typical applications.</description>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,
$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
</value>
</property>

In my local Hadoop installation, I set the HADOOP_* environment variables as follows:
export HADOOP_PREFIX="/usr/local/hadoop-2.3.0"
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export HADOOP_BIN_DIR=$HADOOP_PREFIX/bin


Hope this helps,
Sudarshan

From: Rohit Kalhans <ro...@gmail.com>
Reply-To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>
Date: Wednesday, February 7, 2018 at 10:57 AM
To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>
Subject: PriviledgedActionException while submitting a gobblin job to mapreduce.

Hello
I am integrating Gobblin in embedded mode with an existing application. While submitting the job, it seems there is an unresolved dependency/requirement for the mapreduce launcher.

I have checked that mapreduce.framework.name is set to yarn and the other YARN applications are running fine. Somehow I keep hitting this issue with the Gobblin MR job launcher.
I was hoping that you could help me set up Gobblin in embedded mode for my application.

Here is the stack. Do let me know if some other info is needed.


Launching Hadoop MR job Gobblin-test9
WARN  [2018-02-07 11:43:22,990] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:<userName> (auth:SIMPLE) cause:java.io.IOException: Cannot initialize Cluster. Please check your
configuration for mapreduce.framework.name and the correspond server addresses.
INFO  [2018-02-07 11:43:22,991] org.apache.gobblin.runtime.TaskStateCollectorService: Stopping the TaskStateCollectorService
INFO  [2018-02-07 11:43:23,033] org.apache.gobblin.runtime.mapreduce.MRJobLauncher: Deleted working directory /tmp/_test9_1518003781707/test9/job_test9_1518003782322
ERROR [2018-02-07 11:43:23,033] org.apache.gobblin.runtime.AbstractJobLauncher: Failed to launch and run job job_test9_1518003782322: java.io.IOException: Cannot initialize Cluster. Please check your
configuration for mapreduce.framework.name and the correspond server addresses.
! java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
! at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
! at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
! at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
! at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1277)
! at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1273)
! at java.security.AccessController.doPrivileged(Native Method)
! at javax.security.auth.Subject.doAs(Subject.java:422)
! at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
! at org.apache.hadoop.mapreduce.Job.connect(Job.java:1272)
! at org.apache.hadoop.mapreduce.Job.submit(Job.java:1301)
! at org.apache.gobblin.runtime.mapreduce.MRJobLauncher.runWorkUnits(MRJobLauncher.java:244)
! at org.apache.gobblin.runtime.AbstractJobLauncher.runWorkUnitStream(AbstractJobLauncher.java:596)
! at org.apache.gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:443)
! at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$DriverRunnable.call(JobLauncherExecutionDriver.java:159)
! at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$DriverRunnable.call(JobLauncherExecutionDriver.java:147)
! at java.util.concurrent.FutureTask.run(FutureTask.java:266)
! at java.lang.Thread.run(Thread.java:745)

--
Cheerio!

Rohit




Re: PriviledgedActionException while submitting a gobblin job to mapreduce.

Posted by Rohit Kalhans <ro...@gmail.com>.
Hello all,

First of all, thanks for the quick turnaround; really appreciate the help.

The environment variables have been set correctly (at least that's what I
think). I am running this on a feeder box (gateway) of a CDH 5.7 cluster
managed by Cloudera Manager.

The yarn-site.xml contains the following:

  <property>
    <name>yarn.application.classpath</name>

<value>$HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
</value>
</property>

Before executing my application I call the following.

export HADOOP_PREFIX="/opt/cloudera/parcels/CDH/"
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=HADOOP_PREFIX/etc/hadoop/
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_CLIENT_CONF_DIR="/etc/hadoop/conf"
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export HADOOP_BIN_DIR=$HADOOP_PREFIX/bin
source /etc/hadoop/conf/hadoop-env.sh

The hadoop-env.sh sets a few variables as well.


$>_ cat /etc/hadoop/conf/hadoop-env.sh

# Prepend/Append plugin parcel classpaths

if [ "$HADOOP_USER_CLASSPATH_FIRST" = 'true' ]; then
  # HADOOP_CLASSPATH={{HADOOP_CLASSPATH_APPEND}}
  :
else
  # HADOOP_CLASSPATH={{HADOOP_CLASSPATH}}
  :
fi
# JAVA_LIBRARY_PATH={{JAVA_LIBRARY_PATH}}

export HADOOP_MAPRED_HOME=$( ([[ ! '/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce' =~ CDH_MR2_HOME ]] && echo /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce ) || echo ${CDH_MR2_HOME:-/usr/lib/hadoop-mapreduce/}  )
export HADOOP_CLIENT_OPTS="-Xmx268435456 $HADOOP_CLIENT_OPTS"
export HADOOP_CLIENT_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_CLIENT_OPTS"
export YARN_OPTS="-Xmx825955249 -Djava.net.preferIPv4Stack=true $YARN_OPTS"


On Thu, Feb 8, 2018 at 12:48 AM, Sudarshan Vasudevan <
suvasudevan@linkedin.com> wrote:

> Hi Rohit,
>
> Can you share the properties in your yarn-site.xml file?
>
>
>
> The following is an example config that worked for me:
>
> I set the yarn.application.classpath in yarn-site.xml to the following:
>
> <property>
>
> <description>Classpath for typical applications.</description>
>
> <name>yarn.application.classpath</name>
>
> <value>
>
> $HADOOP_CONF_DIR,
>
> $HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
>
> $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
>
> $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,
>
> $HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
>
> </value>
>
> </property>
>
>
>
> In my local Hadoop installation, I set the HADOOP_* environment variables
> as follows:
>
> export HADOOP_PREFIX="/usr/local/hadoop-2.3.0"
>
> export HADOOP_HOME=$HADOOP_PREFIX
>
> export HADOOP_COMMON_HOME=$HADOOP_PREFIX
>
> export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
>
> export HADOOP_HDFS_HOME=$HADOOP_PREFIX
>
> export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
>
> export HADOOP_YARN_HOME=$HADOOP_PREFIX
>
> export HADOOP_BIN_DIR=$HADOOP_PREFIX/bin
>
>
>
>
>
> Hope this helps,
>
> Sudarshan
>
>
>
> *From: *Rohit Kalhans <ro...@gmail.com>
> *Reply-To: *"user@gobblin.incubator.apache.org" <user@gobblin.incubator.
> apache.org>
> *Date: *Wednesday, February 7, 2018 at 10:57 AM
> *To: *"user@gobblin.incubator.apache.org" <user@gobblin.incubator.
> apache.org>
> *Subject: *PriviledgedActionException while submitting a gobblin job to
> mapreduce.
>
>
>
> Hello
>
> I am integrating gobblin in embedded mode with an existing application.
> While submitting the job it seems like there is a unresolved
> dependency/requirement to mapreduce launcher.
>
>
>
> I have checked that  mapreduce.framework.name is set to yarn and the
> other yarn application are running fine. Somehow I keep hitting the issue
> with the gobblin mr job launcher.
>
> I was hoping that you guys can help me setting up Gobblin in embedded mode
> for my application.
>
>
>
> Here is the stack. Do let me know if some other info is needed.
>
>
>
>
>
> Launching Hadoop MR job Gobblin-test9
> WARN  [2018-02-07 11:43:22,990] org.apache.hadoop.security.UserGroupInformation:
> PriviledgedActionException as:<userName> (auth:SIMPLE)
> cause:java.io.IOException: Cannot initialize Cluster. Please check your
> configuration for mapreduce.framework.name and the correspond server
> addresses.
> INFO  [2018-02-07 11:43:22,991] org.apache.gobblin.runtime.TaskStateCollectorService:
> Stopping the TaskStateCollectorService
> INFO  [2018-02-07 11:43:23,033] org.apache.gobblin.runtime.mapreduce.MRJobLauncher:
> Deleted working directory /tmp/_test9_1518003781707/test9/job_test9_1518003782322
> ERROR [2018-02-07 11:43:23,033] org.apache.gobblin.runtime.AbstractJobLauncher:
> Failed to launch and run job job_test9_1518003782322: java.io.IOException:
> Cannot initialize Cluster. Please check your configuration for
> mapreduce.framework.name and the correspond server addresses.
> ! java.io.IOException: Cannot initialize Cluster. Please check your
> configuration for mapreduce.framework.name and the correspond server
> addresses.
> ! at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
> ! at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
> ! at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
> ! at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1277)
> ! at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1273)
> ! at java.security.AccessController.doPrivileged(Native Method)
> ! at javax.security.auth.Subject.doAs(Subject.java:422)
> ! at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
> ! at org.apache.hadoop.mapreduce.Job.connect(Job.java:1272)
> ! at org.apache.hadoop.mapreduce.Job.submit(Job.java:1301)
> ! at org.apache.gobblin.runtime.mapreduce.MRJobLauncher.runWorkUnits(MRJobLauncher.java:244)
> ! at org.apache.gobblin.runtime.AbstractJobLauncher.runWorkUnitStream(AbstractJobLauncher.java:596)
> ! at org.apache.gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:443)
> ! at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$DriverRunnable.call(JobLauncherExecutionDriver.java:159)
> ! at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$DriverRunnable.call(JobLauncherExecutionDriver.java:147)
> ! at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ! at java.lang.Thread.run(Thread.java:745)
>
>
>
> --
>
> Cheerio!
>
> *Rohit*
>



-- 
Cheerio!

*Rohit*

Re: PriviledgedActionException while submitting a gobblin job to mapreduce.

Posted by Sudarshan Vasudevan <su...@linkedin.com>.
Hi Rohit,
Can you share the properties in your yarn-site.xml file?

The following is an example config that worked for me:
I set the yarn.application.classpath in yarn-site.xml to the following:
<property>
<description>Classpath for typical applications.</description>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,
$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
</value>
</property>

In my local Hadoop installation, I set the HADOOP_* environment variables as follows:
export HADOOP_PREFIX="/usr/local/hadoop-2.3.0"
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export HADOOP_BIN_DIR=$HADOOP_PREFIX/bin


Hope this helps,
Sudarshan

From: Rohit Kalhans <ro...@gmail.com>
Reply-To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>
Date: Wednesday, February 7, 2018 at 10:57 AM
To: "user@gobblin.incubator.apache.org" <us...@gobblin.incubator.apache.org>
Subject: PriviledgedActionException while submitting a gobblin job to mapreduce.

Hello
I am integrating Gobblin in embedded mode with an existing application. While submitting the job, it seems there is an unresolved dependency/requirement for the mapreduce launcher.

I have checked that mapreduce.framework.name is set to yarn and the other YARN applications are running fine. Somehow I keep hitting this issue with the Gobblin MR job launcher.
I was hoping that you could help me set up Gobblin in embedded mode for my application.


Here is the stack. Do let me know if some other info is needed.


Launching Hadoop MR job Gobblin-test9
WARN  [2018-02-07 11:43:22,990] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:<userName> (auth:SIMPLE) cause:java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
INFO  [2018-02-07 11:43:22,991] org.apache.gobblin.runtime.TaskStateCollectorService: Stopping the TaskStateCollectorService
INFO  [2018-02-07 11:43:23,033] org.apache.gobblin.runtime.mapreduce.MRJobLauncher: Deleted working directory /tmp/_test9_1518003781707/test9/job_test9_1518003782322
ERROR [2018-02-07 11:43:23,033] org.apache.gobblin.runtime.AbstractJobLauncher: Failed to launch and run job job_test9_1518003782322: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
! java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
! at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
! at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
! at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
! at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1277)
! at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1273)
! at java.security.AccessController.doPrivileged(Native Method)
! at javax.security.auth.Subject.doAs(Subject.java:422)
! at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
! at org.apache.hadoop.mapreduce.Job.connect(Job.java:1272)
! at org.apache.hadoop.mapreduce.Job.submit(Job.java:1301)
! at org.apache.gobblin.runtime.mapreduce.MRJobLauncher.runWorkUnits(MRJobLauncher.java:244)
! at org.apache.gobblin.runtime.AbstractJobLauncher.runWorkUnitStream(AbstractJobLauncher.java:596)
! at org.apache.gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:443)
! at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$DriverRunnable.call(JobLauncherExecutionDriver.java:159)
! at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$DriverRunnable.call(JobLauncherExecutionDriver.java:147)
! at java.util.concurrent.FutureTask.run(FutureTask.java:266)
! at java.lang.Thread.run(Thread.java:745)

--
Cheerio!

Rohit