Posted to user@oozie.apache.org by Marko Dinic <ma...@nissatech.com> on 2016/05/05 08:39:58 UTC

MR jobs from Java action run locally

Hello everyone,

I'm trying to run a sequence of MR jobs from Oozie, using a Java action 
for their drivers.

The problem is that the MR jobs run locally instead of on the Hadoop 
cluster. How can I fix this?

The first job reads from HBase, performs some processing and puts the 
result on HDFS, and the next job should read from there. There are 10 
mappers in the first job, but I'm only showing the last one as an example.

Here is the error log from the HBase MR job:

         Aw==, start row: 9-777-1123456789113, end row: 
9-777-1123456789114, region location: hdp-slave1.nissatech.local:16020)
     2016-05-04 14:33:48,373 INFO [LocalJobRunner Map Task Executor #0] 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process 
identifier=hconnection-0x860ce79 connecting to ZooKeeper 
ensemble=192.168.84.27:2181
     2016-05-04 14:33:48,373 INFO [LocalJobRunner Map Task Executor #0] 
org.apache.zookeeper.ZooKeeper: Initiating client connection, 
connectString=192.168.84.27:2181 sessionTimeout=90000 
watcher=hconnection-0x860ce790x0, quorum=192.168.84.27:2181, 
baseZNode=/hbase-unsecure
     2016-05-04 14:33:48,378 INFO [LocalJobRunner Map Task Executor 
#0-SendThread(192.168.84.27:2181)] org.apache.zookeeper.ClientCnxn: 
Opening socket connection to server 192.168.84.27/192.168.84.27:2181. 
Will not attempt to authenticate using SASL (unknown error)
     2016-05-04 14:33:48,379 INFO [LocalJobRunner Map Task Executor 
#0-SendThread(192.168.84.27:2181)] org.apache.zookeeper.ClientCnxn: 
Socket connection established to 192.168.84.27/192.168.84.27:2181, 
initiating session
     2016-05-04 14:33:48,391 INFO [LocalJobRunner Map Task Executor 
#0-SendThread(192.168.84.27:2181)] org.apache.zookeeper.ClientCnxn: 
Session establishment complete on server 
192.168.84.27/192.168.84.27:2181, sessionid = 0x152f8f85214096b, 
negotiated timeout = 40000
     2016-05-04 14:33:48,394 INFO [LocalJobRunner Map Task Executor #0] 
org.apache.hadoop.hbase.mapreduce.TableInputFormatBase: Input split 
length: 0 bytes.
     2016-05-04 14:33:48,590 INFO [LocalJobRunner Map Task Executor #0] 
org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
     2016-05-04 14:33:48,590 INFO [LocalJobRunner Map Task Executor #0] 
org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 100
     2016-05-04 14:33:48,590 INFO [LocalJobRunner Map Task Executor #0] 
org.apache.hadoop.mapred.MapTask: soft limit at 83886080
     2016-05-04 14:33:48,590 INFO [LocalJobRunner Map Task Executor #0] 
org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 104857600
     2016-05-04 14:33:48,591 INFO [LocalJobRunner Map Task Executor #0] 
org.apache.hadoop.mapred.MapTask: kvstart = 26214396; length = 6553600
     2016-05-04 14:33:48,592 INFO [LocalJobRunner Map Task Executor #0] 
org.apache.hadoop.mapred.MapTask: Map output collector class = 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer
     2016-05-04 14:33:48,801 INFO [LocalJobRunner Map Task Executor #0] 
org.apache.hadoop.mapred.LocalJobRunner:
     2016-05-04 14:33:48,802 INFO [LocalJobRunner Map Task Executor #0] 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: 
Closing zookeeper sessionid=0x152f8f85214096b
     2016-05-04 14:33:48,828 INFO [LocalJobRunner Map Task Executor 
#0-EventThread] org.apache.zookeeper.ClientCnxn: EventThread shut down
     2016-05-04 14:33:48,828 INFO [LocalJobRunner Map Task Executor #0] 
org.apache.zookeeper.ZooKeeper: Session: 0x152f8f85214096b closed
     2016-05-04 14:33:48,839 INFO [LocalJobRunner Map Task Executor #0] 
org.apache.hadoop.mapred.MapTask: Starting flush of map output
     2016-05-04 14:33:48,839 INFO [LocalJobRunner Map Task Executor #0] 
org.apache.hadoop.mapred.MapTask: Spilling map output
     2016-05-04 14:33:48,839 INFO [LocalJobRunner Map Task Executor #0] 
org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 5734062; 
bufvoid = 104857600
     2016-05-04 14:33:48,839 INFO [LocalJobRunner Map Task Executor #0] 
org.apache.hadoop.mapred.MapTask: kvstart = 26214396(104857584); kvend = 
26210008(104840032); length = 4389/6553600
     2016-05-04 14:33:48,874 INFO [LocalJobRunner Map Task Executor #0] 
org.apache.hadoop.mapred.MapTask: Finished spill 0
     2016-05-04 14:33:48,877 INFO [LocalJobRunner Map Task Executor #0] 
org.apache.hadoop.mapred.Task: 
Task:attempt_local1149688163_0001_m_000009_0 is done. And is in the 
process of committing
     2016-05-04 14:33:48,897 INFO [LocalJobRunner Map Task Executor #0] 
org.apache.hadoop.mapred.LocalJobRunner: map
     2016-05-04 14:33:48,897 INFO [LocalJobRunner Map Task Executor #0] 
org.apache.hadoop.mapred.Task: Task 
'attempt_local1149688163_0001_m_000009_0' done.
     2016-05-04 14:33:48,897 INFO [LocalJobRunner Map Task Executor #0] 
org.apache.hadoop.mapred.LocalJobRunner: Finishing task: 
attempt_local1149688163_0001_m_000009_0
     2016-05-04 14:33:48,897 INFO [Thread-42] 
org.apache.hadoop.mapred.LocalJobRunner: map task executor complete.
     2016-05-04 14:33:48,901 INFO [Thread-42] 
org.apache.hadoop.mapred.LocalJobRunner: Waiting for reduce tasks
     2016-05-04 14:33:48,901 INFO [pool-9-thread-1] 
org.apache.hadoop.mapred.LocalJobRunner: Starting task: 
attempt_local1149688163_0001_r_000000_0
     2016-05-04 14:33:48,918 INFO [pool-9-thread-1] 
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output 
Committer Algorithm version is 1
     2016-05-04 14:33:48,919 INFO [pool-9-thread-1] 
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: 
FileOutputCommitter skip cleanup _temporary folders under output 
directory:false, ignore cleanup failures: false
     2016-05-04 14:33:48,919 INFO [pool-9-thread-1] 
org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
     2016-05-04 14:33:48,932 INFO [pool-9-thread-1] 
org.apache.hadoop.mapred.ReduceTask: Using ShuffleConsumerPlugin: 
org.apache.hadoop.mapreduce.task.reduce.Shuffle@697f13c9
     2016-05-04 14:33:48,959 INFO [pool-9-thread-1] 
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: MergerManager: 
memoryLimit=289931264, maxSingleShuffleLimit=72482816, 
mergeThreshold=191354640, ioSortFactor=10, memToMemMergeOutputsThreshold=10
     2016-05-04 14:33:48,965 INFO [EventFetcher for fetching Map 
Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
attempt_local1149688163_0001_r_000000_0 Thread started: EventFetcher for 
fetching Map Completion Events
     2016-05-04 14:33:49,035 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 
about to shuffle output of map attempt_local1149688163_0001_m_000007_0 
decomp: 5381537 len: 5381541 to MEMORY
     2016-05-04 14:33:49,056 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5381537 
bytes from map-output for attempt_local1149688163_0001_m_000007_0
     2016-05-04 14:33:49,061 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: 
closeInMemoryFile -> map-output of size: 5381537, 
inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->5381537
     2016-05-04 14:33:49,070 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 
about to shuffle output of map attempt_local1149688163_0001_m_000000_0 
decomp: 5472201 len: 5472205 to MEMORY
     2016-05-04 14:33:49,084 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5472201 
bytes from map-output for attempt_local1149688163_0001_m_000000_0
     2016-05-04 14:33:49,084 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: 
closeInMemoryFile -> map-output of size: 5472201, 
inMemoryMapOutputs.size() -> 2, commitMemory -> 5381537, usedMemory 
->10853738
     2016-05-04 14:33:49,110 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 
about to shuffle output of map attempt_local1149688163_0001_m_000001_0 
decomp: 5387977 len: 5387981 to MEMORY
     2016-05-04 14:33:49,124 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5387977 
bytes from map-output for attempt_local1149688163_0001_m_000001_0
     2016-05-04 14:33:49,125 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: 
closeInMemoryFile -> map-output of size: 5387977, 
inMemoryMapOutputs.size() -> 3, commitMemory -> 10853738, usedMemory 
->16241715
     2016-05-04 14:33:49,129 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 
about to shuffle output of map attempt_local1149688163_0001_m_000004_0 
decomp: 5347914 len: 5347918 to MEMORY
     2016-05-04 14:33:49,143 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5347914 
bytes from map-output for attempt_local1149688163_0001_m_000004_0
     2016-05-04 14:33:49,144 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: 
closeInMemoryFile -> map-output of size: 5347914, 
inMemoryMapOutputs.size() -> 4, commitMemory -> 16241715, usedMemory 
->21589629
     2016-05-04 14:33:49,148 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 
about to shuffle output of map attempt_local1149688163_0001_m_000002_0 
decomp: 5671398 len: 5671402 to MEMORY
     2016-05-04 14:33:49,161 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5671398 
bytes from map-output for attempt_local1149688163_0001_m_000002_0
     2016-05-04 14:33:49,161 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: 
closeInMemoryFile -> map-output of size: 5671398, 
inMemoryMapOutputs.size() -> 5, commitMemory -> 21589629, usedMemory 
->27261027
     2016-05-04 14:33:49,166 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 
about to shuffle output of map attempt_local1149688163_0001_m_000005_0 
decomp: 5743249 len: 5743253 to MEMORY
     2016-05-04 14:33:49,180 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5743249 
bytes from map-output for attempt_local1149688163_0001_m_000005_0
     2016-05-04 14:33:49,180 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: 
closeInMemoryFile -> map-output of size: 5743249, 
inMemoryMapOutputs.size() -> 6, commitMemory -> 27261027, usedMemory 
->33004276
     2016-05-04 14:33:49,184 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 
about to shuffle output of map attempt_local1149688163_0001_m_000008_0 
decomp: 5471488 len: 5471492 to MEMORY
     2016-05-04 14:33:49,197 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5471488 
bytes from map-output for attempt_local1149688163_0001_m_000008_0
     2016-05-04 14:33:49,197 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: 
closeInMemoryFile -> map-output of size: 5471488, 
inMemoryMapOutputs.size() -> 7, commitMemory -> 33004276, usedMemory 
->38475764
     2016-05-04 14:33:49,313 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 
about to shuffle output of map attempt_local1149688163_0001_m_000003_0 
decomp: 5579502 len: 5579506 to MEMORY
     2016-05-04 14:33:49,327 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5579502 
bytes from map-output for attempt_local1149688163_0001_m_000003_0
     2016-05-04 14:33:49,327 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: 
closeInMemoryFile -> map-output of size: 5579502, 
inMemoryMapOutputs.size() -> 8, commitMemory -> 38475764, usedMemory 
->44055266
     2016-05-04 14:33:49,332 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 
about to shuffle output of map attempt_local1149688163_0001_m_000006_0 
decomp: 5605456 len: 5605460 to MEMORY
     2016-05-04 14:33:49,344 INFO [main] 
org.apache.hadoop.mapreduce.Job:  map 100% reduce 0%
     2016-05-04 14:33:49,349 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5605456 
bytes from map-output for attempt_local1149688163_0001_m_000006_0
     2016-05-04 14:33:49,349 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: 
closeInMemoryFile -> map-output of size: 5605456, 
inMemoryMapOutputs.size() -> 9, commitMemory -> 44055266, usedMemory 
->49660722
     2016-05-04 14:33:49,354 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 
about to shuffle output of map attempt_local1149688163_0001_m_000009_0 
decomp: 5738455 len: 5738459 to MEMORY
     2016-05-04 14:33:49,370 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5738455 
bytes from map-output for attempt_local1149688163_0001_m_000009_0
     2016-05-04 14:33:49,370 INFO [localfetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: 
closeInMemoryFile -> map-output of size: 5738455, 
inMemoryMapOutputs.size() -> 10, commitMemory -> 49660722, usedMemory 
->55399177
     2016-05-04 14:33:49,373 INFO [EventFetcher for fetching Map 
Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
EventFetcher is interrupted.. Returning
     2016-05-04 14:33:49,375 INFO [pool-9-thread-1] 
org.apache.hadoop.mapred.LocalJobRunner: 10 / 10 copied.
     2016-05-04 14:33:49,376 INFO [pool-9-thread-1] 
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: finalMerge 
called with 10 in-memory map-outputs and 0 on-disk map-outputs
     2016-05-04 14:33:49,388 INFO [pool-9-thread-1] 
org.apache.hadoop.mapred.Merger: Merging 10 sorted segments
     2016-05-04 14:33:49,389 INFO [pool-9-thread-1] 
org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 10 
segments left of total size: 55398877 bytes
     2016-05-04 14:33:49,711 INFO [pool-9-thread-1] 
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merged 10 
segments, 55399177 bytes to disk to satisfy reduce memory limit
     2016-05-04 14:33:49,712 INFO [pool-9-thread-1] 
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merging 1 
files, 55399163 bytes from disk
     2016-05-04 14:33:49,713 INFO [pool-9-thread-1] 
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merging 0 
segments, 0 bytes from memory into reduce
     2016-05-04 14:33:49,714 INFO [pool-9-thread-1] 
org.apache.hadoop.mapred.Merger: Merging 1 sorted segments
     2016-05-04 14:33:49,714 INFO [pool-9-thread-1] 
org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1 
segments left of total size: 55399129 bytes
     2016-05-04 14:33:49,715 INFO [pool-9-thread-1] 
org.apache.hadoop.mapred.LocalJobRunner: 10 / 10 copied.
     2016-05-04 14:33:49,742 INFO [Thread-42] 
org.apache.hadoop.mapred.LocalJobRunner: reduce task executor complete.
     2016-05-04 14:33:49,797 WARN [Thread-42] 
org.apache.hadoop.mapred.LocalJobRunner: job_local1149688163_0001
     java.lang.Exception: java.io.IOException: Mkdirs failed to create 
file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables/_temporary/0/_temporary/attempt_local1149688163_0001_r_000000_0 
(exists=false, 
cwd=file:/hadoop/yarn/local/usercache/hdfs/appcache/application_1461858162941_0054/container_e12_1461858162941_0054_01_000002)
         at 
org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
         at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
     Caused by: java.io.IOException: Mkdirs failed to create 
file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables/_temporary/0/_temporary/attempt_local1149688163_0001_r_000000_0 
(exists=false, 
cwd=file:/hadoop/yarn/local/usercache/hdfs/appcache/application_1461858162941_0054/container_e12_1461858162941_0054_01_000002)
         at 
org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:449)
         at 
org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:435)
         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
         at 
org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1074)
         at 
org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:273)
         at 
org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:530)
         at 
org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:64)
         at 
org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:75)
         at 
org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:540)
         at 
org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:614)
         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
         at 
org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
         at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
         at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
         at java.lang.Thread.run(Thread.java:745)
     2016-05-04 14:33:50,346 INFO [main] 
org.apache.hadoop.mapreduce.Job: Job job_local1149688163_0001 failed 
with state FAILED due to: NA
     2016-05-04 14:33:50,407 INFO [main] 
org.apache.hadoop.mapreduce.Job: Counters: 38
         File System Counters
             FILE: Number of bytes read=1287449333
             FILE: Number of bytes written=1607139426
             FILE: Number of read operations=0
             FILE: Number of large read operations=0
             FILE: Number of write operations=0
             HDFS: Number of bytes read=1111590
             HDFS: Number of bytes written=220
             HDFS: Number of read operations=40
             HDFS: Number of large read operations=0
             HDFS: Number of write operations=20
         Map-Reduce Framework
             Map input records=10906
             Map output records=10906
             Map output bytes=55355550
             Map output materialized bytes=55399217
             Input split bytes=2900
             Combine input records=0
             Combine output records=0
             Reduce input groups=0
             Reduce shuffle bytes=55399217
             Reduce input records=0
             Reduce output records=0
             Spilled Records=10906
             Shuffled Maps =10
             Failed Shuffles=0
             Merged Map outputs=10
             GC time elapsed (ms)=641
             CPU time spent (ms)=11290
             Physical memory (bytes) snapshot=4507889664
             Virtual memory (bytes) snapshot=22225674240
             Total committed heap usage (bytes)=2925002752
         Shuffle Errors
             BAD_ID=0
             CONNECTION=0
             IO_ERROR=0
             WRONG_LENGTH=0
             WRONG_MAP=0
             WRONG_REDUCE=0
         File Input Format Counters
             Bytes Read=0
         File Output Format Counters
             Bytes Written=0

And here is the exception from the next job:

     Failing Oozie Launcher, Main class 
     [org.apache.oozie.action.hadoop.JavaMain], main() threw exception, 
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path 
does not exist: file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables
     org.apache.oozie.action.hadoop.JavaMainException: 
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path 
does not exist: file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables
         at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:59)
         at 
org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
         at org.apache.oozie.action.hadoop.JavaMain.main(JavaMain.java:35)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
         at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:497)
         at 
org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241)
         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:422)
         at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
     Caused by: 
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path 
does not exist: file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables
         at 
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
         at 
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
         at 
org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:59)
         at 
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
         at 
org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
         at 
org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
         at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
         at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
         at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:422)
         at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
         at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
         at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
         at 
com.nissatech.kmedoidsusingfames.algorithms.initialization.RandomSeedDriver.generateRandomSeed(RandomSeedDriver.java:52)
         at 
com.nissatech.kmedoidsusingfames.algorithms.initialization.ScalableKMeansPPInitialization.performInitialization(ScalableKMeansPPInitialization.java:43)
         at 
com.nissatech.kmedoidsusingfames.algorithms.kmedoids.KMedoidsUsingFAMES.perform(KMedoidsUsingFAMES.java:54)
         at 
com.nissatech.kmedoidsusingfames.algorithms.ClusteringAlgorithmRepetitor.performIteratingForSameNoOfClusters(ClusteringAlgorithmRepetitor.java:43)
         at 
com.nissatech.kmedoidsusingfames.algorithms.ClusteringAlgorithmIterator.performTraining(ClusteringAlgorithmIterator.java:46)
         at 
com.nissatech.kmedoidsusingfames.orchestration.Orchestrator.main(Orchestrator.java:74)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
         at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:497)
         at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:56)
         ... 15 more

It seems to me that the first job runs locally, and hence there is no 
result on HDFS for the next one. Am I wrong?

___________________________


I was able to make my MR jobs run on the HDP cluster by adding this to 
the configuration (based on the following link):

     Configuration conf = new Configuration(false);
     conf.addResource(new Path("file:///", 
System.getProperty("oozie.action.conf.xml")));

But why do I need to do that, and how can I avoid it? I have a sequence 
of MR jobs run from this Java action, and I don't want to tie myself to 
Oozie by adding this to the config of each job. Is there a way to make 
my jobs run on the cluster from Oozie by default?

I should probably mention that this is an HDP cluster and that the setup 
was performed through Ambari.
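
To avoid repeating this in every driver, one option I'm considering is to 
centralize the call in one helper through which every job builds its 
Configuration. This is only a sketch (the ConfFactory name is mine, not 
from Oozie or Hadoop), and it assumes Hadoop's client libraries are on 
the classpath:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Sketch: build every job's Configuration through one helper, so the
// Oozie action config is picked up when present while plain
// "new Configuration()" behaviour is kept on the command line.
public final class ConfFactory {
    private ConfFactory() {}

    public static Configuration create() {
        Configuration conf = new Configuration();
        String actionConf = System.getProperty("oozie.action.conf.xml");
        if (actionConf != null) {
            // Running under Oozie: add the action's resolved config
            // (RM address, mapreduce.framework.name=yarn, ...).
            conf.addResource(new Path("file:///", actionConf));
        }
        return conf;
    }
}
```

That would at least keep the Oozie dependency in a single class instead 
of in each job definition.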
-- 
*Marko Dinić*
/Software engineer @/
Nissatech
Kajmakčalanska 8
18000 Niš, Serbia
website <http://www.nissatech.com> | email 
<ma...@nissatech.com>
tel/fax: +381 18 288 111
mobile: +381 63 82 49 556
skype: vesto91

Re: MR jobs from Java action run locally

Posted by Micah Whitacre <mk...@gmail.com>.
Did you try adding the following?

conf.addResource("mapred-site.xml");
conf.addResource("yarn-site.xml");


If that doesn't work, then I'd guess the config on your Oozie server might
not be set up correctly to have the right RM configuration.
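
One quick way to see where the local setting is coming from is to print 
which resource resolved the key. This is just a diagnostic sketch to run 
inside the action; getPropertySources() is a standard method on Hadoop's 
Configuration class:

```java
import org.apache.hadoop.conf.Configuration;
import java.util.Arrays;

Configuration conf = new Configuration();
String[] sources = conf.getPropertySources("mapreduce.framework.name");
// Prints the value and the resource it came from (e.g. a *-site.xml
// file, or "programmatically"); sources is null when only the
// built-in defaults applied.
System.out.println("mapreduce.framework.name = "
        + conf.get("mapreduce.framework.name")
        + " (from " + (sources == null ? "defaults" : Arrays.toString(sources)) + ")");
```

If this never shows mapred-site.xml or yarn-site.xml, those files are 
not on the action's classpath at all.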

On Fri, May 6, 2016 at 3:30 AM, Marko Dinic <ma...@nissatech.com>
wrote:

> Hello Micah,
>
> Thank you for your answer. There are a couple of problems with this
> approach in my case:
>
> - When I use the Job definition that you have given (using Configured and
> Tool) my configuration still gets initialized to local.
> - My jobs are generally not defined as classes with a main method; there
> is only one main() method, in a class that performs the orchestration, and
> it uses Job definitions in separate classes that do not have a main method.
> That is, I am not able to implement Tool, since my job definitions don't
> have a main method.
>
> I do not understand why my configuration is initialized to local; do you
> have any idea? I still end up with:
>
> mapreduce.jobtracker.address = local
> mapreduce.framework.name = local
>
> I do get execution on cluster when I add:
>
> conf.addResource(new Path("file:///", System.getProperty("oozie.action.conf.xml")));
>
> But then I have the problem that some jars are added to my distributed
> cache, which causes a problem when I try to get something that I added
> to it (since I no longer know its location). For example, here's what's
> in my distributed cache:
>
> /user/hdfs/training/lib/KMedoidsUsingFAMES-2.0-SNAPSHOT-jar-with-dependencies.jar
> /user/oozie/share/lib/lib_20160128122044/oozie/aws-java-sdk-1.7.4.jar
> /user/oozie/share/lib/lib_20160128122044/oozie/azure-storage-2.2.0.jar
> /user/oozie/share/lib/lib_20160128122044/oozie/commons-lang3-3.3.2.jar
> /user/oozie/share/lib/lib_20160128122044/oozie/guava-11.0.2.jar
> /user/oozie/share/lib/lib_20160128122044/oozie/hadoop-aws-2.7.1.2.3.4.0-3485.jar
> /user/oozie/share/lib/lib_20160128122044/oozie/hadoop-azure-2.7.1.2.3.4.0-3485.jar
> /user/oozie/share/lib/lib_20160128122044/oozie/jackson-annotations-2.2.3.jar
> /user/oozie/share/lib/lib_20160128122044/oozie/jackson-core-2.2.3.jar
> /user/oozie/share/lib/lib_20160128122044/oozie/jackson-databind-2.2.3.jar
> /user/oozie/share/lib/lib_20160128122044/oozie/joda-time-2.1.jar
> /user/oozie/share/lib/lib_20160128122044/oozie/json-simple-1.1.jar
> /user/oozie/share/lib/lib_20160128122044/oozie/oozie-hadoop-utils-hadoop-2-4.2.0.2.3.4.0-3485.jar
> /user/oozie/share/lib/lib_20160128122044/oozie/oozie-sharelib-oozie-4.2.0.2.3.4.0-3485.jar
> /user/hdfs/sessions/777/11072010/initialSeed/part-r-00000
>
>
> As you can see, the file that I added to the distributed cache is now
> last (and was first before), so this could be a problem for me.
>
> Are you aware of such behaviour, where the distributed cache gets
> "polluted" by jar locations that you didn't specify?
>
> Best regards,
>
>
> On 05/05/2016 05:51 PM, Micah Whitacre wrote:
>
> Not sure how your main class is structured but a lot of our Java Actions
> extend the Hadoop Tool class
>
> public class MyJob extends Configured implements Tool {
>
>     public static void main(String[] args) throws Exception {
>         MyJob job = new MyJob();
>
>         ToolRunner.run(new Configuration(), job, args);
>     }
>
>     @Override
>     public int run(String[] args) throws Exception {
>         Configuration config = getConf();
>
>         //do stuff
>
>         return 0;
>     }
>
> }
>
>  Inside the run method it will usually have populated the config for kicking off jobs.  We have found on some occasions that adding the Oozie conf helps in secured clusters when dealing with tokens etc.  So we have code that looks like the following:
>
>  if (System.getProperty("oozie.action.conf.xml") != null) {
>     conf.addResource(new Path("file:///", System.getProperty("oozie.action.conf.xml")));
> }
>
> conf.addResource("core-site.xml");
> conf.addResource("hdfs-site.xml");
> conf.addResource("mapred-site.xml");
> conf.addResource("yarn-site.xml");
> conf.addResource("hive-site.xml");
>
> With that code we can handle running on the command line or through Oozie without caring.  Also we can talk to Hive without extra command line config.
>
>
>
> On Thu, May 5, 2016 at 9:31 AM, Marko Dinic <ma...@nissatech.com>
> wrote:
>
>> I should add that this is what my Configuration looks like when I create
>> it using default constructor
>>
>> Configuration conf = new Configuration();
>>
>> mapreduce.jobtracker.address = local
>> mapreduce.framework.name = local
>>
>> And here is what happens when using
>>
>> Configuration conf = new Configuration(false);
>> conf.addResource(new Path("file:///", System.getProperty("oozie.action.conf.xml")));
>>
>> mapreduce.jobtracker.address = 192.168.84.27:8050
>> mapreduce.framework.name = yarn
>>
>> Any help would be highly appreciated.
>>
>>
>> On 05/05/2016 10:39 AM, Marko Dinic wrote:
>>
>> 26210008(104840032); length = 4389/6553600
>>     2016-05-04 14:33:48,874 INFO [LocalJobRunner Map Task Executor #0]
>> org.apache.hadoop.mapred.MapTask: Finished spill 0
>>     2016-05-04 14:33:48,877 INFO [LocalJobRunner Map Task Executor #0]
>> org.apache.hadoop.mapred.Task: Task:attempt_local1149688163_0001_m_000009_0
>> is done. And is in the process of committing
>>     2016-05-04 14:33:48,897 INFO [LocalJobRunner Map Task Executor #0]
>> org.apache.hadoop.mapred.LocalJobRunner: map
>>     2016-05-04 14:33:48,897 INFO [LocalJobRunner Map Task Executor #0]
>> org.apache.hadoop.mapred.Task: Task
>> 'attempt_local1149688163_0001_m_000009_0' done.
>>     2016-05-04 14:33:48,897 INFO [LocalJobRunner Map Task Executor #0]
>> org.apache.hadoop.mapred.LocalJobRunner: Finishing task:
>> attempt_local1149688163_0001_m_000009_0
>>     2016-05-04 14:33:48,897 INFO [Thread-42]
>> org.apache.hadoop.mapred.LocalJobRunner: map task executor complete.
>>     2016-05-04 14:33:48,901 INFO [Thread-42]
>> org.apache.hadoop.mapred.LocalJobRunner: Waiting for reduce tasks
>>     2016-05-04 14:33:48,901 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapred.LocalJobRunner: Starting task:
>> attempt_local1149688163_0001_r_000000_0
>>     2016-05-04 14:33:48,918 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output
>> Committer Algorithm version is 1
>>     2016-05-04 14:33:48,919 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter:
>> FileOutputCommitter skip cleanup _temporary folders under output
>> directory:false, ignore cleanup failures: false
>>     2016-05-04 14:33:48,919 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
>>     2016-05-04 14:33:48,932 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapred.ReduceTask: Using ShuffleConsumerPlugin:
>> org.apache.hadoop.mapreduce.task.reduce.Shuffle@697f13c9
>>     2016-05-04 14:33:48,959 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: MergerManager:
>> memoryLimit=289931264, maxSingleShuffleLimit=72482816,
>> mergeThreshold=191354640, ioSortFactor=10, memToMemMergeOutputsThreshold=10
>>     2016-05-04 14:33:48,965 INFO [EventFetcher for fetching Map
>> Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher:
>> attempt_local1149688163_0001_r_000000_0 Thread started: EventFetcher for
>> fetching Map Completion Events
>>     2016-05-04 14:33:49,035 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
>> to shuffle output of map attempt_local1149688163_0001_m_000007_0 decomp:
>> 5381537 len: 5381541 to MEMORY
>>     2016-05-04 14:33:49,056 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5381537
>> bytes from map-output for attempt_local1149688163_0001_m_000007_0
>>     2016-05-04 14:33:49,061 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
>> -> map-output of size: 5381537, inMemoryMapOutputs.size() -> 1,
>> commitMemory -> 0, usedMemory ->5381537
>>     2016-05-04 14:33:49,070 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
>> to shuffle output of map attempt_local1149688163_0001_m_000000_0 decomp:
>> 5472201 len: 5472205 to MEMORY
>>     2016-05-04 14:33:49,084 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5472201
>> bytes from map-output for attempt_local1149688163_0001_m_000000_0
>>     2016-05-04 14:33:49,084 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
>> -> map-output of size: 5472201, inMemoryMapOutputs.size() -> 2,
>> commitMemory -> 5381537, usedMemory ->10853738
>>     2016-05-04 14:33:49,110 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
>> to shuffle output of map attempt_local1149688163_0001_m_000001_0 decomp:
>> 5387977 len: 5387981 to MEMORY
>>     2016-05-04 14:33:49,124 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5387977
>> bytes from map-output for attempt_local1149688163_0001_m_000001_0
>>     2016-05-04 14:33:49,125 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
>> -> map-output of size: 5387977, inMemoryMapOutputs.size() -> 3,
>> commitMemory -> 10853738, usedMemory ->16241715
>>     2016-05-04 14:33:49,129 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
>> to shuffle output of map attempt_local1149688163_0001_m_000004_0 decomp:
>> 5347914 len: 5347918 to MEMORY
>>     2016-05-04 14:33:49,143 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5347914
>> bytes from map-output for attempt_local1149688163_0001_m_000004_0
>>     2016-05-04 14:33:49,144 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
>> -> map-output of size: 5347914, inMemoryMapOutputs.size() -> 4,
>> commitMemory -> 16241715, usedMemory ->21589629
>>     2016-05-04 14:33:49,148 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
>> to shuffle output of map attempt_local1149688163_0001_m_000002_0 decomp:
>> 5671398 len: 5671402 to MEMORY
>>     2016-05-04 14:33:49,161 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5671398
>> bytes from map-output for attempt_local1149688163_0001_m_000002_0
>>     2016-05-04 14:33:49,161 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
>> -> map-output of size: 5671398, inMemoryMapOutputs.size() -> 5,
>> commitMemory -> 21589629, usedMemory ->27261027
>>     2016-05-04 14:33:49,166 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
>> to shuffle output of map attempt_local1149688163_0001_m_000005_0 decomp:
>> 5743249 len: 5743253 to MEMORY
>>     2016-05-04 14:33:49,180 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5743249
>> bytes from map-output for attempt_local1149688163_0001_m_000005_0
>>     2016-05-04 14:33:49,180 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
>> -> map-output of size: 5743249, inMemoryMapOutputs.size() -> 6,
>> commitMemory -> 27261027, usedMemory ->33004276
>>     2016-05-04 14:33:49,184 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
>> to shuffle output of map attempt_local1149688163_0001_m_000008_0 decomp:
>> 5471488 len: 5471492 to MEMORY
>>     2016-05-04 14:33:49,197 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5471488
>> bytes from map-output for attempt_local1149688163_0001_m_000008_0
>>     2016-05-04 14:33:49,197 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
>> -> map-output of size: 5471488, inMemoryMapOutputs.size() -> 7,
>> commitMemory -> 33004276, usedMemory ->38475764
>>     2016-05-04 14:33:49,313 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
>> to shuffle output of map attempt_local1149688163_0001_m_000003_0 decomp:
>> 5579502 len: 5579506 to MEMORY
>>     2016-05-04 14:33:49,327 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5579502
>> bytes from map-output for attempt_local1149688163_0001_m_000003_0
>>     2016-05-04 14:33:49,327 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
>> -> map-output of size: 5579502, inMemoryMapOutputs.size() -> 8,
>> commitMemory -> 38475764, usedMemory ->44055266
>>     2016-05-04 14:33:49,332 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
>> to shuffle output of map attempt_local1149688163_0001_m_000006_0 decomp:
>> 5605456 len: 5605460 to MEMORY
>>     2016-05-04 14:33:49,344 INFO [main] org.apache.hadoop.mapreduce.Job:
>> map 100% reduce 0%
>>     2016-05-04 14:33:49,349 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5605456
>> bytes from map-output for attempt_local1149688163_0001_m_000006_0
>>     2016-05-04 14:33:49,349 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
>> -> map-output of size: 5605456, inMemoryMapOutputs.size() -> 9,
>> commitMemory -> 44055266, usedMemory ->49660722
>>     2016-05-04 14:33:49,354 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
>> to shuffle output of map attempt_local1149688163_0001_m_000009_0 decomp:
>> 5738455 len: 5738459 to MEMORY
>>     2016-05-04 14:33:49,370 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5738455
>> bytes from map-output for attempt_local1149688163_0001_m_000009_0
>>     2016-05-04 14:33:49,370 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
>> -> map-output of size: 5738455, inMemoryMapOutputs.size() -> 10,
>> commitMemory -> 49660722, usedMemory ->55399177
>>     2016-05-04 14:33:49,373 INFO [EventFetcher for fetching Map
>> Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher:
>> EventFetcher is interrupted.. Returning
>>     2016-05-04 14:33:49,375 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapred.LocalJobRunner: 10 / 10 copied.
>>     2016-05-04 14:33:49,376 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: finalMerge called
>> with 10 in-memory map-outputs and 0 on-disk map-outputs
>>     2016-05-04 14:33:49,388 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapred.Merger: Merging 10 sorted segments
>>     2016-05-04 14:33:49,389 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 10
>> segments left of total size: 55398877 bytes
>>     2016-05-04 14:33:49,711 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merged 10
>> segments, 55399177 bytes to disk to satisfy reduce memory limit
>>     2016-05-04 14:33:49,712 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merging 1 files,
>> 55399163 bytes from disk
>>     2016-05-04 14:33:49,713 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merging 0
>> segments, 0 bytes from memory into reduce
>>     2016-05-04 14:33:49,714 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapred.Merger: Merging 1 sorted segments
>>     2016-05-04 14:33:49,714 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1
>> segments left of total size: 55399129 bytes
>>     2016-05-04 14:33:49,715 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapred.LocalJobRunner: 10 / 10 copied.
>>     2016-05-04 14:33:49,742 INFO [Thread-42]
>> org.apache.hadoop.mapred.LocalJobRunner: reduce task executor complete.
>>     2016-05-04 14:33:49,797 WARN [Thread-42]
>> org.apache.hadoop.mapred.LocalJobRunner: job_local1149688163_0001
>>     java.lang.Exception: java.io.IOException: Mkdirs failed to create
>> file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables/_temporary/0/_temporary/attempt_local1149688163_0001_r_000000_0
>> (exists=false, cwd=
>> file:/hadoop/yarn/local/usercache/hdfs/appcache/application_1461858162941_0054/container_e12_1461858162941_0054_01_000002
>> )
>>         at
>> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>>         at
>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
>>     Caused by: java.io.IOException: Mkdirs failed to create
>> file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables/_temporary/0/_temporary/attempt_local1149688163_0001_r_000000_0
>> (exists=false, cwd=
>> file:/hadoop/yarn/local/usercache/hdfs/appcache/application_1461858162941_0054/container_e12_1461858162941_0054_01_000002
>> )
>>         at
>> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:449)
>>         at
>> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:435)
>>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
>>         at
>> org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1074)
>>         at
>> org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:273)
>>         at
>> org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:530)
>>         at
>> org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:64)
>>         at
>> org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:75)
>>         at
>> org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:540)
>>         at
>> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:614)
>>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>>         at
>> org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
>>         at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>         at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>         at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>         at java.lang.Thread.run(Thread.java:745)
>>     2016-05-04 14:33:50,346 INFO [main] org.apache.hadoop.mapreduce.Job:
>> Job job_local1149688163_0001 failed with state FAILED due to: NA
>>     2016-05-04 14:33:50,407 INFO [main] org.apache.hadoop.mapreduce.Job:
>> Counters: 38
>>         File System Counters
>>             FILE: Number of bytes read=1287449333
>>             FILE: Number of bytes written=1607139426
>>             FILE: Number of read operations=0
>>             FILE: Number of large read operations=0
>>             FILE: Number of write operations=0
>>             HDFS: Number of bytes read=1111590
>>             HDFS: Number of bytes written=220
>>             HDFS: Number of read operations=40
>>             HDFS: Number of large read operations=0
>>             HDFS: Number of write operations=20
>>         Map-Reduce Framework
>>             Map input records=10906
>>             Map output records=10906
>>             Map output bytes=55355550
>>             Map output materialized bytes=55399217
>>             Input split bytes=2900
>>             Combine input records=0
>>             Combine output records=0
>>             Reduce input groups=0
>>             Reduce shuffle bytes=55399217
>>             Reduce input records=0
>>             Reduce output records=0
>>             Spilled Records=10906
>>             Shuffled Maps =10
>>             Failed Shuffles=0
>>             Merged Map outputs=10
>>             GC time elapsed (ms)=641
>>             CPU time spent (ms)=11290
>>             Physical memory (bytes) snapshot=4507889664
>>             Virtual memory (bytes) snapshot=22225674240
>>             Total committed heap usage (bytes)=2925002752
>>         Shuffle Errors
>>             BAD_ID=0
>>             CONNECTION=0
>>             IO_ERROR=0
>>             WRONG_LENGTH=0
>>             WRONG_MAP=0
>>             WRONG_REDUCE=0
>>         File Input Format Counters
>>             Bytes Read=0
>>         File Output Format Counters
>>             Bytes Written=0
>>
>> And here is the exception from the next job:
>>
>>     Failing Oozie Launcher, Main class
>>
>>     [org.apache.oozie.action.hadoop.JavaMain], main() threw exception,
>> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
>> does not exist:
>> file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables
>>     org.apache.oozie.action.hadoop.JavaMainException:
>> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
>> does not exist:
>> file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables
>>         at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:59)
>>         at
>> org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
>>         at org.apache.oozie.action.hadoop.JavaMain.main(JavaMain.java:35)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>         at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:497)
>>         at
>> org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241)
>>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:422)
>>         at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
>>     Caused by:
>> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
>> does not exist:
>> file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables
>>         at
>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
>>         at
>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
>>         at
>> org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:59)
>>         at
>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
>>         at
>> org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
>>         at
>> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
>>         at
>> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
>>         at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>>         at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:422)
>>         at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>>         at
>> org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
>>         at
>> com.nissatech.kmedoidsusingfames.algorithms.initialization.RandomSeedDriver.generateRandomSeed(RandomSeedDriver.java:52)
>>         at
>> com.nissatech.kmedoidsusingfames.algorithms.initialization.ScalableKMeansPPInitialization.performInitialization(ScalableKMeansPPInitialization.java:43)
>>         at
>> com.nissatech.kmedoidsusingfames.algorithms.kmedoids.KMedoidsUsingFAMES.perform(KMedoidsUsingFAMES.java:54)
>>         at
>> com.nissatech.kmedoidsusingfames.algorithms.ClusteringAlgorithmRepetitor.performIteratingForSameNoOfClusters(ClusteringAlgorithmRepetitor.java:43)
>>         at
>> com.nissatech.kmedoidsusingfames.algorithms.ClusteringAlgorithmIterator.performTraining(ClusteringAlgorithmIterator.java:46)
>>         at
>> com.nissatech.kmedoidsusingfames.orchestration.Orchestrator.main(Orchestrator.java:74)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>         at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:497)
>>         at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:56)
>>         ... 15 more
>>
>> It seems to me that the first job is run locally, and hence there is no
>> result on HDFS for the next one. Am I wrong?
>>
>> ___________________________
>>
>>
>> I was able to make my MR job run on the HDP cluster by adding this to the
>> configuration (based on the following link):
>>
>>     Configuration conf = new Configuration(false);
>>     conf.addResource(new Path("file:///",
>> System.getProperty("oozie.action.conf.xml")));
>>
>> But why do I need to do that, and how can I avoid it? I have a sequence of MR
>> jobs run from this Java action, and I don't want to tie myself to using
>> Oozie by adding this to the config of each job. Is there a way to make my
>> jobs run on the cluster from Oozie by default?
>>
>> I should probably mention that this is an HDP cluster and the setup was
>> performed through Ambari.
>> --
>> *Marko Dinić*
>> *Software engineer @*
>> [image: Nissatech]
>> Kajmakčalanska 8
>> 18000 Niš, Serbia
>> website <http://www.nissatech.com> | email <ma...@nissatech.com>
>> tel/fax: +381 18 288 111
>> mobile: +381 63 82 49 556
>> skype: vesto91
>>
>>
>
>
>

Re: MR jobs from Java action run locally

Posted by Marko Dinic <ma...@nissatech.com>.
Hello Micah,

Thank you for your answer. There are a couple of problems with this 
approach in my case:

- When I use the Job definition that you have given (using Configured 
and Tool), my configuration still gets initialized to local.
- My jobs are generally not defined as classes with a main method; 
there is only one main() method, in the class which performs orchestration, 
and it uses Job definitions in separate classes which do not have a main 
method. That is, I am not able to implement Tool, since my job 
definitions don't have a main class.

I do not understand why my configuration is initialized to local; do you 
have any idea? So I still have:

    mapreduce.jobtracker.address = local
    mapreduce.framework.name = local
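A cheap guard against this silent fallback is to fail fast when the effective framework is still local, before the job is submitted and dies later on file:/ paths. Below is a minimal pure-Java sketch; the guard method is made up for illustration, and a real driver would read the value via org.apache.hadoop.conf.Configuration's get("mapreduce.framework.name"):

```java
// Hypothetical fail-fast guard: check the effective value of
// mapreduce.framework.name before submitting, so a silent fallback to the
// LocalJobRunner surfaces as an immediate error instead of a confusing
// Mkdirs failure on a file:/ path later.
public class FrameworkGuard {

    static void requireClusterFramework(String frameworkName) {
        if (frameworkName == null || "local".equals(frameworkName)) {
            throw new IllegalStateException(
                "mapreduce.framework.name is '" + frameworkName
                + "' - the job would run in the LocalJobRunner, not on YARN");
        }
    }

    public static void main(String[] args) {
        // "local" must be rejected...
        try {
            requireClusterFramework("local");
            System.out.println("unexpected: local accepted");
        } catch (IllegalStateException expected) {
            System.out.println("rejected local");
        }
        // ...while "yarn" passes through.
        requireClusterFramework("yarn");
        System.out.println("accepted yarn");
    }
}
```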

I do get execution on the cluster when I add:

    conf.addResource(new Path("file:///", System.getProperty("oozie.action.conf.xml")));
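(The reason adding that resource flips the job onto the cluster is that Hadoop's Configuration applies resources in order, with later resources overriding earlier ones, so the action conf's yarn settings replace the built-in local defaults. A stdlib-only sketch of that last-wins overlay, with illustrative values mirroring this thread; real code would use org.apache.hadoop.conf.Configuration's addResource:)

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stdlib-only simulation of Hadoop Configuration's "last resource wins"
// overlay; the keys/values are illustrative, copied from this thread.
public class ConfOverlayDemo {

    // The later-loaded resource overrides the earlier one, key by key.
    static Map<String, String> overlay(Map<String, String> base,
                                       Map<String, String> later) {
        Map<String, String> merged = new LinkedHashMap<>(base);
        merged.putAll(later);
        return merged;
    }

    public static void main(String[] args) {
        // What new Configuration() yields inside the launcher: local defaults.
        Map<String, String> builtinDefaults = new LinkedHashMap<>();
        builtinDefaults.put("mapreduce.framework.name", "local");
        builtinDefaults.put("mapreduce.jobtracker.address", "local");

        // What oozie.action.conf.xml contributes when added afterwards.
        Map<String, String> oozieActionConf = new LinkedHashMap<>();
        oozieActionConf.put("mapreduce.framework.name", "yarn");
        oozieActionConf.put("mapreduce.jobtracker.address", "192.168.84.27:8050");

        Map<String, String> merged = overlay(builtinDefaults, oozieActionConf);
        System.out.println("mapreduce.framework.name = "
                + merged.get("mapreduce.framework.name"));
        System.out.println("mapreduce.jobtracker.address = "
                + merged.get("mapreduce.jobtracker.address"));
    }
}
```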

But then I have the problem that some extra jars are added to my 
distributed cache, which causes a problem when I try to get something 
that I added to it (since I don't know its location any longer). For 
example, here's what's located in my distributed cache:

/user/hdfs/training/lib/KMedoidsUsingFAMES-2.0-SNAPSHOT-jar-with-dependencies.jar
/user/oozie/share/lib/lib_20160128122044/oozie/aws-java-sdk-1.7.4.jar
/user/oozie/share/lib/lib_20160128122044/oozie/azure-storage-2.2.0.jar
/user/oozie/share/lib/lib_20160128122044/oozie/commons-lang3-3.3.2.jar
/user/oozie/share/lib/lib_20160128122044/oozie/guava-11.0.2.jar
/user/oozie/share/lib/lib_20160128122044/oozie/hadoop-aws-2.7.1.2.3.4.0-3485.jar
/user/oozie/share/lib/lib_20160128122044/oozie/hadoop-azure-2.7.1.2.3.4.0-3485.jar
/user/oozie/share/lib/lib_20160128122044/oozie/jackson-annotations-2.2.3.jar
/user/oozie/share/lib/lib_20160128122044/oozie/jackson-core-2.2.3.jar
/user/oozie/share/lib/lib_20160128122044/oozie/jackson-databind-2.2.3.jar
/user/oozie/share/lib/lib_20160128122044/oozie/joda-time-2.1.jar
/user/oozie/share/lib/lib_20160128122044/oozie/json-simple-1.1.jar
/user/oozie/share/lib/lib_20160128122044/oozie/oozie-hadoop-utils-hadoop-2-4.2.0.2.3.4.0-3485.jar
/user/oozie/share/lib/lib_20160128122044/oozie/oozie-sharelib-oozie-4.2.0.2.3.4.0-3485.jar
/user/hdfs/sessions/777/11072010/initialSeed/part-r-00000


As you can see, the file that I added to the distributed cache is now 
last (and was first before), so this could be a problem for me.

Are you aware of such behaviour - that the distributed cache gets "polluted" 
by jar locations that you don't specify?
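(One way to sidestep that ordering problem is to select the cache entry by file name rather than by index, so extra sharelib jars cannot shift its position. A pure-Java sketch; in a real job the paths would come from the job context's cache-file listing rather than a hard-coded list:)

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Sketch: look up a distributed-cache entry by file name instead of by
// position, so jars that Oozie prepends cannot shift the index. The paths
// below are a subset of the cache listing from this thread.
public class CacheLookupDemo {

    static Optional<String> findByName(List<String> cachePaths, String fileName) {
        return cachePaths.stream()
                .filter(p -> p.endsWith("/" + fileName))
                .findFirst();
    }

    public static void main(String[] args) {
        List<String> cachePaths = Arrays.asList(
            "/user/oozie/share/lib/lib_20160128122044/oozie/guava-11.0.2.jar",
            "/user/oozie/share/lib/lib_20160128122044/oozie/json-simple-1.1.jar",
            "/user/hdfs/sessions/777/11072010/initialSeed/part-r-00000");

        // Prints the matching path regardless of where it sits in the list.
        System.out.println(findByName(cachePaths, "part-r-00000").orElse("not found"));
    }
}
```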

Best regards,

On 05/05/2016 05:51 PM, Micah Whitacre wrote:
> Not sure how your main class is structured, but a lot of our Java
> Actions extend the Hadoop Tool class:
>
> public class MyJob extends Configured implements Tool {
>
>     public static void main(String[] args) throws Exception {
>         MyJob job = new MyJob();
>
>         ToolRunner.run(new Configuration(), job, args);
>     }
>
>     @Override
>     public int run(String[] args) throws Exception {
>         Configuration config = getConf();
>         // do stuff
>         return 0;
>     }
> }
>
> Inside the run method it will usually have populated the config for kicking
> off jobs. We have found on some occasions that adding the Oozie conf helps in
> secured clusters when dealing with tokens etc. So we have code that looks
> like the following:
>
> if (System.getProperty("oozie.action.conf.xml") != null) {
>     conf.addResource(new Path("file:///", System.getProperty("oozie.action.conf.xml")));
> }
>
> conf.addResource("core-site.xml");
> conf.addResource("hdfs-site.xml");
> conf.addResource("mapred-site.xml");
> conf.addResource("yarn-site.xml");
> conf.addResource("hive-site.xml");
> With that code we can handle running on the command line or through Oozie
> without caring. Also, we can talk to Hive without extra command-line config.
>
>
> On Thu, May 5, 2016 at 9:31 AM, Marko Dinic <marko.dinic@nissatech.com> wrote:
>
>     I should add that this is what my Configuration looks like when I
>     create it using the default constructor:
>
>     Configuration conf = new Configuration();
>
>     mapreduce.jobtracker.address = local
>     mapreduce.framework.name = local
>
>     And here is what happens when using
>
>     Configuration conf = new Configuration(false);
>     conf.addResource(new Path("file:///", System.getProperty("oozie.action.conf.xml")));
>
>     mapreduce.jobtracker.address = 192.168.84.27:8050
>     mapreduce.framework.name = yarn
>
>     Any help would be highly appreciated.
>
>
>     On 05/05/2016 10:39 AM, Marko Dinic wrote:
>>     Hello everyone,
>>
>>     I'm trying to run a sequence of MR jobs using Java action for
>>     their drivers in Oozie.
>>
>>     The problem is that the MR jobs are run locally instead of on the
>>     Hadoop cluster. How can I fix this?
>>
>>     The first job reads from HBase, performs some processing, and puts
>>     the result on HDFS, while the next job should read from it. There
>>     are 10 mappers in the first job, but I'm only showing the last one
>>     as an example.
>>
>>     Here is the error log from the HBase MR job:
>>
>>             Aw==, start row: 9-777-1123456789113, end row:
>>     9-777-1123456789114, region location:
>>     hdp-slave1.nissatech.local:16020)
>>         2016-05-04 14:33:48,373 INFO [LocalJobRunner Map Task
>>     Executor #0]
>>     org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
>>     identifier=hconnection-0x860ce79 connecting to ZooKeeper
>>     ensemble=192.168.84.27:2181 <http://192.168.84.27:2181>
>>         2016-05-04 14:33:48,373 INFO [LocalJobRunner Map Task
>>     Executor #0] org.apache.zookeeper.ZooKeeper: Initiating client
>>     connection, connectString=192.168.84.27:2181
>>     <http://192.168.84.27:2181> sessionTimeout=90000
>>     watcher=hconnection-0x860ce790x0, quorum=192.168.84.27:2181
>>     <http://192.168.84.27:2181>, baseZNode=/hbase-unsecure
>>         2016-05-04 14:33:48,378 INFO [LocalJobRunner Map Task
>>     Executor #0-SendThread(192.168.84.27:2181
>>     <http://192.168.84.27:2181>)] org.apache.zookeeper.ClientCnxn:
>>     Opening socket connection to server
>>     192.168.84.27/192.168.84.27:2181
>>     <http://192.168.84.27/192.168.84.27:2181>. Will not attempt to
>>     authenticate using SASL (unknown error)
>>         2016-05-04 14:33:48,379 INFO [LocalJobRunner Map Task
>>     Executor #0-SendThread(192.168.84.27:2181
>>     <http://192.168.84.27:2181>)] org.apache.zookeeper.ClientCnxn:
>>     Socket connection established to 192.168.84.27/192.168.84.27:2181
>>     <http://192.168.84.27/192.168.84.27:2181>, initiating session
>>         2016-05-04 14:33:48,391 INFO [LocalJobRunner Map Task
>>     Executor #0-SendThread(192.168.84.27:2181
>>     <http://192.168.84.27:2181>)] org.apache.zookeeper.ClientCnxn:
>>     Session establishment complete on server
>>     192.168.84.27/192.168.84.27:2181
>>     <http://192.168.84.27/192.168.84.27:2181>, sessionid =
>>     0x152f8f85214096b, negotiated timeout = 40000
>>         2016-05-04 14:33:48,394 INFO [LocalJobRunner Map Task
>>     Executor #0]
>>     org.apache.hadoop.hbase.mapreduce.TableInputFormatBase: Input
>>     split length: 0 bytes.
>>         2016-05-04 14:33:48,590 INFO [LocalJobRunner Map Task
>>     Executor #0] org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi
>>     26214396(104857584)
>>         2016-05-04 14:33:48,590 INFO [LocalJobRunner Map Task
>>     Executor #0] org.apache.hadoop.mapred.MapTask:
>>     mapreduce.task.io.sort.mb: 100
>>         2016-05-04 14:33:48,590 INFO [LocalJobRunner Map Task
>>     Executor #0] org.apache.hadoop.mapred.MapTask: soft limit at 83886080
>>         2016-05-04 14:33:48,590 INFO [LocalJobRunner Map Task
>>     Executor #0] org.apache.hadoop.mapred.MapTask: bufstart = 0;
>>     bufvoid = 104857600
>>         2016-05-04 14:33:48,591 INFO [LocalJobRunner Map Task
>>     Executor #0] org.apache.hadoop.mapred.MapTask: kvstart =
>>     26214396; length = 6553600
>>         2016-05-04 14:33:48,592 INFO [LocalJobRunner Map Task
>>     Executor #0] org.apache.hadoop.mapred.MapTask: Map output
>>     collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
>>         2016-05-04 14:33:48,801 INFO [LocalJobRunner Map Task
>>     Executor #0] org.apache.hadoop.mapred.LocalJobRunner:
>>         2016-05-04 14:33:48,802 INFO [LocalJobRunner Map Task
>>     Executor #0]
>>     org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation:
>>     Closing zookeeper sessionid=0x152f8f85214096b
>>         2016-05-04 14:33:48,828 INFO [LocalJobRunner Map Task
>>     Executor #0-EventThread] org.apache.zookeeper.ClientCnxn:
>>     EventThread shut down
>>         2016-05-04 14:33:48,828 INFO [LocalJobRunner Map Task
>>     Executor #0] org.apache.zookeeper.ZooKeeper: Session:
>>     0x152f8f85214096b closed
>>         2016-05-04 14:33:48,839 INFO [LocalJobRunner Map Task
>>     Executor #0] org.apache.hadoop.mapred.MapTask: Starting flush of
>>     map output
>>         2016-05-04 14:33:48,839 INFO [LocalJobRunner Map Task
>>     Executor #0] org.apache.hadoop.mapred.MapTask: Spilling map output
>>         2016-05-04 14:33:48,839 INFO [LocalJobRunner Map Task
>>     Executor #0] org.apache.hadoop.mapred.MapTask: bufstart = 0;
>>     bufend = 5734062; bufvoid = 104857600
>>         2016-05-04 14:33:48,839 INFO [LocalJobRunner Map Task
>>     Executor #0] org.apache.hadoop.mapred.MapTask: kvstart =
>>     26214396(104857584); kvend = 26210008(104840032); length =
>>     4389/6553600
>>         2016-05-04 14:33:48,874 INFO [LocalJobRunner Map Task
>>     Executor #0] org.apache.hadoop.mapred.MapTask: Finished spill 0
>>         2016-05-04 14:33:48,877 INFO [LocalJobRunner Map Task
>>     Executor #0] org.apache.hadoop.mapred.Task:
>>     Task:attempt_local1149688163_0001_m_000009_0 is done. And is in
>>     the process of committing
>>         2016-05-04 14:33:48,897 INFO [LocalJobRunner Map Task
>>     Executor #0] org.apache.hadoop.mapred.LocalJobRunner: map
>>         2016-05-04 14:33:48,897 INFO [LocalJobRunner Map Task
>>     Executor #0] org.apache.hadoop.mapred.Task: Task
>>     'attempt_local1149688163_0001_m_000009_0' done.
>>         2016-05-04 14:33:48,897 INFO [LocalJobRunner Map Task
>>     Executor #0] org.apache.hadoop.mapred.LocalJobRunner: Finishing
>>     task: attempt_local1149688163_0001_m_000009_0
>>         2016-05-04 14:33:48,897 INFO [Thread-42]
>>     org.apache.hadoop.mapred.LocalJobRunner: map task executor complete.
>>         2016-05-04 14:33:48,901 INFO [Thread-42]
>>     org.apache.hadoop.mapred.LocalJobRunner: Waiting for reduce tasks
>>         2016-05-04 14:33:48,901 INFO [pool-9-thread-1]
>>     org.apache.hadoop.mapred.LocalJobRunner: Starting task:
>>     attempt_local1149688163_0001_r_000000_0
>>         2016-05-04 14:33:48,918 INFO [pool-9-thread-1]
>>     org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File
>>     Output Committer Algorithm version is 1
>>         2016-05-04 14:33:48,919 INFO [pool-9-thread-1]
>>     org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter:
>>     FileOutputCommitter skip cleanup _temporary folders under output
>>     directory:false, ignore cleanup failures: false
>>         2016-05-04 14:33:48,919 INFO [pool-9-thread-1]
>>     org.apache.hadoop.mapred.Task:  Using
>>     ResourceCalculatorProcessTree : [ ]
>>         2016-05-04 14:33:48,932 INFO [pool-9-thread-1]
>>     org.apache.hadoop.mapred.ReduceTask: Using ShuffleConsumerPlugin:
>>     org.apache.hadoop.mapreduce.task.reduce.Shuffle@697f13c9
>>         2016-05-04 14:33:48,959 INFO [pool-9-thread-1]
>>     org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:
>>     MergerManager: memoryLimit=289931264,
>>     maxSingleShuffleLimit=72482816, mergeThreshold=191354640,
>>     ioSortFactor=10, memToMemMergeOutputsThreshold=10
>>         2016-05-04 14:33:48,965 INFO [EventFetcher for fetching Map
>>     Completion Events]
>>     org.apache.hadoop.mapreduce.task.reduce.EventFetcher:
>>     attempt_local1149688163_0001_r_000000_0 Thread started:
>>     EventFetcher for fetching Map Completion Events
>>         2016-05-04 14:33:49,035 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.LocalFetcher:
>>     localfetcher#1 about to shuffle output of map
>>     attempt_local1149688163_0001_m_000007_0 decomp: 5381537 len:
>>     5381541 to MEMORY
>>         2016-05-04 14:33:49,056 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read
>>     5381537 bytes from map-output for
>>     attempt_local1149688163_0001_m_000007_0
>>         2016-05-04 14:33:49,061 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:
>>     closeInMemoryFile -> map-output of size: 5381537,
>>     inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory
>>     ->5381537
>>         2016-05-04 14:33:49,070 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.LocalFetcher:
>>     localfetcher#1 about to shuffle output of map
>>     attempt_local1149688163_0001_m_000000_0 decomp: 5472201 len:
>>     5472205 to MEMORY
>>         2016-05-04 14:33:49,084 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read
>>     5472201 bytes from map-output for
>>     attempt_local1149688163_0001_m_000000_0
>>         2016-05-04 14:33:49,084 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:
>>     closeInMemoryFile -> map-output of size: 5472201,
>>     inMemoryMapOutputs.size() -> 2, commitMemory -> 5381537,
>>     usedMemory ->10853738
>>         2016-05-04 14:33:49,110 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.LocalFetcher:
>>     localfetcher#1 about to shuffle output of map
>>     attempt_local1149688163_0001_m_000001_0 decomp: 5387977 len:
>>     5387981 to MEMORY
>>         2016-05-04 14:33:49,124 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read
>>     5387977 bytes from map-output for
>>     attempt_local1149688163_0001_m_000001_0
>>         2016-05-04 14:33:49,125 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:
>>     closeInMemoryFile -> map-output of size: 5387977,
>>     inMemoryMapOutputs.size() -> 3, commitMemory -> 10853738,
>>     usedMemory ->16241715
>>         2016-05-04 14:33:49,129 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.LocalFetcher:
>>     localfetcher#1 about to shuffle output of map
>>     attempt_local1149688163_0001_m_000004_0 decomp: 5347914 len:
>>     5347918 to MEMORY
>>         2016-05-04 14:33:49,143 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read
>>     5347914 bytes from map-output for
>>     attempt_local1149688163_0001_m_000004_0
>>         2016-05-04 14:33:49,144 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:
>>     closeInMemoryFile -> map-output of size: 5347914,
>>     inMemoryMapOutputs.size() -> 4, commitMemory -> 16241715,
>>     usedMemory ->21589629
>>         2016-05-04 14:33:49,148 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.LocalFetcher:
>>     localfetcher#1 about to shuffle output of map
>>     attempt_local1149688163_0001_m_000002_0 decomp: 5671398 len:
>>     5671402 to MEMORY
>>         2016-05-04 14:33:49,161 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read
>>     5671398 bytes from map-output for
>>     attempt_local1149688163_0001_m_000002_0
>>         2016-05-04 14:33:49,161 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:
>>     closeInMemoryFile -> map-output of size: 5671398,
>>     inMemoryMapOutputs.size() -> 5, commitMemory -> 21589629,
>>     usedMemory ->27261027
>>         2016-05-04 14:33:49,166 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.LocalFetcher:
>>     localfetcher#1 about to shuffle output of map
>>     attempt_local1149688163_0001_m_000005_0 decomp: 5743249 len:
>>     5743253 to MEMORY
>>         2016-05-04 14:33:49,180 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read
>>     5743249 bytes from map-output for
>>     attempt_local1149688163_0001_m_000005_0
>>         2016-05-04 14:33:49,180 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:
>>     closeInMemoryFile -> map-output of size: 5743249,
>>     inMemoryMapOutputs.size() -> 6, commitMemory -> 27261027,
>>     usedMemory ->33004276
>>         2016-05-04 14:33:49,184 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.LocalFetcher:
>>     localfetcher#1 about to shuffle output of map
>>     attempt_local1149688163_0001_m_000008_0 decomp: 5471488 len:
>>     5471492 to MEMORY
>>         2016-05-04 14:33:49,197 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read
>>     5471488 bytes from map-output for
>>     attempt_local1149688163_0001_m_000008_0
>>         2016-05-04 14:33:49,197 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:
>>     closeInMemoryFile -> map-output of size: 5471488,
>>     inMemoryMapOutputs.size() -> 7, commitMemory -> 33004276,
>>     usedMemory ->38475764
>>         2016-05-04 14:33:49,313 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.LocalFetcher:
>>     localfetcher#1 about to shuffle output of map
>>     attempt_local1149688163_0001_m_000003_0 decomp: 5579502 len:
>>     5579506 to MEMORY
>>         2016-05-04 14:33:49,327 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read
>>     5579502 bytes from map-output for
>>     attempt_local1149688163_0001_m_000003_0
>>         2016-05-04 14:33:49,327 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:
>>     closeInMemoryFile -> map-output of size: 5579502,
>>     inMemoryMapOutputs.size() -> 8, commitMemory -> 38475764,
>>     usedMemory ->44055266
>>         2016-05-04 14:33:49,332 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.LocalFetcher:
>>     localfetcher#1 about to shuffle output of map
>>     attempt_local1149688163_0001_m_000006_0 decomp: 5605456 len:
>>     5605460 to MEMORY
>>         2016-05-04 14:33:49,344 INFO [main]
>>     org.apache.hadoop.mapreduce.Job:  map 100% reduce 0%
>>         2016-05-04 14:33:49,349 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read
>>     5605456 bytes from map-output for
>>     attempt_local1149688163_0001_m_000006_0
>>         2016-05-04 14:33:49,349 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:
>>     closeInMemoryFile -> map-output of size: 5605456,
>>     inMemoryMapOutputs.size() -> 9, commitMemory -> 44055266,
>>     usedMemory ->49660722
>>         2016-05-04 14:33:49,354 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.LocalFetcher:
>>     localfetcher#1 about to shuffle output of map
>>     attempt_local1149688163_0001_m_000009_0 decomp: 5738455 len:
>>     5738459 to MEMORY
>>         2016-05-04 14:33:49,370 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read
>>     5738455 bytes from map-output for
>>     attempt_local1149688163_0001_m_000009_0
>>         2016-05-04 14:33:49,370 INFO [localfetcher#1]
>>     org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:
>>     closeInMemoryFile -> map-output of size: 5738455,
>>     inMemoryMapOutputs.size() -> 10, commitMemory -> 49660722,
>>     usedMemory ->55399177
>>         2016-05-04 14:33:49,373 INFO [EventFetcher for fetching Map
>>     Completion Events]
>>     org.apache.hadoop.mapreduce.task.reduce.EventFetcher:
>>     EventFetcher is interrupted.. Returning
>>         2016-05-04 14:33:49,375 INFO [pool-9-thread-1]
>>     org.apache.hadoop.mapred.LocalJobRunner: 10 / 10 copied.
>>         2016-05-04 14:33:49,376 INFO [pool-9-thread-1]
>>     org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:
>>     finalMerge called with 10 in-memory map-outputs and 0 on-disk
>>     map-outputs
>>         2016-05-04 14:33:49,388 INFO [pool-9-thread-1]
>>     org.apache.hadoop.mapred.Merger: Merging 10 sorted segments
>>         2016-05-04 14:33:49,389 INFO [pool-9-thread-1]
>>     org.apache.hadoop.mapred.Merger: Down to the last merge-pass,
>>     with 10 segments left of total size: 55398877 bytes
>>         2016-05-04 14:33:49,711 INFO [pool-9-thread-1]
>>     org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merged
>>     10 segments, 55399177 bytes to disk to satisfy reduce memory limit
>>         2016-05-04 14:33:49,712 INFO [pool-9-thread-1]
>>     org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merging
>>     1 files, 55399163 bytes from disk
>>         2016-05-04 14:33:49,713 INFO [pool-9-thread-1]
>>     org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merging
>>     0 segments, 0 bytes from memory into reduce
>>         2016-05-04 14:33:49,714 INFO [pool-9-thread-1]
>>     org.apache.hadoop.mapred.Merger: Merging 1 sorted segments
>>         2016-05-04 14:33:49,714 INFO [pool-9-thread-1]
>>     org.apache.hadoop.mapred.Merger: Down to the last merge-pass,
>>     with 1 segments left of total size: 55399129 bytes
>>         2016-05-04 14:33:49,715 INFO [pool-9-thread-1]
>>     org.apache.hadoop.mapred.LocalJobRunner: 10 / 10 copied.
>>         2016-05-04 14:33:49,742 INFO [Thread-42]
>>     org.apache.hadoop.mapred.LocalJobRunner: reduce task executor
>>     complete.
>>         2016-05-04 14:33:49,797 WARN [Thread-42]
>>     org.apache.hadoop.mapred.LocalJobRunner: job_local1149688163_0001
>>         java.lang.Exception: java.io.IOException: Mkdirs failed to
>>     create
>>     file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables/_temporary/0/_temporary/attempt_local1149688163_0001_r_000000_0
>>     (exists=false,
>>     cwd=file:/hadoop/yarn/local/usercache/hdfs/appcache/application_1461858162941_0054/container_e12_1461858162941_0054_01_000002)
>>             at
>>     org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>>             at
>>     org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
>>         Caused by: java.io.IOException: Mkdirs failed to create
>>     file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables/_temporary/0/_temporary/attempt_local1149688163_0001_r_000000_0
>>     (exists=false,
>>     cwd=file:/hadoop/yarn/local/usercache/hdfs/appcache/application_1461858162941_0054/container_e12_1461858162941_0054_01_000002)
>>             at
>>     org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:449)
>>             at
>>     org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:435)
>>             at
>>     org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
>>             at
>>     org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1074)
>>             at
>>     org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:273)
>>             at
>>     org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:530)
>>             at
>>     org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:64)
>>             at
>>     org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:75)
>>             at
>>     org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:540)
>>             at
>>     org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:614)
>>             at
>>     org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>>             at
>>     org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
>>             at
>>     java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>             at
>>     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>             at
>>     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>             at java.lang.Thread.run(Thread.java:745)
>>         2016-05-04 14:33:50,346 INFO [main]
>>     org.apache.hadoop.mapreduce.Job: Job job_local1149688163_0001
>>     failed with state FAILED due to: NA
>>         2016-05-04 14:33:50,407 INFO [main]
>>     org.apache.hadoop.mapreduce.Job: Counters: 38
>>             File System Counters
>>                 FILE: Number of bytes read=1287449333
>>                 FILE: Number of bytes written=1607139426
>>                 FILE: Number of read operations=0
>>                 FILE: Number of large read operations=0
>>                 FILE: Number of write operations=0
>>                 HDFS: Number of bytes read=1111590
>>                 HDFS: Number of bytes written=220
>>                 HDFS: Number of read operations=40
>>                 HDFS: Number of large read operations=0
>>                 HDFS: Number of write operations=20
>>             Map-Reduce Framework
>>                 Map input records=10906
>>                 Map output records=10906
>>                 Map output bytes=55355550
>>                 Map output materialized bytes=55399217
>>                 Input split bytes=2900
>>                 Combine input records=0
>>                 Combine output records=0
>>                 Reduce input groups=0
>>                 Reduce shuffle bytes=55399217
>>                 Reduce input records=0
>>                 Reduce output records=0
>>                 Spilled Records=10906
>>                 Shuffled Maps =10
>>                 Failed Shuffles=0
>>                 Merged Map outputs=10
>>                 GC time elapsed (ms)=641
>>                 CPU time spent (ms)=11290
>>                 Physical memory (bytes) snapshot=4507889664
>>                 Virtual memory (bytes) snapshot=22225674240
>>                 Total committed heap usage (bytes)=2925002752
>>             Shuffle Errors
>>                 BAD_ID=0
>>                 CONNECTION=0
>>                 IO_ERROR=0
>>                 WRONG_LENGTH=0
>>                 WRONG_MAP=0
>>                 WRONG_REDUCE=0
>>             File Input Format Counters
>>                 Bytes Read=0
>>             File Output Format Counters
>>                 Bytes Written=0
>>
>>     And here is the exception from the next job:
>>
>>         Failing Oozie Launcher, Main class
>>     [org.apache.oozie.action.hadoop.JavaMain], main() threw
>>     exception,
>>     org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
>>     Input path does not exist:
>>     file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables
>>     org.apache.oozie.action.hadoop.JavaMainException:
>>     org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
>>     Input path does not exist:
>>     file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables
>>             at
>>     org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:59)
>>             at
>>     org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
>>             at
>>     org.apache.oozie.action.hadoop.JavaMain.main(JavaMain.java:35)
>>             at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>>     Method)
>>             at
>>     sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>             at
>>     sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>             at java.lang.reflect.Method.invoke(Method.java:497)
>>             at
>>     org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241)
>>             at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>>             at
>>     org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>>             at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>>             at
>>     org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>>             at java.security.AccessController.doPrivileged(Native Method)
>>             at javax.security.auth.Subject.doAs(Subject.java:422)
>>             at
>>     org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>>             at
>>     org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
>>         Caused by:
>>     org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
>>     Input path does not exist:
>>     file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables
>>             at
>>     org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
>>             at
>>     org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
>>             at
>>     org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:59)
>>             at
>>     org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
>>             at
>>     org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
>>             at
>>     org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
>>             at
>>     org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
>>             at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>>             at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>>             at java.security.AccessController.doPrivileged(Native Method)
>>             at javax.security.auth.Subject.doAs(Subject.java:422)
>>             at
>>     org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>>             at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>>             at
>>     org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
>>             at
>>     com.nissatech.kmedoidsusingfames.algorithms.initialization.RandomSeedDriver.generateRandomSeed(RandomSeedDriver.java:52)
>>             at
>>     com.nissatech.kmedoidsusingfames.algorithms.initialization.ScalableKMeansPPInitialization.performInitialization(ScalableKMeansPPInitialization.java:43)
>>             at
>>     com.nissatech.kmedoidsusingfames.algorithms.kmedoids.KMedoidsUsingFAMES.perform(KMedoidsUsingFAMES.java:54)
>>             at
>>     com.nissatech.kmedoidsusingfames.algorithms.ClusteringAlgorithmRepetitor.performIteratingForSameNoOfClusters(ClusteringAlgorithmRepetitor.java:43)
>>             at
>>     com.nissatech.kmedoidsusingfames.algorithms.ClusteringAlgorithmIterator.performTraining(ClusteringAlgorithmIterator.java:46)
>>             at
>>     com.nissatech.kmedoidsusingfames.orchestration.Orchestrator.main(Orchestrator.java:74)
>>             at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>>     Method)
>>             at
>>     sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>             at
>>     sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>             at java.lang.reflect.Method.invoke(Method.java:497)
>>             at
>>     org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:56)
>>             ... 15 more
>>
>>     It seems to me that the first job ran locally, and hence there is
>>     no result on HDFS for the next one. Am I wrong?
>>
>>     ___________________________
>>
>>
>>     I was able to make my MR job run on the HDP cluster by adding this
>>     to the configuration (based on the following link):
>>
>>         Configuration conf = new Configuration(false);
>>         conf.addResource(new Path("file:///",
>>     System.getProperty("oozie.action.conf.xml")));
>>
>>     But why do I need to do that, and how can I avoid it? I have a
>>     sequence of MR jobs run from this Java action, and I don't want
>>     to bind myself to Oozie by adding this to the config of each
>>     job. Is there a way to make my jobs run on the cluster from
>>     Oozie by default?
>>
>>     I should probably mention that this is an HDP cluster and the
>>     setup was performed through Ambari.

-- 
*Marko Dinić*
/Software engineer @/
Nissatech
Kajmakčalanska 8
18000 Niš, Serbia
website <http://www.nissatech.com> | email <ma...@nissatech.com>
tel/fax: +381 18 288 111
mobile: +381 63 82 49 556
skype: vesto91

Re: MR jobs from Java action run locally

Posted by Micah Whitacre <mk...@gmail.com>.
Not sure how your main class is structured, but a lot of our Java actions
extend Configured and implement the Hadoop Tool interface:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        // ToolRunner parses the generic options (-conf, -D, ...) and hands
        // the resulting Configuration to the Tool before calling run()
        System.exit(ToolRunner.run(new Configuration(), new MyJob(), args));
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration config = getConf();

        // do stuff

        return 0;
    }
}
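The detail that matters for the local-runner problem is what happens inside run(): the Job must be built from getConf(), not from a fresh Configuration, because a fresh Configuration never sees the settings ToolRunner (or Oozie) injected, and Hadoop then silently falls back to the LocalJobRunner. Here is a minimal sketch of such a run() body; the job name, MyMapper/MyReducer, and the argument paths are illustrative placeholders, not from the original post:

```java
// Illustrative run() body; MyMapper/MyReducer and the paths are placeholders.
@Override
public int run(String[] args) throws Exception {
    // getConf() returns the Configuration that ToolRunner injected,
    // which should carry mapreduce.framework.name=yarn on the cluster.
    // Using "new Configuration()" here would lose that and run locally.
    Job job = Job.getInstance(getConf(), "my-mr-job");
    job.setJarByClass(MyJob.class);
    job.setMapperClass(MyMapper.class);
    job.setReducerClass(MyReducer.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
}
```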


Inside the run method, getConf() will usually return a config already
populated for kicking off jobs.  We have found on some occasions that
adding the Oozie action conf helps on secured clusters when dealing
with delegation tokens etc.  So we have code that looks like the following:


if (System.getProperty("oozie.action.conf.xml") != null) {
    conf.addResource(new Path("file:///",
System.getProperty("oozie.action.conf.xml")));
}

conf.addResource("core-site.xml");
conf.addResource("hdfs-site.xml");
conf.addResource("mapred-site.xml");
conf.addResource("yarn-site.xml");
conf.addResource("hive-site.xml");


With that code we can run either from the command line or through
Oozie without caring which.  It also lets us talk to Hive without extra
command-line config.
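To avoid repeating that resource-loading boilerplate in every driver (the "config of each job" concern raised upthread), one option is to factor it into a small helper. This is only a sketch; the class name OozieAwareConfig is hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Hypothetical helper: builds a Configuration that works both from the
// command line and from inside an Oozie Java action.
public final class OozieAwareConfig {

    private OozieAwareConfig() {}

    public static Configuration create() {
        Configuration conf = new Configuration();
        // Oozie sets this system property inside the launcher container;
        // outside Oozie it is absent and the branch is skipped.
        String actionConf = System.getProperty("oozie.action.conf.xml");
        if (actionConf != null) {
            conf.addResource(new Path("file:///", actionConf));
        }
        conf.addResource("core-site.xml");
        conf.addResource("hdfs-site.xml");
        conf.addResource("mapred-site.xml");
        conf.addResource("yarn-site.xml");
        return conf;
    }
}
```

Each driver's main() can then pass OozieAwareConfig.create() to ToolRunner.run(...) and stay ignorant of where it was launched.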



On Thu, May 5, 2016 at 9:31 AM, Marko Dinic <ma...@nissatech.com>
wrote:

> I should add that this is what my Configuration looks like when I create
> it using the default constructor:
>
> Configuration conf = new Configuration();
>
> mapreduce.jobtracker.address = local
> mapreduce.framework.name = local
>
> And here is what happens when using
>
> Configuration conf = new Configuration(false);
> conf.addResource(new Path("file:///", System.getProperty("oozie.action.conf.xml")));
>
> mapreduce.jobtracker.address = 192.168.84.27:8050
> mapreduce.framework.name = yarn
>
> Any help would be highly appreciated.
>
> fetching Map Completion Events
>     2016-05-04 14:33:49,035 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
> to shuffle output of map attempt_local1149688163_0001_m_000007_0 decomp:
> 5381537 len: 5381541 to MEMORY
>     2016-05-04 14:33:49,056 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5381537
> bytes from map-output for attempt_local1149688163_0001_m_000007_0
>     2016-05-04 14:33:49,061 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
> -> map-output of size: 5381537, inMemoryMapOutputs.size() -> 1,
> commitMemory -> 0, usedMemory ->5381537
>     2016-05-04 14:33:49,070 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
> to shuffle output of map attempt_local1149688163_0001_m_000000_0 decomp:
> 5472201 len: 5472205 to MEMORY
>     2016-05-04 14:33:49,084 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5472201
> bytes from map-output for attempt_local1149688163_0001_m_000000_0
>     2016-05-04 14:33:49,084 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
> -> map-output of size: 5472201, inMemoryMapOutputs.size() -> 2,
> commitMemory -> 5381537, usedMemory ->10853738
>     2016-05-04 14:33:49,110 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
> to shuffle output of map attempt_local1149688163_0001_m_000001_0 decomp:
> 5387977 len: 5387981 to MEMORY
>     2016-05-04 14:33:49,124 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5387977
> bytes from map-output for attempt_local1149688163_0001_m_000001_0
>     2016-05-04 14:33:49,125 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
> -> map-output of size: 5387977, inMemoryMapOutputs.size() -> 3,
> commitMemory -> 10853738, usedMemory ->16241715
>     2016-05-04 14:33:49,129 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
> to shuffle output of map attempt_local1149688163_0001_m_000004_0 decomp:
> 5347914 len: 5347918 to MEMORY
>     2016-05-04 14:33:49,143 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5347914
> bytes from map-output for attempt_local1149688163_0001_m_000004_0
>     2016-05-04 14:33:49,144 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
> -> map-output of size: 5347914, inMemoryMapOutputs.size() -> 4,
> commitMemory -> 16241715, usedMemory ->21589629
>     2016-05-04 14:33:49,148 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
> to shuffle output of map attempt_local1149688163_0001_m_000002_0 decomp:
> 5671398 len: 5671402 to MEMORY
>     2016-05-04 14:33:49,161 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5671398
> bytes from map-output for attempt_local1149688163_0001_m_000002_0
>     2016-05-04 14:33:49,161 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
> -> map-output of size: 5671398, inMemoryMapOutputs.size() -> 5,
> commitMemory -> 21589629, usedMemory ->27261027
>     2016-05-04 14:33:49,166 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
> to shuffle output of map attempt_local1149688163_0001_m_000005_0 decomp:
> 5743249 len: 5743253 to MEMORY
>     2016-05-04 14:33:49,180 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5743249
> bytes from map-output for attempt_local1149688163_0001_m_000005_0
>     2016-05-04 14:33:49,180 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
> -> map-output of size: 5743249, inMemoryMapOutputs.size() -> 6,
> commitMemory -> 27261027, usedMemory ->33004276
>     2016-05-04 14:33:49,184 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
> to shuffle output of map attempt_local1149688163_0001_m_000008_0 decomp:
> 5471488 len: 5471492 to MEMORY
>     2016-05-04 14:33:49,197 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5471488
> bytes from map-output for attempt_local1149688163_0001_m_000008_0
>     2016-05-04 14:33:49,197 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
> -> map-output of size: 5471488, inMemoryMapOutputs.size() -> 7,
> commitMemory -> 33004276, usedMemory ->38475764
>     2016-05-04 14:33:49,313 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
> to shuffle output of map attempt_local1149688163_0001_m_000003_0 decomp:
> 5579502 len: 5579506 to MEMORY
>     2016-05-04 14:33:49,327 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5579502
> bytes from map-output for attempt_local1149688163_0001_m_000003_0
>     2016-05-04 14:33:49,327 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
> -> map-output of size: 5579502, inMemoryMapOutputs.size() -> 8,
> commitMemory -> 38475764, usedMemory ->44055266
>     2016-05-04 14:33:49,332 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
> to shuffle output of map attempt_local1149688163_0001_m_000006_0 decomp:
> 5605456 len: 5605460 to MEMORY
>     2016-05-04 14:33:49,344 INFO [main] org.apache.hadoop.mapreduce.Job:
> map 100% reduce 0%
>     2016-05-04 14:33:49,349 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5605456
> bytes from map-output for attempt_local1149688163_0001_m_000006_0
>     2016-05-04 14:33:49,349 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
> -> map-output of size: 5605456, inMemoryMapOutputs.size() -> 9,
> commitMemory -> 44055266, usedMemory ->49660722
>     2016-05-04 14:33:49,354 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
> to shuffle output of map attempt_local1149688163_0001_m_000009_0 decomp:
> 5738455 len: 5738459 to MEMORY
>     2016-05-04 14:33:49,370 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5738455
> bytes from map-output for attempt_local1149688163_0001_m_000009_0
>     2016-05-04 14:33:49,370 INFO [localfetcher#1]
> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
> -> map-output of size: 5738455, inMemoryMapOutputs.size() -> 10,
> commitMemory -> 49660722, usedMemory ->55399177
>     2016-05-04 14:33:49,373 INFO [EventFetcher for fetching Map Completion
> Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: EventFetcher
> is interrupted.. Returning
>     2016-05-04 14:33:49,375 INFO [pool-9-thread-1]
> org.apache.hadoop.mapred.LocalJobRunner: 10 / 10 copied.
>     2016-05-04 14:33:49,376 INFO [pool-9-thread-1]
> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: finalMerge called
> with 10 in-memory map-outputs and 0 on-disk map-outputs
>     2016-05-04 14:33:49,388 INFO [pool-9-thread-1]
> org.apache.hadoop.mapred.Merger: Merging 10 sorted segments
>     2016-05-04 14:33:49,389 INFO [pool-9-thread-1]
> org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 10
> segments left of total size: 55398877 bytes
>     2016-05-04 14:33:49,711 INFO [pool-9-thread-1]
> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merged 10
> segments, 55399177 bytes to disk to satisfy reduce memory limit
>     2016-05-04 14:33:49,712 INFO [pool-9-thread-1]
> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merging 1 files,
> 55399163 bytes from disk
>     2016-05-04 14:33:49,713 INFO [pool-9-thread-1]
> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merging 0
> segments, 0 bytes from memory into reduce
>     2016-05-04 14:33:49,714 INFO [pool-9-thread-1]
> org.apache.hadoop.mapred.Merger: Merging 1 sorted segments
>     2016-05-04 14:33:49,714 INFO [pool-9-thread-1]
> org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1
> segments left of total size: 55399129 bytes
>     2016-05-04 14:33:49,715 INFO [pool-9-thread-1]
> org.apache.hadoop.mapred.LocalJobRunner: 10 / 10 copied.
>     2016-05-04 14:33:49,742 INFO [Thread-42]
> org.apache.hadoop.mapred.LocalJobRunner: reduce task executor complete.
>     2016-05-04 14:33:49,797 WARN [Thread-42]
> org.apache.hadoop.mapred.LocalJobRunner: job_local1149688163_0001
>     java.lang.Exception: java.io.IOException: Mkdirs failed to create
> file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables/_temporary/0/_temporary/attempt_local1149688163_0001_r_000000_0
> (exists=false, cwd=
> file:/hadoop/yarn/local/usercache/hdfs/appcache/application_1461858162941_0054/container_e12_1461858162941_0054_01_000002
> )
>         at
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>         at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
>     Caused by: java.io.IOException: Mkdirs failed to create
> file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables/_temporary/0/_temporary/attempt_local1149688163_0001_r_000000_0
> (exists=false, cwd=
> file:/hadoop/yarn/local/usercache/hdfs/appcache/application_1461858162941_0054/container_e12_1461858162941_0054_01_000002
> )
>         at
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:449)
>         at
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:435)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
>         at
> org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1074)
>         at
> org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:273)
>         at
> org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:530)
>         at
> org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:64)
>         at
> org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:75)
>         at
> org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:540)
>         at
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:614)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>         at
> org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>     2016-05-04 14:33:50,346 INFO [main] org.apache.hadoop.mapreduce.Job:
> Job job_local1149688163_0001 failed with state FAILED due to: NA
>     2016-05-04 14:33:50,407 INFO [main] org.apache.hadoop.mapreduce.Job:
> Counters: 38
>         File System Counters
>             FILE: Number of bytes read=1287449333
>             FILE: Number of bytes written=1607139426
>             FILE: Number of read operations=0
>             FILE: Number of large read operations=0
>             FILE: Number of write operations=0
>             HDFS: Number of bytes read=1111590
>             HDFS: Number of bytes written=220
>             HDFS: Number of read operations=40
>             HDFS: Number of large read operations=0
>             HDFS: Number of write operations=20
>         Map-Reduce Framework
>             Map input records=10906
>             Map output records=10906
>             Map output bytes=55355550
>             Map output materialized bytes=55399217
>             Input split bytes=2900
>             Combine input records=0
>             Combine output records=0
>             Reduce input groups=0
>             Reduce shuffle bytes=55399217
>             Reduce input records=0
>             Reduce output records=0
>             Spilled Records=10906
>             Shuffled Maps =10
>             Failed Shuffles=0
>             Merged Map outputs=10
>             GC time elapsed (ms)=641
>             CPU time spent (ms)=11290
>             Physical memory (bytes) snapshot=4507889664
>             Virtual memory (bytes) snapshot=22225674240
>             Total committed heap usage (bytes)=2925002752
>         Shuffle Errors
>             BAD_ID=0
>             CONNECTION=0
>             IO_ERROR=0
>             WRONG_LENGTH=0
>             WRONG_MAP=0
>             WRONG_REDUCE=0
>         File Input Format Counters
>             Bytes Read=0
>         File Output Format Counters
>             Bytes Written=0
>
> And here is the exception from the next job:
>
>     Failing Oozie Launcher, Main class
> [org.apache.oozie.action.hadoop.JavaMain], main() threw exception,
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
> does not exist: file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables
>     org.apache.oozie.action.hadoop.JavaMainException:
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
> does not exist: file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables
>         at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:59)
>         at
> org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
>         at org.apache.oozie.action.hadoop.JavaMain.main(JavaMain.java:35)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:497)
>         at
> org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
>     Caused by:
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
> does not exist: file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables
>         at
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
>         at
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
>         at
> org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:59)
>         at
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
>         at
> org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
>         at
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
>         at
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
>         at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>         at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>         at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
>         at
> com.nissatech.kmedoidsusingfames.algorithms.initialization.RandomSeedDriver.generateRandomSeed(RandomSeedDriver.java:52)
>         at
> com.nissatech.kmedoidsusingfames.algorithms.initialization.ScalableKMeansPPInitialization.performInitialization(ScalableKMeansPPInitialization.java:43)
>         at
> com.nissatech.kmedoidsusingfames.algorithms.kmedoids.KMedoidsUsingFAMES.perform(KMedoidsUsingFAMES.java:54)
>         at
> com.nissatech.kmedoidsusingfames.algorithms.ClusteringAlgorithmRepetitor.performIteratingForSameNoOfClusters(ClusteringAlgorithmRepetitor.java:43)
>         at
> com.nissatech.kmedoidsusingfames.algorithms.ClusteringAlgorithmIterator.performTraining(ClusteringAlgorithmIterator.java:46)
>         at
> com.nissatech.kmedoidsusingfames.orchestration.Orchestrator.main(Orchestrator.java:74)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:497)
>         at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:56)
>         ... 15 more
>
> It seems to me that the first job runs locally, and hence there is no
> result on HDFS for the next one to read. Am I wrong?
>
> ___________________________
>
>
> I was able to make my MR job run on the HDP cluster by adding this to the
> configuration (based on the following link):
>
>     Configuration conf = new Configuration(false);
>     conf.addResource(new Path("file:///",
> System.getProperty("oozie.action.conf.xml")));
>
> But why do I need to do that, and how can I avoid it? I have a sequence of
> MR jobs run from this Java action, and I don't want to bind myself to Oozie
> by adding this to the config of each job. Is there a way to make my jobs
> run on the cluster from Oozie by default?
>
> I should probably mention that this is an HDP cluster and the setup was
> performed through Ambari.
> --
> *Marko Dinić*
> *Software engineer @*
> [image: Nissatech]
> Kajmakčalanska 8
> 18000 Niš, Serbia
> website <http://www.nissatech.com> | email <ma...@nissatech.com>
> tel/fax: +381 18 288 111
> mobile: +381 63 82 49 556
> skype: vesto91
>

Re: MR jobs from Java action run locally

Posted by Marko Dinic <ma...@nissatech.com>.
I should add that this is what my Configuration looks like when I create
it using the default constructor:

Configuration conf = new Configuration();

mapreduce.jobtracker.address = local
mapreduce.framework.name = local

And here is what happens when using

Configuration conf = new Configuration(false);
conf.addResource(new Path("file:///", System.getProperty("oozie.action.conf.xml")));

mapreduce.jobtracker.address = 192.168.84.27:8050
mapreduce.framework.name = yarn
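To avoid repeating the two addResource lines in every driver, one option I'm considering is creating every job Configuration through a single factory. This is just a sketch of the pattern: the class and method names below are mine, only the property name "oozie.action.conf.xml" comes from Oozie, and the Hadoop-specific part (which needs hadoop-common on the classpath) is shown as a comment because it is the same two lines as above.

```java
// Sketch of a Configuration factory for the drivers. Only the property
// name "oozie.action.conf.xml" is Oozie's; the rest is my own naming.
public class ActionConfFactory {

    static final String OOZIE_ACTION_CONF = "oozie.action.conf.xml";

    /**
     * Path of the action configuration that the Oozie launcher sets for
     * every action, or null when the driver was not launched by Oozie.
     */
    static String oozieActionConf() {
        return System.getProperty(OOZIE_ACTION_CONF);
    }

    // In the real drivers the factory would continue with the Hadoop
    // part, guarded by the property check:
    //
    //     if (oozieActionConf() == null) {
    //         return new Configuration();          // outside Oozie
    //     }
    //     Configuration conf = new Configuration(false);
    //     conf.addResource(new Path("file:///", oozieActionConf()));
    //     return conf;

    public static void main(String[] args) {
        System.out.println(oozieActionConf() == null
                ? "not launched by Oozie"
                : "action conf: " + oozieActionConf());
    }
}
```

Each driver would then call the factory instead of `new Configuration()`, so the Oozie-specific lines live in one place.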

Any help would be highly appreciated.

On 05/05/2016 10:39 AM, Marko Dinic wrote:
> Hello everyone,
>
> I'm trying to run a sequence of MR jobs in Oozie, using a Java action for
> their drivers.
>
> The problem is that the MR jobs are run locally instead of on the Hadoop
> cluster. How can I fix this?
>
> The first job reads from HBase, performs some processing and puts the
> result on HDFS, and the next job should read from it. There are 10
> mappers in the first job, but I'm only showing the last one as an example.
> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 
> 5605456 bytes from map-output for attempt_local1149688163_0001_m_000006_0
>     2016-05-04 14:33:49,349 INFO [localfetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: 
> closeInMemoryFile -> map-output of size: 5605456, 
> inMemoryMapOutputs.size() -> 9, commitMemory -> 44055266, usedMemory 
> ->49660722
>     2016-05-04 14:33:49,354 INFO [localfetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 
> about to shuffle output of map attempt_local1149688163_0001_m_000009_0 
> decomp: 5738455 len: 5738459 to MEMORY
>     2016-05-04 14:33:49,370 INFO [localfetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 
> 5738455 bytes from map-output for attempt_local1149688163_0001_m_000009_0
>     2016-05-04 14:33:49,370 INFO [localfetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: 
> closeInMemoryFile -> map-output of size: 5738455, 
> inMemoryMapOutputs.size() -> 10, commitMemory -> 49660722, usedMemory 
> ->55399177
>     2016-05-04 14:33:49,373 INFO [EventFetcher for fetching Map 
> Completion Events] 
> org.apache.hadoop.mapreduce.task.reduce.EventFetcher: EventFetcher is 
> interrupted.. Returning
>     2016-05-04 14:33:49,375 INFO [pool-9-thread-1] 
> org.apache.hadoop.mapred.LocalJobRunner: 10 / 10 copied.
>     2016-05-04 14:33:49,376 INFO [pool-9-thread-1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: finalMerge 
> called with 10 in-memory map-outputs and 0 on-disk map-outputs
>     2016-05-04 14:33:49,388 INFO [pool-9-thread-1] 
> org.apache.hadoop.mapred.Merger: Merging 10 sorted segments
>     2016-05-04 14:33:49,389 INFO [pool-9-thread-1] 
> org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 10 
> segments left of total size: 55398877 bytes
>     2016-05-04 14:33:49,711 INFO [pool-9-thread-1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merged 10 
> segments, 55399177 bytes to disk to satisfy reduce memory limit
>     2016-05-04 14:33:49,712 INFO [pool-9-thread-1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merging 1 
> files, 55399163 bytes from disk
>     2016-05-04 14:33:49,713 INFO [pool-9-thread-1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merging 0 
> segments, 0 bytes from memory into reduce
>     2016-05-04 14:33:49,714 INFO [pool-9-thread-1] 
> org.apache.hadoop.mapred.Merger: Merging 1 sorted segments
>     2016-05-04 14:33:49,714 INFO [pool-9-thread-1] 
> org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1 
> segments left of total size: 55399129 bytes
>     2016-05-04 14:33:49,715 INFO [pool-9-thread-1] 
> org.apache.hadoop.mapred.LocalJobRunner: 10 / 10 copied.
>     2016-05-04 14:33:49,742 INFO [Thread-42] 
> org.apache.hadoop.mapred.LocalJobRunner: reduce task executor complete.
>     2016-05-04 14:33:49,797 WARN [Thread-42] 
> org.apache.hadoop.mapred.LocalJobRunner: job_local1149688163_0001
>     java.lang.Exception: java.io.IOException: Mkdirs failed to create 
> file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables/_temporary/0/_temporary/attempt_local1149688163_0001_r_000000_0 
> (exists=false, 
> cwd=file:/hadoop/yarn/local/usercache/hdfs/appcache/application_1461858162941_0054/container_e12_1461858162941_0054_01_000002)
>         at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>         at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
>     Caused by: java.io.IOException: Mkdirs failed to create 
> file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables/_temporary/0/_temporary/attempt_local1149688163_0001_r_000000_0 
> (exists=false, 
> cwd=file:/hadoop/yarn/local/usercache/hdfs/appcache/application_1461858162941_0054/container_e12_1461858162941_0054_01_000002)
>         at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:449)
>         at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:435)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
>         at 
> org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1074)
>         at 
> org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:273)
>         at 
> org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:530)
>         at 
> org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:64)
>         at 
> org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:75)
>         at 
> org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:540)
>         at 
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:614)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>         at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>     2016-05-04 14:33:50,346 INFO [main] 
> org.apache.hadoop.mapreduce.Job: Job job_local1149688163_0001 failed 
> with state FAILED due to: NA
>     2016-05-04 14:33:50,407 INFO [main] 
> org.apache.hadoop.mapreduce.Job: Counters: 38
>         File System Counters
>             FILE: Number of bytes read=1287449333
>             FILE: Number of bytes written=1607139426
>             FILE: Number of read operations=0
>             FILE: Number of large read operations=0
>             FILE: Number of write operations=0
>             HDFS: Number of bytes read=1111590
>             HDFS: Number of bytes written=220
>             HDFS: Number of read operations=40
>             HDFS: Number of large read operations=0
>             HDFS: Number of write operations=20
>         Map-Reduce Framework
>             Map input records=10906
>             Map output records=10906
>             Map output bytes=55355550
>             Map output materialized bytes=55399217
>             Input split bytes=2900
>             Combine input records=0
>             Combine output records=0
>             Reduce input groups=0
>             Reduce shuffle bytes=55399217
>             Reduce input records=0
>             Reduce output records=0
>             Spilled Records=10906
>             Shuffled Maps =10
>             Failed Shuffles=0
>             Merged Map outputs=10
>             GC time elapsed (ms)=641
>             CPU time spent (ms)=11290
>             Physical memory (bytes) snapshot=4507889664
>             Virtual memory (bytes) snapshot=22225674240
>             Total committed heap usage (bytes)=2925002752
>         Shuffle Errors
>             BAD_ID=0
>             CONNECTION=0
>             IO_ERROR=0
>             WRONG_LENGTH=0
>             WRONG_MAP=0
>             WRONG_REDUCE=0
>         File Input Format Counters
>             Bytes Read=0
>         File Output Format Counters
>             Bytes Written=0
>
> And here is the exception from the next job:
>
>     Failing Oozie Launcher, Main class
>
>     [org.apache.oozie.action.hadoop.JavaMain], main() threw exception, 
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input 
> path does not exist: 
> file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables
>     org.apache.oozie.action.hadoop.JavaMainException: 
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input 
> path does not exist: 
> file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables
>         at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:59)
>         at 
> org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
>         at org.apache.oozie.action.hadoop.JavaMain.main(JavaMain.java:35)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:497)
>         at 
> org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
>     Caused by: 
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input 
> path does not exist: 
> file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables
>         at 
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
>         at 
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
>         at 
> org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:59)
>         at 
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
>         at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
>         at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
>         at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
>         at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>         at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>         at 
> org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
>         at 
> com.nissatech.kmedoidsusingfames.algorithms.initialization.RandomSeedDriver.generateRandomSeed(RandomSeedDriver.java:52)
>         at 
> com.nissatech.kmedoidsusingfames.algorithms.initialization.ScalableKMeansPPInitialization.performInitialization(ScalableKMeansPPInitialization.java:43)
>         at 
> com.nissatech.kmedoidsusingfames.algorithms.kmedoids.KMedoidsUsingFAMES.perform(KMedoidsUsingFAMES.java:54)
>         at 
> com.nissatech.kmedoidsusingfames.algorithms.ClusteringAlgorithmRepetitor.performIteratingForSameNoOfClusters(ClusteringAlgorithmRepetitor.java:43)
>         at 
> com.nissatech.kmedoidsusingfames.algorithms.ClusteringAlgorithmIterator.performTraining(ClusteringAlgorithmIterator.java:46)
>         at 
> com.nissatech.kmedoidsusingfames.orchestration.Orchestrator.main(Orchestrator.java:74)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:497)
>         at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:56)
>         ... 15 more
>
> It seems to me that the first job ran locally, and hence there is no 
> result on HDFS for the next one. Am I wrong?
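You are reading the log correctly: the `LocalJobRunner` thread names, the `attempt_local*` task IDs, and the `file:/` URIs in the Mkdirs failure all indicate the job ran with `mapreduce.framework.name=local`, so the reducer tried to write its output to the launcher container's local disk rather than HDFS. For the job to be submitted to the cluster, its Configuration needs entries along these lines (the NameNode host and port below are placeholders, not values taken from this thread):

```xml
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://your-namenode:8020</value>
</property>
```

When these properties are absent from the Configuration the driver builds, Hadoop falls back to the built-in defaults (`local` and `file:///`), which matches exactly the behaviour in the log above.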
>
> ___________________________
>
>
> I was able to make my MR job run on the HDP cluster by adding this to 
> the configuration (based on the following link):
>
>     Configuration conf = new Configuration(false);
>     conf.addResource(new Path("file:///", 
> System.getProperty("oozie.action.conf.xml")));
>
> But why do I need to do that, and how can I avoid it? I have a 
> sequence of MR jobs run from this Java action, and I don't want to tie 
> myself to Oozie by adding this to the configuration of each job. Is 
> there a way to make my jobs run on the cluster from Oozie by default?
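One way to keep that Oozie-specific line out of every driver is to centralize Configuration creation in a single factory and have each job obtain its base conf from there. A minimal sketch, assuming Hadoop's `Configuration`/`Path` API; `JobConfFactory` is a hypothetical name, and the `oozie.action.conf.xml` system property is only set when the code runs under an Oozie launcher, so the same factory should work unchanged outside Oozie:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

/** Hypothetical helper: every MR driver obtains its base Configuration here. */
public final class JobConfFactory {

    private JobConfFactory() {}

    public static Configuration create() {
        // Load the usual *-site.xml defaults from the classpath.
        Configuration conf = new Configuration();
        // Under Oozie, the launcher exposes the merged action configuration
        // (including mapreduce.framework.name and fs.defaultFS) through this
        // system property; outside Oozie the property is simply absent.
        String actionConf = System.getProperty("oozie.action.conf.xml");
        if (actionConf != null) {
            conf.addResource(new Path("file:///", actionConf));
        }
        return conf;
    }
}
```

This confines the Oozie coupling to one class; the individual drivers only ever see a plain `Configuration`.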
>
> I should probably mention that this is an HDP cluster and the setup 
> was performed through Ambari.

-- 
*Marko Dinić*
/Software engineer @/
Nissatech
Kajmakčalanska 8
18000 Niš, Serbia
website <http://www.nissatech.com> | email 
<ma...@nissatech.com>
tel/fax: +381 18 288 111
mobile: +381 63 82 49 556
skype: vesto91