Posted to dev@falcon.apache.org by "Satish Mittal (JIRA)" <ji...@apache.org> on 2014/05/29 15:47:02 UTC

[jira] [Updated] (FALCON-455) Replication of output feed of an HCatalog process not working

     [ https://issues.apache.org/jira/browse/FALCON-455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Satish Mittal updated FALCON-455:
---------------------------------

    Attachment: workflow.xml
                hcat-process.xml
                hcat-out-feed.xml
                hcat-in-feed.xml

Attaching the feed/process/workflow XML files.

> Replication of output feed of an HCatalog process not working
> -------------------------------------------------------------
>
>                 Key: FALCON-455
>                 URL: https://issues.apache.org/jira/browse/FALCON-455
>             Project: Falcon
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Satish Mittal
>         Attachments: hcat-in-feed.xml, hcat-out-feed.xml, hcat-process.xml, workflow.xml
>
>
> Suppose there is an HCatalog process (java type) that takes an HCat input feed and produces another HCat feed as output. Further, this output feed is configured for replication across 2 clusters.
> Replication of the output feed fails during the Hive import step. The reason is that the HCat process job output on HDFS contains a '_logs' directory if the process writes to a static partition (or an empty '_temporary' directory if the process writes to a dynamic partition).
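> As an aside, in the static-partition case the '_logs' directory comes from the MR1 per-user job history being written into the job output directory. A minimal, hypothetical sketch of how the process job could suppress it (assuming the java action submits a classic MR1 job; this does not help with the empty '_temporary' directory left by dynamic-partition writes):
> {noformat}
> import org.apache.hadoop.mapred.JobConf;
>
> public class HistoryFreeJobConf {
>     // Sketch only: build a JobConf whose output directory will not receive
>     // the '_logs/history' files that the later Hive import chokes on.
>     public static JobConf create() {
>         JobConf jobConf = new JobConf();
>         // In Hadoop 1.x the user job history defaults to <output dir>/_logs/history;
>         // setting the location to "none" disables it.
>         jobConf.set("hadoop.job.history.user.location", "none");
>         return jobConf;
>     }
> }
> {noformat}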
> The Hive import job logs contain the following error:
> {noformat}
> 9036 [main] INFO  org.apache.hadoop.hive.ql.Driver  - Starting command: 
> import table table5 partition (minute='25',month='05',year='2014',hour='12',day='29') from 'hdfs://databusdev2.mkhoj.com:9000//projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-25/hcat-cluster2/data'
> 9036 [main] INFO  org.apache.hadoop.hive.ql.log.PerfLogger  - </PERFLOG method=TimeToSubmit start=1401367057244 end=1401367057579 duration=335 from=org.apache.hadoop.hive.ql.Driver>
> 9036 [main] INFO  org.apache.hadoop.hive.ql.log.PerfLogger  - <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
> 9036 [main] INFO  org.apache.hadoop.hive.ql.log.PerfLogger  - <PERFLOG method=task.COPY.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
> 9036 [main] INFO  org.apache.hadoop.hive.ql.exec.Task  - Copying data from hdfs://databusdev2.mkhoj.com:9000/projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-25/hcat-cluster2/data/year=2014/month=05/day=29/hour=12/minute=25 to hdfs://databusdev2.mkhoj.com:9000/tmp/hive-mapred/hive_2014-05-29_12-37-37_244_6437156794758917899-1/-ext-10000
> 9069 [main] INFO  org.apache.hadoop.hive.ql.exec.Task  - Copying file: hdfs://databusdev2.mkhoj.com:9000/projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-25/hcat-cluster2/data/year=2014/month=05/day=29/hour=12/minute=25/_SUCCESS
> 9096 [main] INFO  org.apache.hadoop.hive.ql.exec.Task  - Copying file: hdfs://databusdev2.mkhoj.com:9000/projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-25/hcat-cluster2/data/year=2014/month=05/day=29/hour=12/minute=25/_logs
> 9190 [main] INFO  org.apache.hadoop.hive.ql.exec.Task  - Copying file: hdfs://databusdev2.mkhoj.com:9000/projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-25/hcat-cluster2/data/year=2014/month=05/day=29/hour=12/minute=25/part-r-00000
> 9222 [main] INFO  org.apache.hadoop.hive.ql.log.PerfLogger  - <PERFLOG method=task.DDL.Stage-1 from=org.apache.hadoop.hive.ql.Driver>
> 9580 [main] INFO  org.apache.hadoop.hive.ql.log.PerfLogger  - </PERFLOG method=task.COPY.Stage-0 start=1401367057579 end=1401367058123 duration=544 from=org.apache.hadoop.hive.ql.Driver>
> 9580 [main] INFO  org.apache.hadoop.hive.ql.log.PerfLogger  - <PERFLOG method=task.MOVE.Stage-2 from=org.apache.hadoop.hive.ql.Driver>
> 9581 [main] INFO  org.apache.hadoop.hive.ql.exec.Task  - Loading data to table default.table5 partition (day=29, hour=12, minute=25, month=05, year=2014) from hdfs://databusdev2.mkhoj.com:9000/tmp/hive-mapred/hive_2014-05-29_12-37-37_244_6437156794758917899-1/-ext-10000
> 9598 [main] INFO  org.apache.hadoop.hive.ql.exec.MoveTask  - Partition is: {day=29, hour=12, minute=25, month=05, year=2014}
> 9668 [main] ERROR org.apache.hadoop.hive.ql.exec.Task  - Failed with exception checkPaths: hdfs://databusdev2.mkhoj.com:9000/tmp/hive-mapred/hive_2014-05-29_12-37-37_244_6437156794758917899-1/-ext-10000 has nested directoryhdfs://databusdev2.mkhoj.com:9000/tmp/hive-mapred/hive_2014-05-29_12-37-37_244_6437156794758917899-1/-ext-10000/_logs
> org.apache.hadoop.hive.ql.metadata.HiveException: checkPaths: hdfs://databusdev2.mkhoj.com:9000/tmp/hive-mapred/hive_2014-05-29_12-37-37_244_6437156794758917899-1/-ext-10000 has nested directoryhdfs://databusdev2.mkhoj.com:9000/tmp/hive-mapred/hive_2014-05-29_12-37-37_244_6437156794758917899-1/-ext-10000/_logs
> 	at org.apache.hadoop.hive.ql.metadata.Hive.checkPaths(Hive.java:2108)
> 	at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2298)
> 	at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1230)
> 	at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:408)
> 	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
> 	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
> 	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1532)
> 	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1305)
> 	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1136)
> 	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:976)
> 	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:966)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
> 	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:359)
> 	at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:457)
> 	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:467)
> 	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:748)
> 	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
> 	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
> 	at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:318)
> 	at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:279)
> 	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:39)
> 	at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:66)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:226)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:260)
> 9668 [main] INFO  org.apache.hadoop.hive.ql.log.PerfLogger  - </PERFLOG method=task.MOVE.Stage-2 start=1401367058123 end=1401367058211 duration=88 from=org.apache.hadoop.hive.ql.Driver>
> 9672 [main] ERROR org.apache.hadoop.hive.ql.Driver  - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
> {noformat}
> Apparently, Hive import rejects any nested directory in the import path. The same behavior can be reproduced from the Hive CLI:
> {noformat}
> hive> import table table5 partition (minute='32',month='05',year='2014',hour='12',day='29') from 'hdfs://databusdev2.mkhoj.com:9000//projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-32/hcat-cluster2/data'
>     > ;
> Copying data from hdfs://databusdev2.mkhoj.com:9000/projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-32/hcat-cluster2/data/year=2014/month=05/day=29/hour=12/minute=32
> Copying file: hdfs://databusdev2.mkhoj.com:9000/projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-32/hcat-cluster2/data/year=2014/month=05/day=29/hour=12/minute=32/_SUCCESS
> Copying file: hdfs://databusdev2.mkhoj.com:9000/projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-32/hcat-cluster2/data/year=2014/month=05/day=29/hour=12/minute=32/_logs
> Copying file: hdfs://databusdev2.mkhoj.com:9000/projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-32/hcat-cluster2/data/year=2014/month=05/day=29/hour=12/minute=32/part-r-00000
> Loading data to table default.table5 partition (day=29, hour=12, minute=32, month=05, year=2014)
> Failed with exception checkPaths: hdfs://databusdev2.mkhoj.com:9000/tmp/hive-hive/hive_2014-05-29_13-13-43_867_8757094482694632648-1/-ext-10000 has nested directoryhdfs://databusdev2.mkhoj.com:9000/tmp/hive-hive/hive_2014-05-29_13-13-43_867_8757094482694632648-1/-ext-10000/_logs
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
> hive>
> {noformat}
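> Until the replication workflow itself handles this, one possible workaround (a minimal sketch, not the actual Falcon fix; class and method names are illustrative) is to prune these MapReduce side-effect directories from the exported data before the Hive import runs:
> {noformat}
> import java.io.IOException;
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class StagingDirCleaner {
>     // Recursively delete '_logs' and '_temporary' directories under the exported
>     // partition data so that Hive's checkPaths no longer finds nested directories.
>     public static void pruneSideEffectDirs(Configuration conf, Path dataDir) throws IOException {
>         FileSystem fs = dataDir.getFileSystem(conf);
>         for (FileStatus status : fs.listStatus(dataDir)) {
>             String name = status.getPath().getName();
>             if (status.isDir() && (name.equals("_logs") || name.equals("_temporary"))) {
>                 fs.delete(status.getPath(), true);
>             } else if (status.isDir()) {
>                 // Descend into partition sub-directories such as year=2014/month=05/...
>                 pruneSideEffectDirs(conf, status.getPath());
>             }
>         }
>     }
> }
> {noformat}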



--
This message was sent by Atlassian JIRA
(v6.2#6252)