Posted to dev@falcon.apache.org by "Satish Mittal (JIRA)" <ji...@apache.org> on 2014/05/29 15:47:02 UTC
[jira] [Updated] (FALCON-455) Replication of output feed of an HCatalog process not working
[ https://issues.apache.org/jira/browse/FALCON-455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Satish Mittal updated FALCON-455:
---------------------------------
Attachment: workflow.xml
            hcat-process.xml
            hcat-out-feed.xml
            hcat-in-feed.xml
Attaching the feed/process/workflow XML files.
> Replication of output feed of an HCatalog process not working
> -------------------------------------------------------------
>
> Key: FALCON-455
> URL: https://issues.apache.org/jira/browse/FALCON-455
> Project: Falcon
> Issue Type: Bug
> Affects Versions: 0.5
> Reporter: Satish Mittal
> Attachments: hcat-in-feed.xml, hcat-out-feed.xml, hcat-process.xml, workflow.xml
>
>
> Suppose there is an HCatalog process (java type) that takes an HCat input feed and outputs another HCat feed. Further, this output feed is configured for replication across 2 clusters.
> The replication of the output feed fails during the Hive import step. The reason is that the HCat process job's output on HDFS contains a '_logs' directory when the process writes to a static partition (or an empty '_temporary' directory when it writes to a dynamic partition).
> The Hive import job logs contain the following error:
> {noformat}
> 9036 [main] INFO org.apache.hadoop.hive.ql.Driver - Starting command:
> import table table5 partition (minute='25',month='05',year='2014',hour='12',day='29') from 'hdfs://databusdev2.mkhoj.com:9000//projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-25/hcat-cluster2/data'
> 9036 [main] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=TimeToSubmit start=1401367057244 end=1401367057579 duration=335 from=org.apache.hadoop.hive.ql.Driver>
> 9036 [main] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
> 9036 [main] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=task.COPY.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
> 9036 [main] INFO org.apache.hadoop.hive.ql.exec.Task - Copying data from hdfs://databusdev2.mkhoj.com:9000/projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-25/hcat-cluster2/data/year=2014/month=05/day=29/hour=12/minute=25 to hdfs://databusdev2.mkhoj.com:9000/tmp/hive-mapred/hive_2014-05-29_12-37-37_244_6437156794758917899-1/-ext-10000
> 9069 [main] INFO org.apache.hadoop.hive.ql.exec.Task - Copying file: hdfs://databusdev2.mkhoj.com:9000/projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-25/hcat-cluster2/data/year=2014/month=05/day=29/hour=12/minute=25/_SUCCESS
> 9096 [main] INFO org.apache.hadoop.hive.ql.exec.Task - Copying file: hdfs://databusdev2.mkhoj.com:9000/projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-25/hcat-cluster2/data/year=2014/month=05/day=29/hour=12/minute=25/_logs
> 9190 [main] INFO org.apache.hadoop.hive.ql.exec.Task - Copying file: hdfs://databusdev2.mkhoj.com:9000/projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-25/hcat-cluster2/data/year=2014/month=05/day=29/hour=12/minute=25/part-r-00000
> 9222 [main] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=task.DDL.Stage-1 from=org.apache.hadoop.hive.ql.Driver>
> 9580 [main] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=task.COPY.Stage-0 start=1401367057579 end=1401367058123 duration=544 from=org.apache.hadoop.hive.ql.Driver>
> 9580 [main] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=task.MOVE.Stage-2 from=org.apache.hadoop.hive.ql.Driver>
> 9581 [main] INFO org.apache.hadoop.hive.ql.exec.Task - Loading data to table default.table5 partition (day=29, hour=12, minute=25, month=05, year=2014) from hdfs://databusdev2.mkhoj.com:9000/tmp/hive-mapred/hive_2014-05-29_12-37-37_244_6437156794758917899-1/-ext-10000
> 9598 [main] INFO org.apache.hadoop.hive.ql.exec.MoveTask - Partition is: {day=29, hour=12, minute=25, month=05, year=2014}
> 9668 [main] ERROR org.apache.hadoop.hive.ql.exec.Task - Failed with exception checkPaths: hdfs://databusdev2.mkhoj.com:9000/tmp/hive-mapred/hive_2014-05-29_12-37-37_244_6437156794758917899-1/-ext-10000 has nested directoryhdfs://databusdev2.mkhoj.com:9000/tmp/hive-mapred/hive_2014-05-29_12-37-37_244_6437156794758917899-1/-ext-10000/_logs
> org.apache.hadoop.hive.ql.metadata.HiveException: checkPaths: hdfs://databusdev2.mkhoj.com:9000/tmp/hive-mapred/hive_2014-05-29_12-37-37_244_6437156794758917899-1/-ext-10000 has nested directoryhdfs://databusdev2.mkhoj.com:9000/tmp/hive-mapred/hive_2014-05-29_12-37-37_244_6437156794758917899-1/-ext-10000/_logs
> at org.apache.hadoop.hive.ql.metadata.Hive.checkPaths(Hive.java:2108)
> at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2298)
> at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1230)
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:408)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1532)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1305)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1136)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:976)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:966)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:359)
> at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:457)
> at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:467)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:748)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
> at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:318)
> at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:279)
> at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:39)
> at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:66)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:226)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
> at org.apache.hadoop.mapred.Child.main(Child.java:260)
> 9668 [main] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=task.MOVE.Stage-2 start=1401367058123 end=1401367058211 duration=88 from=org.apache.hadoop.hive.ql.Driver>
> 9672 [main] ERROR org.apache.hadoop.hive.ql.Driver - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
> {noformat}
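The '_logs' and '_temporary' directories flagged above are Hadoop job-bookkeeping artifacts, while '_SUCCESS' is a plain marker file and is unaffected. One possible workaround (not something Falcon does today; the helper name below is hypothetical) would be to strip such nested directories from the staging data directory before the Hive import runs. A minimal local sketch of that cleanup:

```python
import os
import shutil
import tempfile

def strip_hadoop_side_dirs(data_dir):
    """Remove Hadoop-generated bookkeeping directories such as '_logs'
    and '_temporary' from a staging directory, since Hive's import
    rejects any nested directory in the import path. Plain marker
    files like '_SUCCESS' are left in place."""
    removed = []
    for name in os.listdir(data_dir):
        path = os.path.join(data_dir, name)
        if os.path.isdir(path) and name.startswith('_'):
            shutil.rmtree(path)
            removed.append(name)
    return sorted(removed)

# Demo on a local directory laid out like the failing staging path.
staging = tempfile.mkdtemp()
os.mkdir(os.path.join(staging, '_logs'))        # written for a static partition
os.mkdir(os.path.join(staging, '_temporary'))   # left behind for a dynamic partition
open(os.path.join(staging, '_SUCCESS'), 'w').close()
open(os.path.join(staging, 'part-r-00000'), 'w').close()

print(strip_hadoop_side_dirs(staging))   # ['_logs', '_temporary']
print(sorted(os.listdir(staging)))       # ['_SUCCESS', 'part-r-00000']
```

On a real cluster the equivalent step would delete the directories via the HDFS API or `hadoop fs -rm -r` rather than local `shutil` calls.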
> Apparently, Hive import fails whenever the import path contains any nested directory. The same behavior can be reproduced from the Hive CLI:
> {noformat}
> hive> import table table5 partition (minute='32',month='05',year='2014',hour='12',day='29') from 'hdfs://databusdev2.mkhoj.com:9000//projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-32/hcat-cluster2/data'
> > ;
> Copying data from hdfs://databusdev2.mkhoj.com:9000/projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-32/hcat-cluster2/data/year=2014/month=05/day=29/hour=12/minute=32
> Copying file: hdfs://databusdev2.mkhoj.com:9000/projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-32/hcat-cluster2/data/year=2014/month=05/day=29/hour=12/minute=32/_SUCCESS
> Copying file: hdfs://databusdev2.mkhoj.com:9000/projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-32/hcat-cluster2/data/year=2014/month=05/day=29/hour=12/minute=32/_logs
> Copying file: hdfs://databusdev2.mkhoj.com:9000/projects/falcon/hcolo2/staging/FALCON_FEED_REPLICATION_hcat-out6_hcat-cluster2/default/table5/year=2014/2014-05-29-12-32/hcat-cluster2/data/year=2014/month=05/day=29/hour=12/minute=32/part-r-00000
> Loading data to table default.table5 partition (day=29, hour=12, minute=32, month=05, year=2014)
> Failed with exception checkPaths: hdfs://databusdev2.mkhoj.com:9000/tmp/hive-hive/hive_2014-05-29_13-13-43_867_8757094482694632648-1/-ext-10000 has nested directoryhdfs://databusdev2.mkhoj.com:9000/tmp/hive-hive/hive_2014-05-29_13-13-43_867_8757094482694632648-1/-ext-10000/_logs
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
> hive>
> {noformat}
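The validation that rejects the load is Hive.checkPaths (Hive.java:2108 in the stack trace above): while loading a partition, Hive refuses a source directory that contains any subdirectory. An illustrative Python re-creation of that check (the `check_paths` function and `HiveException` class here are stand-ins, not Hive's actual code):

```python
import os
import tempfile

class HiveException(Exception):
    """Stand-in for org.apache.hadoop.hive.ql.metadata.HiveException."""
    pass

def check_paths(src_dir):
    """Mimic Hive's checkPaths validation: loading a partition fails
    if the source directory contains any nested directory."""
    for name in os.listdir(src_dir):
        path = os.path.join(src_dir, name)
        if os.path.isdir(path):
            raise HiveException(
                'checkPaths: %s has nested directory %s' % (src_dir, path))

src = tempfile.mkdtemp()
open(os.path.join(src, 'part-r-00000'), 'w').close()
check_paths(src)                      # plain files only: passes

os.mkdir(os.path.join(src, '_logs'))  # now mirrors the failing staging dir
try:
    check_paths(src)
except HiveException as e:
    print(e)                          # checkPaths: ... has nested directory ...
```

This matches the observed failure: the copy stage happily copies '_logs' into the `-ext-10000` scratch directory, and only the subsequent MoveTask trips over it.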
--
This message was sent by Atlassian JIRA
(v6.2#6252)