Posted to dev@falcon.apache.org by "Balu Vellanki (JIRA)" <ji...@apache.org> on 2016/06/24 06:09:16 UTC

[jira] [Commented] (FALCON-2049) Feed Replication with Empty Directories are failing

    [ https://issues.apache.org/jira/browse/FALCON-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15347773#comment-15347773 ] 

Balu Vellanki commented on FALCON-2049:
---------------------------------------

This issue was introduced by https://issues.apache.org/jira/browse/FALCON-1844, which sets setDeleteMissing to true by default. Assume the source dir is /tmp/source/${YEAR}/${MONTH}/${DAY} and the target dir is /tmp/target/${YEAR}/${MONTH}/${DAY}. Feed replication then triggers a DistCp job equivalent to the following CLI distcp command:
{code}
hadoop distcp -update -delete hdfs://c6401.ambari.apache.org:8020/tmp/source/1/2/3 hdfs://c6401.ambari.apache.org:8020/tmp/target/1/2/3/
{code}
The following scenarios can occur:

Case 1. Source dir is created and has no data files, but availabilityFlag is created.
Result : DistCp succeeds, hdfs://c6401.ambari.apache.org:8020/tmp/target/1/2/3/ is created and availabilityFlag is copied to target.

Case 2. Source dir is created and has files. 
Result : DistCp succeeds, hdfs://c6401.ambari.apache.org:8020/tmp/target/1/2/3/ is created and target dir has same files as sourceDir with same dir structure. 

Case 3. Source dir is created without any files and target dir is also created.
Result : DistCp succeeds, both source and target have empty dirs.

Case 4. Source dir is created but is empty, availabilityFlag is not created, and target dir does not exist.
Result : DistCp fails with error "Job commit failed: org.apache.hadoop.tools.CopyListing$InvalidInputException: hdfs://c6401.ambari.apache.org:8020/tmp/target/1/2/3 does not exist"
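
The four cases above reduce to one condition: the job fails only when nothing is copied and the target dir is missing. Below is a minimal Python sketch of that decision table. It models the cases as described in this comment and is not DistCp code; the function name and its parameters are hypothetical.

```python
def replication_outcome(source_entries, target_exists):
    """Simplified model of the four cases described above.

    source_entries: names found in the source dir (data files and/or
                    the availabilityFlag); an empty list means an empty dir.
    target_exists:  whether the target dir exists before DistCp runs.
    """
    if source_entries:
        # Cases 1 and 2: something is copied, so the target dir is created.
        return "SUCCEEDED"
    if target_exists:
        # Case 3: nothing to copy, but the target dir is already there.
        return "SUCCEEDED"
    # Case 4: nothing is copied and the target dir does not exist.
    return "FAILED"
```

Only the combination of an empty source and a missing target returns "FAILED" in this model, which is the combination both proposed solutions address.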

There seem to be two solutions to this problem.
1. Return success when sourceDir has no files and targetDir is missing, thus avoiding Case 4
OR
2. Create targetDir and then attempt the DistCp. This triggers Case 3, and the replication job succeeds.

I recommend option 2 because having an empty source/target dir is a valid use case for data directories. 
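
As a rough command-line illustration of option 2 (not the actual Falcon change), the target dir can be created with the FS shell before the same distcp invocation runs. The Python helper below only builds the two argument lists; its name is hypothetical, and the paths are whatever the caller passes in.

```python
def option2_commands(source, target):
    """Return the two commands for option 2, in execution order."""
    return [
        # Create the target dir (and any missing parents) first.
        ["hadoop", "fs", "-mkdir", "-p", target],
        # Then run the same DistCp invocation shown earlier in this comment.
        ["hadoop", "distcp", "-update", "-delete", source, target],
    ]
```

Each list can be passed to subprocess.run(cmd, check=True) on a host where the hadoop client is installed. With an empty source dir, the second command then runs against an existing, empty target, which is Case 3.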


> Feed Replication with Empty Directories are failing
> ---------------------------------------------------
>
>                 Key: FALCON-2049
>                 URL: https://issues.apache.org/jira/browse/FALCON-2049
>             Project: Falcon
>          Issue Type: Bug
>          Components: feed
>    Affects Versions: 0.10
>            Reporter: Murali Ramasami
>            Priority: Critical
>             Fix For: 0.10
>
>
> Feed Replication with empty directories is failing with the following error in the application log:
> {noformat}
> 2016-06-23 08:35:21,475 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://nat-os-r7-wkqu-falcon-multicluster-10.openstacklocal:8020/mr-history/tmp/hrt_qa/job_1466658266370_0059_conf.xml_tmp to hdfs://nat-os-r7-wkqu-falcon-multicluster-10.openstacklocal:8020/mr-history/tmp/hrt_qa/job_1466658266370_0059_conf.xml
> 2016-06-23 08:35:21,476 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: hdfs://nat-os-r7-wkqu-falcon-multicluster-10.openstacklocal:8020/mr-history/tmp/hrt_qa/job_1466658266370_0059-1466670911340-hrt_qa-distcp%3A+oozie%3Aaction%3AT%3Djava%3AW%3DFALCON_FEED_REPLICAT-1466670921283-0-0-FAILED-default-1466670920070.jhist_tmp to hdfs://nat-os-r7-wkqu-falcon-multicluster-10.openstacklocal:8020/mr-history/tmp/hrt_qa/job_1466658266370_0059-1466670911340-hrt_qa-distcp%3A+oozie%3Aaction%3AT%3Djava%3AW%3DFALCON_FEED_REPLICAT-1466670921283-0-0-FAILED-default-1466670920070.jhist
> 2016-06-23 08:35:21,477 INFO [Thread-66] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped JobHistoryEventHandler. super.stop()
> 2016-06-23 08:35:21,479 INFO [Thread-66] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: Setting job diagnostics to No of maps and reduces are 0 job_1466658266370_0059
> Job commit failed: org.apache.hadoop.tools.CopyListing$InvalidInputException: hdfs://nat-os-r7-wkqu-falcon-multicluster-10.openstacklocal:8020/tmp/falcon-regression/FeedReplicationTest/target/2016/06/23/08/32 doesn't exist
>         at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:84)
>         at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
>         at org.apache.hadoop.tools.mapred.CopyCommitter.deleteMissing(CopyCommitter.java:241)
>         at org.apache.hadoop.tools.mapred.CopyCommitter.commitJob(CopyCommitter.java:94)
>         at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:285)
>         at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Feed submitted:
> {noformat}
> <?xml version="1.0" encoding="UTF-8"?><feed xmlns="uri:falcon:feed:0.1" name="A7769e4e0-49663d60" description="Input File">
>     <partitions>
>         <partition name="colo"/>
>         <partition name="eventTime"/>
>         <partition name="impressionHour"/>
>         <partition name="pricingModel"/>
>     </partitions>
>     <availabilityFlag>availabilityFlag.txt</availabilityFlag>
>     <frequency>minutes(5)</frequency>
>     <late-arrival cut-off="days(100000)"/>
>     <clusters>
>         <cluster name="A7769e4e0-0af6c74b" type="source">
>             <validity start="2016-06-23T08:32Z" end="2016-06-23T08:37Z"/>
>             <retention limit="days(1000000)" action="delete"/>
>         </cluster>
>         <cluster name="A7769e4e0-25f87f0e" type="target">
>             <validity start="2016-06-23T08:32Z" end="2016-06-23T08:37Z"/>
>             <retention limit="days(1000000)" action="delete"/>
>             <locations>
>                 <location type="data" path="/tmp/falcon-regression/FeedReplicationTest/target/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}"/>
>             </locations>
>         </cluster>
>     </clusters>
>     <locations>
>         <location type="data" path="/tmp/falcon-regression/FeedReplicationTest/source/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}"/>
>         <location type="stats" path="/data/regression/fetlrc/billing/stats"/>
>         <location type="meta" path="/data/regression/fetlrc/billing/metadata"/>
>     </locations>
>     <ACL owner="hrt_qa" group="users" permission="0x755"/>
>     <schema location="/databus/streams_local/click_rr/schema/" provider="protobuf"/>
>     <properties>
>         <property name="field1" value="value1"/>
>         <property name="field2" value="value2"/>
>         <property name="job.counter" value="true"/>
>     </properties>
> </feed>
> {noformat}
> It is failing because the target directories to replicate into do not exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)