Posted to commits@oozie.apache.org by "Mona Chitnis (JIRA)" <ji...@apache.org> on 2011/08/01 22:09:09 UTC

[jira] [Updated] (OOZIE-2) Oozie 'move' fs action is inconsistent

     [ https://issues.apache.org/jira/browse/OOZIE-2?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mona Chitnis updated OOZIE-2:
-----------------------------

    Description: 
> Oozie 'move' fs action is inconsistent
> --------------------------------------
>
>                 Key: OOZIE-133
>                 URL: http://h12.grid.sp2.yahoo.net/browse/OOZIE-133
>             Project: oozie
>          Issue Type: New Feature
>          Components: workflow
>    Affects Versions: 3.0.2
>            Reporter: Mona Chitnis
>            Assignee: Oozie
>   Original Estimate: 1 week
>  Remaining Estimate: 1 week
>
> I'm using the 'move' fs action and I first got the following error:
> FS001: Missing scheme in path [/projects/ngdstone/user/ogg_oozie/intermediate/tmp_price_feats_uniq/.pig_header]
> when I had the following in my workflow.xml :
> <fs>
>         <move source='${OUT}/intermediate/tmp_price_feats_uniq/.pig_header'
> target='${OUT}/intermediate/tmp_predict_supply_feats/'/>
>         <move source='${OUT}/intermediate/tmp_price_feats_uniq/.pig_header'
> target='${OUT}/intermediate/tmp_predict_supply_feats/'/>
> </fs>
> I then prefixed the namenode URI to the paths (like I did for the <prepare> paths), as such:
> <fs>
>         <move source='${nameNode}${OUT}/intermediate/tmp_price_feats_uniq/.pig_header'
> target='${nameNode}${OUT}/intermediate/tmp_predict_supply_feats/'/>
>         <move source='${nameNode}${OUT}/intermediate/tmp_price_feats_uniq/.pig_header'
> target='${nameNode}${OUT}/intermediate/tmp_predict_supply_feats/'/>
> </fs>
> However, I now get this error:
> FS003: Scheme [hdfs] not allowed in path
> [hdfs://mithrilblue-nn1.blue.ygrid.yahoo.com:8020/projects/ngdstone/user/ogg_oozie/intermediate/tmp_predict_supply_feats]
> It seems the scheme is required for the source path but not allowed in the target.  This is inconsistent.
> Finally, if the source path is a file and the target path is a directory, Oozie will complain that the target already
> exists.  I feel it should be consistent with the Hadoop CLI (and Unix) and simply understand that the source should be
> placed under the target directory.

--
     [ http://h12.grid.sp2.yahoo.net/browse/OOZIE-133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur resolved OOZIE-133.
--------------------------------------

    Resolution: Won't Fix

This is not a bug; it is like that by design. The reason is to make clear that it is a move within the current filesystem, with no actual data movement.

--

Mona Chitnis commented on OOZIE-133:
------------------------------------

There are two parts to this issue.
1. The target is not allowed to mention a scheme.
2. If the target is an existing directory, an exception is thrown.

For part 1, if the target does include a scheme, we can allow it, but only if the target's hdfs scheme is the same as the source's (since move essentially incorporates a hadoop fs rename). That way, users who put the namenode parameter on both the source and target paths for the sake of consistency do not face an exception.
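
The check proposed in part 1 could be sketched roughly as follows. This is only an illustration, not Oozie code: the class MoveTargetCheck, the method targetAllowed and the example.com hostnames are made up for this example. Comparing the authority (host:port) as well as the scheme is an added assumption, on the grounds that a rename only makes sense within a single filesystem.

```java
import java.net.URI;
import java.util.Objects;

/** Hypothetical helper for illustration only; not part of Oozie. */
final class MoveTargetCheck {

    /**
     * Returns true when the target carries no scheme, or when it names the
     * same scheme and authority (host:port) as the source.
     */
    static boolean targetAllowed(String source, String target) {
        URI src = URI.create(source);
        URI tgt = URI.create(target);
        if (tgt.getScheme() == null) {
            // A scheme-less target is what the fs action accepts today.
            return true;
        }
        // Assumption: "same filesystem" means same scheme and same authority.
        return tgt.getScheme().equalsIgnoreCase(src.getScheme())
                && Objects.equals(tgt.getAuthority(), src.getAuthority());
    }
}
```

With a source of hdfs://nn1.example.com:8020/a/b, this sketch accepts the targets /c/ and hdfs://nn1.example.com:8020/c/, and rejects hdfs://nn2.example.com:8020/c/.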

For part 2, hadoop can take care of placing the source dir or file as a child of the target dir, if the target dir exists. Is there any reason why oozie should not be consistent with this?
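
The behaviour asked for in part 2 (the same as the Hadoop CLI and Unix mv) amounts to resolving the final destination before the rename. Below is a minimal sketch under stated assumptions: MoveDestination and resolve are hypothetical names, and the "is this an existing directory" test is passed in as a predicate so the example does not depend on any Hadoop API.

```java
import java.util.function.Predicate;

/** Hypothetical helper for illustration only; not part of Oozie or Hadoop. */
final class MoveDestination {

    /**
     * If the target is an existing directory, the source keeps its own name
     * and becomes a child of that directory; otherwise the target is used
     * as-is (a plain rename).
     */
    static String resolve(String source, String target, Predicate<String> isExistingDir) {
        if (!isExistingDir.test(target)) {
            return target;
        }
        String trimmedSource = stripTrailingSlashes(source);
        String name = trimmedSource.substring(trimmedSource.lastIndexOf('/') + 1);
        String dir = stripTrailingSlashes(target);
        // Avoid a double slash when the target is the root directory.
        return dir.endsWith("/") ? dir + name : dir + "/" + name;
    }

    /** Drops trailing '/' characters but leaves a lone "/" untouched. */
    private static String stripTrailingSlashes(String path) {
        int end = path.length();
        while (end > 1 && path.charAt(end - 1) == '/') {
            end--;
        }
        return path.substring(0, end);
    }
}
```

For a source of /out/tmp_a/.pig_header and an existing target directory /out/tmp_b/, resolve returns /out/tmp_b/.pig_header; if the target does not exist, it is returned unchanged.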



  was:
From the reporter of this issue:

I'm using the 'move' fs action and I first got the following error:

FS001: Missing scheme in path [/projects/ngdstone/user/ogg_oozie/intermediate/tmp_price_feats_uniq/.pig_header]

when I had the following in my workflow.xml :

<fs>
        <move source='${OUT}/intermediate/tmp_price_feats_uniq/.pig_header'
target='${OUT}/intermediate/tmp_predict_supply_feats/'/>
        <move source='${OUT}/intermediate/tmp_price_feats_uniq/.pig_header'
target='${OUT}/intermediate/tmp_predict_supply_feats/'/>
</fs>


I then prefixed the namenode URI to the paths (like I did for the <prepare> paths), as such:

<fs>
        <move source='${nameNode}${OUT}/intermediate/tmp_price_feats_uniq/.pig_header'
target='${nameNode}${OUT}/intermediate/tmp_predict_supply_feats/'/>
        <move source='${nameNode}${OUT}/intermediate/tmp_price_feats_uniq/.pig_header'
target='${nameNode}${OUT}/intermediate/tmp_predict_supply_feats/'/>
</fs>


However, I now get this error:

FS003: Scheme [hdfs] not allowed in path
[hdfs://mithrilblue-nn1.blue.ygrid.yahoo.com:8020/projects/ngdstone/user/ogg_oozie/intermediate/tmp_predict_supply_feats]


It seems the scheme is required for the source path but not allowed in the target.  This is inconsistent.

Finally, if the source path is a file and the target path is a directory, Oozie will complain that the target already
exists.  I feel it should be consistent with the Hadoop CLI (and Unix) and simply understand that the source should be
placed under the target directory.



> Oozie 'move' fs action is inconsistent
> --------------------------------------
>
>                 Key: OOZIE-2
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2
>             Project: Apache Oozie (Incubating)
>          Issue Type: Improvement
>            Reporter: Mona Chitnis
>              Labels: oozie
>   Original Estimate: 168h
>  Remaining Estimate: 168h

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira