Posted to commits@oozie.apache.org by "Mona Chitnis (JIRA)" <ji...@apache.org> on 2011/08/01 22:05:11 UTC

[jira] [Created] (OOZIE-2) Oozie 'move' fs action is inconsistent

Oozie 'move' fs action is inconsistent
--------------------------------------

                 Key: OOZIE-2
                 URL: https://issues.apache.org/jira/browse/OOZIE-2
             Project: Apache Oozie (Incubating)
          Issue Type: Improvement
            Reporter: Mona Chitnis


From the reporter of this issue:

I'm using the 'move' fs action and I first got the following error:

FS001: Missing scheme in path [/projects/ngdstone/user/ogg_oozie/intermediate/tmp_price_feats_uniq/.pig_header]

when I had the following in my workflow.xml:

<fs>
        <move source='${OUT}/intermediate/tmp_price_feats_uniq/.pig_header'
target='${OUT}/intermediate/tmp_predict_supply_feats/'/>
        <move source='${OUT}/intermediate/tmp_price_feats_uniq/.pig_header'
target='${OUT}/intermediate/tmp_predict_supply_feats/'/>
</fs>


I then prefixed the namenode URI to the paths (like I did for the <prepare> paths), as such:

<fs>
        <move source='${nameNode}${OUT}/intermediate/tmp_price_feats_uniq/.pig_header'
target='${nameNode}${OUT}/intermediate/tmp_predict_supply_feats/'/>
        <move source='${nameNode}${OUT}/intermediate/tmp_price_feats_uniq/.pig_header'
target='${nameNode}${OUT}/intermediate/tmp_predict_supply_feats/'/>
</fs>


However, I now get this error:

FS003: Scheme [hdfs] not allowed in path
[hdfs://mithrilblue-nn1.blue.ygrid.yahoo.com:8020/projects/ngdstone/user/ogg_oozie/intermediate/tmp_predict_supply_feats]


It seems the scheme is required for the source path but not allowed in the target path. This is inconsistent.
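
To make the asymmetry concrete, here is a minimal sketch of the rule that the two errors above imply. It is an assumption-based reconstruction, not Oozie's actual validation code: the class name `PathSchemeRule`, the method `validate`, the use of `java.net.URI` and `IllegalArgumentException`, and the host `nn1` are all hypothetical. Only the FS001 and FS003 message texts come from the report.

```java
import java.net.URI;

// Hypothetical reconstruction of the rule implied by the FS001/FS003 errors; not Oozie code.
final class PathSchemeRule {

    // withScheme = true:  the path must carry a scheme (how the source is treated).
    // withScheme = false: the path must not carry a scheme (how the target is treated).
    static void validate(URI path, boolean withScheme) {
        String scheme = path.getScheme();
        if (withScheme && scheme == null) {
            throw new IllegalArgumentException("FS001: Missing scheme in path [" + path + "]");
        }
        if (!withScheme && scheme != null) {
            throw new IllegalArgumentException(
                    "FS003: Scheme [" + scheme + "] not allowed in path [" + path + "]");
        }
    }

    public static void main(String[] args) {
        validate(URI.create("hdfs://nn1:8020/projects/in/.pig_header"), true); // accepted
        validate(URI.create("/projects/out"), false);                          // accepted
        try {
            // A fully qualified target is rejected, as in the second error above.
            validate(URI.create("hdfs://nn1:8020/projects/out"), false);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this rule, the combination that passes is a source with the namenode prefix and a target without it.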

Finally, if the source path is a file and the target path is a directory, Oozie will complain that the target already
exists.  I feel it should be consistent with the Hadoop CLI (and Unix) and simply understand that the source should be
placed under the target directory.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (OOZIE-2) Oozie 'move' fs action is inconsistent

Posted by "Mona Chitnis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OOZIE-2?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mona Chitnis updated OOZIE-2:
-----------------------------

    Description: 
> Oozie 'move' fs action is inconsistent
> --------------------------------------
>
>                 Key: OOZIE-133
>                 URL: http://h12.grid.sp2.yahoo.net/browse/OOZIE-133
>             Project: oozie
>          Issue Type: New Feature
>          Components: workflow
>    Affects Versions: 3.0.2
>            Reporter: Mona Chitnis
>            Assignee: Oozie
>   Original Estimate: 1 week
>  Remaining Estimate: 1 week
>
> I'm using the 'move' fs action and I first got the following error:
> FS001: Missing scheme in path [/projects/ngdstone/user/ogg_oozie/intermediate/tmp_price_feats_uniq/.pig_header]
> when I had the following in my workflow.xml :
> <fs>
>         <move source='${OUT}/intermediate/tmp_price_feats_uniq/.pig_header'
> target='${OUT}/intermediate/tmp_predict_supply_feats/'/>
>         <move source='${OUT}/intermediate/tmp_price_feats_uniq/.pig_header'
> target='${OUT}/intermediate/tmp_predict_supply_feats/'/>
> </fs>
> I then prefixed the namenode URI to the paths (like I did for the <prepare> paths), as such:
> <fs>
>         <move source='${nameNode}${OUT}/intermediate/tmp_price_feats_uniq/.pig_header'
> target='${nameNode}${OUT}/intermediate/tmp_predict_supply_feats/'/>
>         <move source='${nameNode}${OUT}/intermediate/tmp_price_feats_uniq/.pig_header'
> target='${nameNode}${OUT}/intermediate/tmp_predict_supply_feats/'/>
> </fs>
> However, I now get this error:
> FS003: Scheme [hdfs] not allowed in path
> [hdfs://mithrilblue-nn1.blue.ygrid.yahoo.com:8020/projects/ngdstone/user/ogg_oozie/intermediate/tmp_predict_supply_feats]
> It seems the scheme is required for the source path but not allowed in the target path. This is inconsistent.
> Finally, if the source path is a file and the target path is a directory, Oozie will complain that the target already
> exists.  I feel it should be consistent with the Hadoop CLI (and Unix) and simply understand that the source should be
> placed under the target directory.

--
     [ http://h12.grid.sp2.yahoo.net/browse/OOZIE-133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur resolved OOZIE-133.
--------------------------------------

    Resolution: Won't Fix

This is not a bug; it is like that by design. The reason is to make clear that it is a move within the current filesystem, with no actual data movement.

--

Mona Chitnis commented on OOZIE-133:
------------------------------------

There are two parts to this issue:
1. The target is not allowed to mention a scheme.
2. If the target is an existing directory, an exception is thrown.

For part 1, if the target does include a scheme, we can allow it, but only if the target's scheme is the same as the source's (since move is essentially a Hadoop fs rename). This way, users who have written both the source and target paths with the namenode parameter, for the sake of consistency, do not face an exception.
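
A rough sketch of the relaxed check proposed for part 1 might look like the following. Everything here is illustrative rather than Oozie code: the class `MoveTargetCheck`, the method `targetAllowed`, and the hosts `nn1` and `nn2` are made up for the example, and paths are modelled with `java.net.URI` instead of Hadoop's `Path`. Comparing the authority (host and port) in addition to the scheme is an extra assumption, on the grounds that a rename stays within one filesystem.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical helper, not Oozie code: decides whether a move target is acceptable
// under the relaxed rule proposed for part 1.
final class MoveTargetCheck {

    // Returns true when the target has no scheme at all, or when its scheme and
    // authority (host:port) match the source's.
    static boolean targetAllowed(URI source, URI target) {
        if (target.getScheme() == null) {
            return true; // scheme-less target: what Oozie already accepts
        }
        return target.getScheme().equalsIgnoreCase(source.getScheme())
                && Objects.equals(target.getAuthority(), source.getAuthority());
    }

    public static void main(String[] args) {
        URI source = URI.create("hdfs://nn1:8020/data/in/.pig_header");
        System.out.println(targetAllowed(source, URI.create("/data/out")));                 // true
        System.out.println(targetAllowed(source, URI.create("hdfs://nn1:8020/data/out")));  // true
        System.out.println(targetAllowed(source, URI.create("hdfs://nn2:8020/data/out")));  // false
    }
}
```

With a check of this shape, a workflow that prefixes both paths with the same namenode parameter would pass, while a target on a different filesystem would still be rejected.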

For part 2, Hadoop can take care of placing the source dir or file as a child of the target dir if the target dir exists. Is there any reason why Oozie should not be consistent with this?
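
The "place the source under an existing target directory" behaviour asked for in part 2 can be illustrated with a small sketch. Note the assumptions: it runs against the local filesystem through `java.nio.file`, not through Hadoop's `FileSystem` API, and the class `MoveIntoDirectory` and its `move` method are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustration only: local filesystem via java.nio.file, not Hadoop's FileSystem.
final class MoveIntoDirectory {

    // If target is an existing directory, the source ends up at target/<source name>;
    // otherwise the source is renamed to target. Returns the final location.
    static Path move(Path source, Path target) throws IOException {
        Path destination = Files.isDirectory(target)
                ? target.resolve(source.getFileName())
                : target;
        // Without REPLACE_EXISTING this fails if the destination already exists.
        return Files.move(source, destination);
    }

    public static void main(String[] args) throws IOException {
        Path work = Files.createTempDirectory("move-demo");
        Path header = Files.createFile(work.resolve(".pig_header"));
        Path feats = Files.createDirectory(work.resolve("tmp_predict_supply_feats"));
        Path moved = move(header, feats);
        System.out.println("moved to " + work.relativize(moved));
    }
}
```

This mirrors what Unix `mv` does when the target is a directory; whether the equivalent logic belongs in Oozie or in the underlying filesystem is the open question here.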




> Oozie 'move' fs action is inconsistent
> --------------------------------------
>
>                 Key: OOZIE-2
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2
>             Project: Apache Oozie (Incubating)
>          Issue Type: Improvement
>            Reporter: Mona Chitnis
>              Labels: oozie
>   Original Estimate: 168h
>  Remaining Estimate: 168h

        

[jira] [Commented] (OOZIE-2) Oozie 'move' fs action is inconsistent

Posted by "Mona Chitnis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078456#comment-13078456 ] 

Mona Chitnis commented on OOZIE-2:
----------------------------------

For part 2, Oozie currently throws the exception itself. The move function is defined as follows in Oozie's FsActionExecutor code:

public void move(Context context, Path source, Path target, boolean recovery) throws ActionExecutorException {
    try {
        // Judging by the FS001/FS003 errors above: the source must carry a scheme, the target must not.
        validatePath1(source, true);
        validatePath1(target, false);
        FileSystem fs = getFileSystemFor(source, context);
        if (!fs.exists(source) && !recovery) {
            throw new ActionExecutorException(ActionExecutorException.ErrorType.ERROR, "FS006",
                                              "move, source path [{0}] does not exist", source);
        }

        // Target-exists check, commented out for testing so that FileSystem.rename() decides.
        /*Path path = new Path(source, target);
        if (fs.exists(path) && !recovery) {
            throw new ActionExecutorException(ActionExecutorException.ErrorType.ERROR, "FS007",
                                              "move, target path [{0}] already exists", target);
        }*/

        if (!fs.rename(source, target) && !recovery) {
            System.out.println("move gives exception"); // debug trace
            throw new ActionExecutorException(ActionExecutorException.ErrorType.ERROR, "FS008",
                                              "move, could not move [{0}] to [{1}]", source, target);
        }
    }
    catch (Exception ex) {
        throw convertException(ex);
    }
}

So by removing the middle part (commented out now for testing), where Oozie complains that the target already exists, we leave it to the Hadoop FileSystem to either allow the rename or not. Does this seem like a fair modification?


        

[jira] [Commented] (OOZIE-2) Oozie 'move' fs action is inconsistent

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078364#comment-13078364 ] 

Alejandro Abdelnur commented on OOZIE-2:
----------------------------------------

The solution for part 1 seems reasonable.

I'm not sure about the solution for part 2: Oozie FS is just calling the FileSystem.rename() method. The proposed 'smart' logic should be done by the Hadoop FileSystem, not by Oozie.




        

[jira] [Resolved] (OOZIE-2) Oozie 'move' fs action is inconsistent

Posted by "Mona Chitnis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OOZIE-2?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mona Chitnis resolved OOZIE-2.
------------------------------

    Resolution: Fixed


        

[jira] [Closed] (OOZIE-2) Oozie 'move' fs action is inconsistent

Posted by "Mona Chitnis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OOZIE-2?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mona Chitnis closed OOZIE-2.
----------------------------



        

[jira] [Commented] (OOZIE-2) Oozie 'move' fs action is inconsistent

Posted by "Mona Chitnis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082019#comment-13082019 ] 

Mona Chitnis commented on OOZIE-2:
----------------------------------

Here is the github link to the patch:

OOZIE-2: https://github.com/yahoo/oozie/commit/ec352102e97158aa527c1522eb751b1bd72c7b13
