Posted to common-dev@hadoop.apache.org by "Alejandro Abdelnur (JIRA)" <ji...@apache.org> on 2007/10/15 12:56:50 UTC

[jira] Created: (HADOOP-2055) JobConf should have a setInputPathFilter(PathFilter filter) method

JobConf should have a setInputPathFilter(PathFilter filter) method
------------------------------------------------------------------

                 Key: HADOOP-2055
                 URL: https://issues.apache.org/jira/browse/HADOOP-2055
             Project: Hadoop
          Issue Type: New Feature
         Environment: all
            Reporter: Alejandro Abdelnur
            Priority: Minor


It should be possible to set a PathFilter for the input to avoid taking certain files as input data within the input directories.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-2055) JobConf should have a setInputPathFilter(PathFilter filter) method

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12534876 ] 

Owen O'Malley commented on HADOOP-2055:
---------------------------------------

The setter should probably also have a matching getter; like most such accessors, they would look like:
{code}
public static void setInputPathFilter(JobConf job, PathFilter filter);
public static PathFilter getInputPathFilter(JobConf job);
{code}
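A minimal sketch of how the two static methods above might behave, assuming the filter's class name (not the instance) is stored in the per-job configuration, so the setting is a plain string that can travel with the job. `JobConfStub`, the nested `PathFilter` interface with `accept(String)`, `NoCrcFilter`, and the key `example.input.pathfilter.class` are hypothetical stand-ins, not Hadoop's API (Hadoop's own `PathFilter.accept` takes a `Path`).

```java
import java.util.HashMap;
import java.util.Map;

public class InputPathFilterSketch {

    /** Hypothetical stand-in for Hadoop's PathFilter (the real one accepts a Path). */
    public interface PathFilter {
        boolean accept(String path);
    }

    /** Hypothetical stand-in for JobConf: a per-job map of string properties. */
    public static class JobConfStub {
        private final Map<String, String> props = new HashMap<>();
        public void set(String key, String value) { props.put(key, value); }
        public String get(String key) { return props.get(key); }
    }

    /** Hypothetical property name, used only in this sketch. */
    static final String FILTER_KEY = "example.input.pathfilter.class";

    /** Records the filter's class name in the given job's configuration only. */
    public static void setInputPathFilter(JobConfStub job, PathFilter filter) {
        job.set(FILTER_KEY, filter.getClass().getName());
    }

    /** Re-creates the filter from the given job's configuration; null if none was set. */
    public static PathFilter getInputPathFilter(JobConfStub job) {
        String className = job.get(FILTER_KEY);
        if (className == null) {
            return null;
        }
        try {
            // Assumes the filter class has a public no-argument constructor.
            return (PathFilter) Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    /** Hypothetical example filter: rejects checksum files. */
    public static class NoCrcFilter implements PathFilter {
        public boolean accept(String path) { return !path.endsWith(".crc"); }
    }

    public static void main(String[] args) {
        // Two jobs, two configurations: the static setter only touches the job passed in.
        JobConfStub jobA = new JobConfStub();
        JobConfStub jobB = new JobConfStub();
        setInputPathFilter(jobA, new NoCrcFilter());

        System.out.println(getInputPathFilter(jobA).accept("part-00000.crc")); // false
        System.out.println(getInputPathFilter(jobB)); // null
    }
}
```

Because the job is an explicit parameter, an application dispatching many jobs can give each one its own filter. In this sketch an anonymous or lambda filter would not survive the class-name round trip, which is one reason a setter might take a `Class` rather than an instance.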


[jira] Commented: (HADOOP-2055) JobConf should have a setInputPathFilter(PathFilter filter) method

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535280 ] 

Alejandro Abdelnur commented on HADOOP-2055:
--------------------------------------------

Having a static method on FileInputFormat would make it difficult for an application that dispatches Hadoop jobs (e.g., a webapp) to set filters on a per-job basis.

IMO, it should be configurable at the job level.


[jira] Commented: (HADOOP-2055) JobConf should have a setInputPathFilter(PathFilter filter) method

Posted by "eric baldeschwieler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535507 ] 

eric baldeschwieler commented on HADOOP-2055:
---------------------------------------------

We support globbing in input paths now. Doesn't that address this need?

E.g., *.foo
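To illustrate the globbing suggestion: a pattern such as `*.foo` selects the matching names in a directory. This sketch uses Java's `java.nio.file.PathMatcher` as a stand-in for Hadoop's own glob support, whose syntax may differ in details; `GlobSketch`, `select`, and the file names are made up for the example.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

public class GlobSketch {
    /** Keeps only the names that match the given glob pattern. */
    static List<String> select(List<String> names, String glob) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return names.stream()
                .filter(name -> matcher.matches(Paths.get(name)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("a.foo", "b.foo", "a.foo.crc", "notes.txt");
        // The glob states what to include; everything else is dropped.
        System.out.println(select(names, "*.foo")); // [a.foo, b.foo]
    }
}
```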


[jira] Commented: (HADOOP-2055) JobConf should have a setInputPathFilter(PathFilter filter) method

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535302 ] 

Doug Cutting commented on HADOOP-2055:
--------------------------------------

> IMO, it should be configurable at job level.

Please look more closely at the static methods Owen suggested.  The job is a parameter.


[jira] Commented: (HADOOP-2055) JobConf should have a setInputPathFilter(PathFilter filter) method

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537170 ] 

Alejandro Abdelnur commented on HADOOP-2055:
--------------------------------------------

Owen, Doug: I see the point about the static methods now; that would work.

Eric, using wildcards would not work: they let you specify what you want to include, but not what you want to exclude.

For example, I may have files like the CRC files (tracking other types of information) that I want to skip.
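The exclusion case can be made concrete: a filter states "skip anything ending in `.crc`" directly, whereas a glob only lists what to include. The sketch below is a hypothetical illustration, not Hadoop code; `NameFilter` stands in for `PathFilter` (taking a name instead of a `Path`), and `listInputs` and the file names are made up.

```java
import java.util.ArrayList;
import java.util.List;

public class ExcludeCrcSketch {
    /** Hypothetical stand-in for Hadoop's PathFilter, taking a file name instead of a Path. */
    interface NameFilter {
        boolean accept(String name);
    }

    /** Returns the names the filter accepts, preserving their order. */
    static List<String> listInputs(List<String> names, NameFilter filter) {
        List<String> accepted = new ArrayList<>();
        for (String name : names) {
            if (filter.accept(name)) {
                accepted.add(name);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<String> dir = List.of("data1.txt", "data1.txt.crc", "data2.log", "data2.log.crc");
        // Exclusion is expressed directly; the remaining files can have any name.
        NameFilter skipCrc = name -> !name.endsWith(".crc");
        System.out.println(listInputs(dir, skipCrc)); // [data1.txt, data2.log]
    }
}
```

Note that the accepted files here have different extensions, so a single include-style glob such as `*.txt` would not select both of them.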



[jira] Commented: (HADOOP-2055) JobConf should have a setInputPathFilter(PathFilter filter) method

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12534875 ] 

Owen O'Malley commented on HADOOP-2055:
---------------------------------------

This should be a static method on FileInputFormat instead of JobConf, since it won't affect the framework, only FileInputFormat's behavior.
