Posted to common-dev@hadoop.apache.org by "Alejandro Abdelnur (JIRA)" <ji...@apache.org> on 2008/03/25 11:35:26 UTC
[jira] Commented: (HADOOP-2055) JobConf should have a
setInputPathFilter(PathFilter filter) method
[ https://issues.apache.org/jira/browse/HADOOP-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581861#action_12581861 ]
Alejandro Abdelnur commented on HADOOP-2055:
--------------------------------------------
I've figured out (IMO) a cleaner way of implementing this feature:
Adding the following 2 instance methods to the JobConf:
* void setInputPathFilter(Class<? extends PathFilter> pathFilter);
* PathFilter getInputPathFilter();
Modifying FileInputFormat's listPaths() method to apply the hiddenFileFilter and, if one is set, the filter from the JobConf.
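A rough sketch of how the two methods and the listPaths() change could fit together is given here. It is not the actual patch: PathFilter, JobConfSketch, FileInputFormatSketch, SkipTmpFilter and the property key are all stand-ins invented for illustration, paths are plain strings, and the hidden-file rule (names starting with "_" or ".") is an assumption about what hiddenFileFilter rejects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for Hadoop's PathFilter; operates on plain strings for brevity.
interface PathFilter {
    boolean accept(String path);
}

// Hypothetical user filter: rejects anything ending in ".tmp".
class SkipTmpFilter implements PathFilter {
    @Override
    public boolean accept(String path) {
        return !path.endsWith(".tmp");
    }
}

// Stand-in for JobConf: stores the filter class name under an assumed key.
class JobConfSketch {
    private static final String FILTER_KEY = "example.input.pathfilter.class"; // hypothetical key
    private final Map<String, String> props = new HashMap<>();

    void setInputPathFilter(Class<? extends PathFilter> filterClass) {
        props.put(FILTER_KEY, filterClass.getName());
    }

    // Returns a new filter instance, or null when no filter was set.
    PathFilter getInputPathFilter() {
        String name = props.get(FILTER_KEY);
        if (name == null) {
            return null;
        }
        try {
            return Class.forName(name).asSubclass(PathFilter.class)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + name, e);
        }
    }
}

// Stand-in for FileInputFormat: applies the hidden-file filter, then the user filter.
class FileInputFormatSketch {
    // Assumed rule: names starting with "_" or "." count as hidden.
    static final PathFilter HIDDEN_FILE_FILTER = path -> {
        String name = path.substring(path.lastIndexOf('/') + 1);
        return !name.startsWith("_") && !name.startsWith(".");
    };

    static List<String> listPaths(JobConfSketch conf, List<String> candidates) {
        PathFilter userFilter = conf.getInputPathFilter();
        List<String> result = new ArrayList<>();
        for (String p : candidates) {
            if (!HIDDEN_FILE_FILTER.accept(p)) {
                continue; // hidden files are always skipped
            }
            if (userFilter != null && !userFilter.accept(p)) {
                continue; // the user filter only applies when one is set
            }
            result.add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        JobConfSketch conf = new JobConfSketch();
        conf.setInputPathFilter(SkipTmpFilter.class);
        List<String> in = Arrays.asList("/in/part-0", "/in/_logs", "/in/.crc", "/in/data.tmp");
        System.out.println(listPaths(conf, in));
    }
}
```

Passing a Class rather than an instance keeps the setting representable as a plain configuration string, at the cost of requiring the filter to have a no-argument constructor.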
Globbing still works for regex inclusion, even if a path filter is set.
Being able to specify a custom PathFilter makes it possible to create more complex filters, such as exclusion filters, and to make selections that cannot be expressed via regex.
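To illustrate the kind of filters meant here, the following sketch shows an exclusion filter and a selection that depends on state outside the path name. Everything in it is hypothetical: the PathFilter interface is a string-based stand-in for Hadoop's, ExcludeGlobFilter, NotYetProcessedFilter and PathFilterDemo are invented names, and java.nio glob matching stands in for Hadoop's own globbing.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Stand-in for Hadoop's PathFilter; operates on plain strings for brevity.
interface PathFilter {
    boolean accept(String path);
}

// Exclusion filter: accepts every path whose file name does NOT match the glob.
class ExcludeGlobFilter implements PathFilter {
    private final PathMatcher matcher;

    ExcludeGlobFilter(String glob) {
        this.matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
    }

    @Override
    public boolean accept(String path) {
        return !matcher.matches(Paths.get(path).getFileName());
    }
}

// Selection based on external state, which no name pattern can express:
// skip inputs that an earlier run has already consumed.
class NotYetProcessedFilter implements PathFilter {
    private final Set<String> processed;

    NotYetProcessedFilter(Set<String> processed) {
        this.processed = processed;
    }

    @Override
    public boolean accept(String path) {
        return !processed.contains(path);
    }
}

class PathFilterDemo {
    // Glob inclusion is applied first; the filter then narrows the result.
    static List<String> select(List<String> paths, String includeGlob, PathFilter filter) {
        PathMatcher include = FileSystems.getDefault().getPathMatcher("glob:" + includeGlob);
        List<String> out = new ArrayList<>();
        for (String p : paths) {
            if (include.matches(Paths.get(p).getFileName()) && filter.accept(p)) {
                out.add(p);
            }
        }
        return out;
    }
}
```

For example, `PathFilterDemo.select(paths, "*.dat", new NotYetProcessedFilter(done))` keeps the glob for inclusion and leaves the "already processed" decision to the filter.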
> JobConf should have a setInputPathFilter(PathFilter filter) method
> ------------------------------------------------------------------
>
> Key: HADOOP-2055
> URL: https://issues.apache.org/jira/browse/HADOOP-2055
> Project: Hadoop Core
> Issue Type: New Feature
> Environment: all
> Reporter: Alejandro Abdelnur
> Assignee: Alejandro Abdelnur
> Priority: Minor
>
> It should be possible to set a PathFilter for the input to avoid taking certain files as input data within the input directories.