Posted to commits@airflow.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2016/12/31 13:02:58 UTC
[jira] [Commented] (AIRFLOW-715) HDFS Sensor Should be more effective
[ https://issues.apache.org/jira/browse/AIRFLOW-715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15789482#comment-15789482 ]
ASF subversion and git services commented on AIRFLOW-715:
---------------------------------------------------------
Commit 1c4cff056488623cfd3a6ec411e680e3e5198b21 in incubator-airflow's branch refs/heads/master from [~vfoucault]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=1c4cff0 ]
[AIRFLOW-715] A more efficient HDFS Sensor:
The HDFS Sensor can now return true based on a
file size, a directory status (empty or not), or a
regex matching files in a directory, and it can
also ignore files that are still being copied.
With the base HDFS Sensor, it was not possible to
watch a directory for files with an unknown name.
The HDFS Sensor is now extended with (contrib):
- HdfsSensorRegex : for matching files with a regex
(re)
- HdfsSensorFolder : for matching on a directory
The HDFS Sensor now has two built-in filters:
- filter_for_filesize : to filter the list result by
file size
- filter_for_ignored_ext : to discard (or not) files
that are still being copied
Unit tests added with a new FakeSnakebite client
and a FakeHdfsHook
Closes #1957 from vfoucault/feature/AIRFLOW-715
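For readers who want a feel for what the two filters and the regex match described in the commit message might do, here is a minimal standalone sketch. It is not the code from the commit. The function bodies, the `match_regex` helper, the dict keys `path` and `length`, the megabyte unit for the size threshold, and the `_COPYING_` extension are all assumptions made for illustration.

```python
import re

MEGABYTE = 1024 * 1024  # assumed unit for the size threshold


def filter_for_filesize(result, size=None):
    """Keep entries whose length is at least `size` megabytes (assumed unit)."""
    if size is None:
        return result
    min_bytes = size * MEGABYTE
    return [entry for entry in result if entry["length"] >= min_bytes]


def filter_for_ignored_ext(result, ignored_ext, ignore_copying):
    """Drop entries whose path ends with one of the ignored extensions."""
    if not ignore_copying or not ignored_ext:
        return result
    alternatives = "|".join(re.escape(ext) for ext in ignored_ext)
    pattern = re.compile(r".*\.(?:%s)$" % alternatives)
    return [entry for entry in result if not pattern.match(entry["path"])]


def match_regex(result, regex):
    """Hypothetical helper: keep entries whose file name matches `regex`."""
    return [
        entry for entry in result
        if regex.match(entry["path"].rsplit("/", 1)[-1])
    ]


# A made-up directory listing; each entry mimics one row of an HDFS `ls`.
listing = [
    {"path": "/data/in/part-0001.csv", "length": 5 * MEGABYTE},
    {"path": "/data/in/part-0002.csv._COPYING_", "length": 3 * MEGABYTE},
    {"path": "/data/in/tiny.csv", "length": 10},
]

ready = filter_for_ignored_ext(listing, ["_COPYING_"], ignore_copying=True)
ready = filter_for_filesize(ready, size=1)
ready = match_regex(ready, re.compile(r"part-\d+\.csv$"))
print([entry["path"] for entry in ready])  # prints ['/data/in/part-0001.csv']
```

In this sketch a sensor's poke would return true when the filtered list is non-empty, which is one plausible reading of "trigger true" in the commit message.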
> HDFS Sensor Should be more effective
> ------------------------------------
>
> Key: AIRFLOW-715
> URL: https://issues.apache.org/jira/browse/AIRFLOW-715
> Project: Apache Airflow
> Issue Type: Improvement
> Components: operators
> Affects Versions: Airflow 2.0, Airflow 1.7.1
> Environment: HDFS Sensor should be more effective
> Reporter: Vianney FOUCAULT
> Assignee: Vianney FOUCAULT
> Priority: Minor
> Fix For: Airflow 1.8
>
>
> As an Airflow user, I want the HDFS Sensor to be more effective: aware of file size, able to match file names against a regex, and aware of empty directories.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)