Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2021/05/25 15:15:00 UTC

[jira] [Resolved] (HUDI-1723) DFSPathSelector skips files with the same modify date when read up to source limit

     [ https://issues.apache.org/jira/browse/HUDI-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan resolved HUDI-1723.
---------------------------------------
    Resolution: Fixed

> DFSPathSelector skips files with the same modify date when read up to source limit
> ----------------------------------------------------------------------------------
>
>                 Key: HUDI-1723
>                 URL: https://issues.apache.org/jira/browse/HUDI-1723
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: DeltaStreamer
>            Reporter: Raymond Xu
>            Assignee: Raymond Xu
>            Priority: Blocker
>              Labels: pull-request-available, sev:critical, user-support-issues
>             Fix For: 0.9.0
>
>         Attachments: Screen Shot 2021-03-26 at 1.42.42 AM.png
>
>
> org.apache.hudi.utilities.sources.helpers.DFSPathSelector#listEligibleFiles filters the input files based on the last saved checkpoint, which is the modification date of the last file read. However, several files can share that modification date, so when a read stops at the source limit, the remaining files with the same modification date as the last file read are skipped. An illustration is shown in the attached picture.
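
The behaviour described above can be reproduced with a small model. The sketch below is not the Hudi implementation: `FileInfo`, `select_batch`, `ingest_all` and the `keep_ties` flag are hypothetical names, and it assumes that eligibility means "modification time strictly greater than the checkpoint" and that the source limit is a byte budget per batch. With `keep_ties=False` it shows the skipping; `keep_ties=True` shows one possible remedy (keep reading files that share the last file's modification time even past the limit), which may or may not match the fix that was merged.

```python
from typing import List, NamedTuple, Optional, Tuple


class FileInfo(NamedTuple):
    path: str
    mtime: int  # modification time, e.g. epoch millis
    size: int   # file length in bytes


def select_batch(files: List[FileInfo], checkpoint: Optional[int],
                 source_limit: int, keep_ties: bool
                 ) -> Tuple[List[FileInfo], Optional[int]]:
    """Pick the next batch and return it with the new checkpoint (hypothetical model)."""
    # Assumption: only files strictly newer than the checkpoint are eligible.
    eligible = sorted(
        (f for f in files if checkpoint is None or f.mtime > checkpoint),
        key=lambda f: (f.mtime, f.path),
    )
    batch: List[FileInfo] = []
    read_bytes = 0
    new_checkpoint = checkpoint
    for f in eligible:
        over_limit = read_bytes + f.size > source_limit
        # Optional remedy: never split a group of files sharing one mtime.
        tie_with_last = keep_ties and bool(batch) and f.mtime == batch[-1].mtime
        if over_limit and not tie_with_last:
            break
        batch.append(f)
        read_bytes += f.size
        new_checkpoint = f.mtime  # checkpoint = mtime of the last file read
    return batch, new_checkpoint


def ingest_all(files: List[FileInfo], source_limit: int, keep_ties: bool) -> List[str]:
    """Run batches until nothing is eligible; return the paths that were read."""
    seen: List[str] = []
    checkpoint: Optional[int] = None
    while True:
        batch, checkpoint = select_batch(files, checkpoint, source_limit, keep_ties)
        if not batch:
            return seen
        seen.extend(f.path for f in batch)


# Three files share mtime 200; a 25-byte limit cuts the first batch after "b".
FILES = [
    FileInfo("a", 100, 10),
    FileInfo("b", 200, 10),
    FileInfo("c", 200, 10),
    FileInfo("d", 200, 10),
    FileInfo("e", 300, 10),
]

skipped = set(f.path for f in FILES) - set(ingest_all(FILES, 25, keep_ties=False))
print(sorted(skipped))
```

In this model the first batch ends at file "b" and saves 200 as the checkpoint, so "c" and "d" (also at 200) never pass the strict filter on the next run. Calling `ingest_all(FILES, 25, keep_ties=True)` reads all five files, at the cost of letting a batch exceed the limit when files share a modification time.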



--
This message was sent by Atlassian Jira
(v8.3.4#803005)