Posted to issues@flume.apache.org by "Confuse (Jira)" <ji...@apache.org> on 2020/01/09 13:44:00 UTC

[jira] [Created] (FLUME-3350) Spooldir source may collect empty files and write them to HDFS

Confuse created FLUME-3350:
------------------------------

             Summary: Spooldir source may collect empty files and write them to HDFS
                 Key: FLUME-3350
                 URL: https://issues.apache.org/jira/browse/FLUME-3350
             Project: Flume
          Issue Type: Bug
          Components: Sinks+Sources
    Affects Versions: 1.9.0
            Reporter: Confuse
         Attachments: image-2020-01-09-21-33-55-306.png

When I collect data from the spooldir source to HDFS, I found that if an empty file is created in the spool directory, an empty file with the same name appears on HDFS. That seems unreasonable. After reading the source code, I found that the following condition in the SpoolDirectorySource class will never be true.

    // excerpt from SpoolDirectorySource (the polling runnable that drains the spool directory)
    public void run() {
      int backoffInterval = 250;
      boolean readingEvents = false;
      try {
        while (!Thread.interrupted()) {
          readingEvents = true;
          List<Event> events = reader.readEvents(batchSize);
          readingEvents = false;

          // this condition will never be true
          if (events.isEmpty()) {
            break;
          }
          // ...
    }
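
For reference, a minimal agent configuration along these lines should reproduce the behavior. All agent/component names and paths below are placeholders, and using the basename header as the HDFS file prefix is only an assumption about how the same file name ends up on HDFS:

  # single agent: spooling directory source -> memory channel -> HDFS sink
  a1.sources = r1
  a1.channels = c1
  a1.sinks = k1

  a1.sources.r1.type = spooldir
  a1.sources.r1.spoolDir = /var/flume/spool
  a1.sources.r1.basenameHeader = true
  a1.sources.r1.channels = c1

  a1.channels.c1.type = memory

  a1.sinks.k1.type = hdfs
  a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events
  # assumption: name HDFS files after the source file via the basename header
  a1.sinks.k1.hdfs.filePrefix = %{basename}
  a1.sinks.k1.hdfs.fileType = DataStream
  a1.sinks.k1.channel = c1

With this setup, dropping a zero-byte file into /var/flume/spool leads to an empty file on HDFS, as described above.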

Please confirm whether this behavior is a problem. In my opinion, collecting empty files is meaningless, especially for HDFS, where storing large numbers of small files should be avoided. Even if a user unintentionally puts many empty files into the spool directory, Flume should handle them itself instead of writing them to HDFS.
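
One possible way to handle them would be to check the file length before the file is ever opened for reading, and complete or delete zero-length files immediately. The following is only a standalone sketch of that idea; the class name, directory path, and the way skipped files are reported are hypothetical and not existing Flume code:

  import java.io.File;
  import java.io.IOException;
  import java.nio.file.Files;
  import java.util.ArrayList;
  import java.util.List;

  // Standalone illustration, not Flume code: scan a spool directory and
  // separate zero-length files so they could be completed or deleted right
  // away instead of being read and shipped to the sink.
  public class EmptySpoolFileFilter {
    public static void main(String[] args) throws IOException {
      File spoolDir = new File(args.length > 0 ? args[0] : "/var/flume/spool");
      File[] candidates = spoolDir.listFiles(File::isFile);
      if (candidates == null) {
        throw new IOException("not a readable directory: " + spoolDir);
      }
      List<File> ingest = new ArrayList<>();
      List<File> skip = new ArrayList<>();
      for (File f : candidates) {
        if (Files.size(f.toPath()) == 0) {
          skip.add(f);    // candidate for immediate completion/deletion
        } else {
          ingest.add(f);  // normal ingest path
        }
      }
      System.out.println("would ingest: " + ingest);
      System.out.println("would skip:   " + skip);
    }
  }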



