Posted to commits@beam.apache.org by "Stephen Sisk (JIRA)" <ji...@apache.org> on 2017/04/26 17:13:04 UTC

[jira] [Created] (BEAM-2081) I/O Authoring overview - better clarify how to read from files

Stephen Sisk created BEAM-2081:
----------------------------------

             Summary: I/O Authoring overview - better clarify how to read from files
                 Key: BEAM-2081
                 URL: https://issues.apache.org/jira/browse/BEAM-2081
             Project: Beam
          Issue Type: Improvement
          Components: website
            Reporter: Stephen Sisk
            Assignee: Davor Bonaci
            Priority: Minor


The I/O authoring doc is a little confusing: it has an example of reading from file globs and says to use ParDos, but then states that "A class derived from FileBasedSource is often the best option when reading from files".

It'd be nice to clarify this and provide guidance on when to use each approach.

I *think* the right answer here is that if your file is splittable, you use FileBasedSource (FBS) and let it handle the glob splitting, and if it's not splittable, you use ParDos.
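
To make the distinction concrete, here is a rough plain-Python sketch of the two strategies. It does not use the Beam API: every function name below is hypothetical, the input is assumed to be newline-delimited UTF-8 text, and the "splittable" branch only stands in for the kind of range splitting a source could do, not for how FileBasedSource is actually implemented.

```python
import glob
import os


def expand_glob(pattern):
    """Turn a file glob into a sorted list of concrete paths."""
    return sorted(glob.glob(pattern))


def read_whole_file(path):
    """Non-splittable case: a single reader consumes the file start to finish."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")


def split_into_ranges(path, desired_bundle_size):
    """Splittable case: cut the file into [start, end) byte ranges."""
    size = os.path.getsize(path)
    return [(start, min(start + desired_bundle_size, size))
            for start in range(0, size, desired_bundle_size)]


def read_range(path, start, end):
    """Yield the records whose first byte falls inside [start, end)."""
    with open(path, "rb") as f:
        if start > 0:
            # Back up one byte and finish the line in progress, so reading
            # resumes at the first record that begins at or after `start`.
            f.seek(start - 1)
            f.readline()
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            yield line.rstrip(b"\n").decode("utf-8")


def read_glob(pattern, splittable, desired_bundle_size=16):
    """Read every record matched by `pattern` using one of the two strategies."""
    records = []
    for path in expand_glob(pattern):
        if splittable:
            # Each range could be handed to a different worker.
            for start, end in split_into_ranges(path, desired_bundle_size):
                records.extend(read_range(path, start, end))
        else:
            # One file is one unit of work; parallelism is per file only.
            records.extend(read_whole_file(path))
    return records
```

Both branches return the same records; the difference is only in how much of the work can be parallelized within a single file.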

I believe SDF (Splittable DoFn) will make all of this easier.

cc [~kirpichov] [~dhalperi@google.com]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)