Posted to user@beam.apache.org by Randal Moore <rd...@gmail.com> on 2021/12/13 23:47:32 UTC

using FileIO to read a single input file

I have some side inputs that I would like to add to my pipeline. Some of
them come from files matched by a file pattern, and I found that I can
collect the contents of those files with an approach like the following:

    val genotypes = p
      .apply(FileIO.`match`.filepattern(opts.getGenotypesFilePattern()))
      .apply(FileIO.readMatches)
      .apply("ReadGenotypesFile", ParDo.of(new ReadFileAsBytes()))
      .apply("UnmarshalGenotypes", ParDo.of(new UnmarshalGenotypesDoFn()))
      .apply("GenotypesAsMap",
        Combine.globally[Genotypes, ibd.GenotypesMap](new CombineGenotypesFn))
      .apply("ViewAsGeneticMap", View.asSingleton[ibd.GenotypesMap])

(The code snippet is Scala.)

I have another input: just a single file containing some protobuf data. How
do I construct a single FileIO.ReadableFile rather than going through
"match"? I am trying to avoid Combine.globally for this input. I assume it
would be more correct to let Beam know that exactly one file is expected,
and perhaps more performant as well.

Thanks in advance,
rdm