Posted to user@beam.apache.org by Randal Moore <rd...@gmail.com> on 2021/12/13 23:47:32 UTC
using FileIO to read a single input file
I have some side inputs that I would like to add to my pipeline. Some of
them come from a file pattern, and I found that I can collect the
contents of the matching files with a chain of transforms like the following:
val genotypes =
  p.apply(FileIO.`match`.filepattern(opts.getGenotypesFilePattern()))
    .apply(FileIO.readMatches)
    .apply("ReadGenotypesFile", ParDo.of(new ReadFileAsBytes()))
    .apply("UnmarshalGenotypes", ParDo.of(new UnmarshalGenotypesDoFn()))
    .apply("GenotypesAsMap",
      Combine.globally[Genotypes, ibd.GenotypesMap](new CombineGenotypesFn))
    .apply("ViewAsGeneticMap", View.asSingleton[ibd.GenotypesMap])
(The code snippet is Scala.)
I have another input: just a single file containing some protobuf data. How do
I construct a single FileIO.ReadableFile rather than using "match"?
I am trying to avoid Combine.globally. I assume it would be more correct to
let Beam know that exactly one file is expected, and perhaps more performant.
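One possible shape for this, offered only as a minimal sketch and not a confirmed answer: FileIO.ReadableFile does not appear to be intended for direct construction, so the sketch still goes through "match", but with a path that contains no wildcards. Assuming such a path matches exactly one file, the resulting PCollection holds a single element, and View.asSingleton can be applied directly with no Combine.globally step. The names ReadBytesFn and singleFileView are hypothetical, and protobuf parsing is left out.

```scala
import org.apache.beam.sdk.Pipeline
import org.apache.beam.sdk.coders.ByteArrayCoder
import org.apache.beam.sdk.io.FileIO
import org.apache.beam.sdk.transforms.{DoFn, ParDo, View}
import org.apache.beam.sdk.transforms.DoFn.ProcessElement
import org.apache.beam.sdk.values.PCollectionView

// Hypothetical DoFn: reads the whole matched file into a byte array.
class ReadBytesFn extends DoFn[FileIO.ReadableFile, Array[Byte]] {
  @ProcessElement
  def processElement(
      c: DoFn[FileIO.ReadableFile, Array[Byte]]#ProcessContext): Unit =
    c.output(c.element().readFullyAsBytes())
}

// Hypothetical helper: builds a singleton side-input view from one file.
// Assumption: `path` has no wildcards and matches exactly one file;
// View.asSingleton fails at runtime if the collection is not a single element.
def singleFileView(p: Pipeline, path: String): PCollectionView[Array[Byte]] =
  p.apply("MatchSingleFile", FileIO.`match`().filepattern(path))
    .apply("ReadMatch", FileIO.readMatches())
    .apply("ReadBytes", ParDo.of(new ReadBytesFn))
    .setCoder(ByteArrayCoder.of()) // set explicitly in case inference from Scala fails
    .apply("ViewAsSingleton", View.asSingleton[Array[Byte]]())
```

An unmarshalling ParDo could be placed before the View step so that the side input carries the parsed message instead of raw bytes.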
Thanks in advance,
rdm