Posted to user@hadoop.apache.org by Hans-Peter Zorn <zo...@algo.informatik.tu-darmstadt.de> on 2013/09/18 14:16:49 UTC

Best practice for accessing separate metadata for input files?

Hi,

I have implemented a custom Writable that needs special metadata (an Apache
UIMA type system) to decode the input. This metadata is much more complex
than a simple schema, so I suppose I can't use HCat or similar tools. I
would like to store this metadata only once per input file, e.g.

part-00000                              (sequence file)

.part-00000.typesystem.xml (metadata)

What would be the best practice for writing and reading such metadata from my
Writable? Do I need to implement custom input/output formats, RecordReaders,
etc., or is there an API for getting the fully qualified HDFS path of the file
that contains the current input split, so that I can locate the metadata file
that belongs to it? I also need to create this metadata whenever output of
this kind is written.
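
To make the naming convention above concrete, here is a minimal sketch of how
the metadata path could be derived from the path of a data file. The class and
method names (SidecarPaths, metadataPathFor) are hypothetical and only for
illustration. It uses plain string handling and assumes the data file path is
already known, for example from FileSplit.getPath() in a job whose input
format produces FileSplit instances; that lookup is not shown here.

```java
// Hypothetical helper, not part of Hadoop: derives the hidden sidecar
// metadata path ".<name>.typesystem.xml" that sits next to a data file.
final class SidecarPaths {

    // Suffix taken from the layout in the question above.
    static final String METADATA_SUFFIX = ".typesystem.xml";

    private SidecarPaths() {
    }

    // Example: "hdfs://namenode/data/part-00000"
    //       -> "hdfs://namenode/data/.part-00000.typesystem.xml"
    static String metadataPathFor(String dataFilePath) {
        int lastSlash = dataFilePath.lastIndexOf('/');
        // Everything up to and including the last '/', or "" if there is none.
        String directory = dataFilePath.substring(0, lastSlash + 1);
        String fileName = dataFilePath.substring(lastSlash + 1);
        if (fileName.isEmpty()) {
            throw new IllegalArgumentException(
                    "Path does not name a file: " + dataFilePath);
        }
        // The leading dot marks the metadata file as hidden.
        return directory + "." + fileName + METADATA_SUFFIX;
    }
}
```

The same function could be used on the output side to decide where the
metadata belongs once a part file has been written. As far as I know, the
default file input formats skip files whose names start with "." or "_", so
the sidecar file should not be read as input itself.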

Thanks,

Hans-Peter