Posted to user@tika.apache.org by Adrian Blakey <ad...@gmail.com> on 2021/12/04 19:58:58 UTC

Parsing an mbox file?

Could someone please explain how to use the mbox parser to parse an mbox
file?

I have looked closely at the code and written a small test program, but I
cannot find an obvious way to:

1. Get the output of each individual document returned separately, without
first writing my own parser/splitter to break the input mbox file into
"From "-separated documents before parsing. Is that the intent?
2. Set the "tracking" flag in MboxParser to true, so that it returns the
email headers as additional metadata. Should the flag be passed in the
ParseContext?
3. Tell it not to return blank content.
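
For illustration, the kind of pre-splitter described in item 1 might look
roughly like the sketch below. It is plain Java and makes no Tika calls. The
class name MboxSplitter is hypothetical, and the sketch assumes that every
line beginning with "From " starts a new message, i.e. that body lines
starting with "From " have already been escaped as ">From ". Real mbox
variants differ on that point, so treat it as a naive starting point only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tika: splits mbox text into messages.
class MboxSplitter {

    // Returns one string per message, each beginning with its "From " line.
    // Any text before the first "From " line is ignored.
    static List<String> split(String mbox) {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null;

        // \R matches any line terminator (\n, \r\n, \r).
        for (String line : mbox.split("\\R", -1)) {
            if (line.startsWith("From ")) {
                // Assumption: an unescaped "From " line always starts a message.
                if (current != null) {
                    messages.add(current.toString());
                }
                current = new StringBuilder();
            }
            if (current != null) {
                current.append(line).append('\n');
            }
        }
        if (current != null) {
            messages.add(current.toString());
        }
        return messages;
    }
}
```

Each returned string could then be handed to a parser on its own, and empty
messages could be filtered out at that stage (item 3).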