Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2023/01/30 16:09:00 UTC
[jira] [Created] (TIKA-3962) Set RFC822 parser to noRecurse
Tim Allison created TIKA-3962:
---------------------------------
Summary: Set RFC822 parser to noRecurse
Key: TIKA-3962
URL: https://issues.apache.org/jira/browse/TIKA-3962
Project: Tika
Issue Type: Task
Reporter: Tim Allison
In our test file {{testGroupWiseEml.eml}}, there's an embedded message/rfc822 part that is currently inlined rather than treated as an attachment.
The relevant section of the test file is:
{noformat}
Content-Type: message/rfc822
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="test.eml"
{noformat}
When I open the email in several email clients, they all correctly show {{test.eml}} as an attachment.
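As a quick cross-check of what those headers declare, here is a minimal sketch using Python's standard-library {{email}} package. This is not Tika or mime4j code; the header text is copied from the excerpt above, and the names {{part_headers}} and {{describe_part}} are made up for illustration.

```python
from email.parser import HeaderParser

# Header lines copied from the excerpt of testGroupWiseEml.eml above.
part_headers = (
    "Content-Type: message/rfc822\n"
    "Content-Transfer-Encoding: base64\n"
    'Content-Disposition: attachment; filename="test.eml"\n'
    "\n"
)


def describe_part(raw_headers):
    """Return the header fields that describe how this MIME part is presented."""
    # HeaderParser reads only the header block and leaves the body unparsed,
    # so the message/rfc822 payload is not descended into here.
    part = HeaderParser().parsestr(raw_headers)
    return {
        "content_type": part.get_content_type(),
        "disposition": part.get_content_disposition(),
        "filename": part.get_filename(),
        "transfer_encoding": part["Content-Transfer-Encoding"],
    }


info = describe_part(part_headers)
print(info)
```

The explicit {{attachment}} disposition and the {{test.eml}} filename are consistent with the email clients listing the part as an attachment.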
It turns out that mime4j's parser has a setting, {{setNoRecurse}}, that yields the correct behavior on this test file. Given that Tika already handles embedded files recursively by default, I _think_ we should be safe to set no-recurse on the mime4j parser and rely on Tika's own recursive parsing.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)