Posted to dev@tika.apache.org by "liyu (Jira)" <ji...@apache.org> on 2021/10/14 06:43:00 UTC
[jira] [Created] (TIKA-3574) After the fork parser timeout, can't
get the correct content-type
liyu created TIKA-3574:
--------------------------
Summary: After the fork parser timeout, can't get the correct content-type
Key: TIKA-3574
URL: https://issues.apache.org/jira/browse/TIKA-3574
Project: Tika
Issue Type: Bug
Reporter: liyu
Code example:
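{code:java}
import org.apache.tika.fork.ForkParser;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.RecursiveParserWrapper;
import org.apache.tika.sax.BasicContentHandlerFactory;
import org.apache.tika.sax.BasicContentHandlerFactory.HANDLER_TYPE;
import org.apache.tika.sax.RecursiveParserWrapperHandler;

Parser parser = new AutoDetectParser(tikaConfig);
parser = new RecursiveParserWrapper(parser);
ForkParser forkParser = new ForkParser(parser.getClass().getClassLoader(), parser);
forkParser.setServerParseTimeoutMillis(600000);
forkParser.setServerWaitTimeoutMillis(600000);
// then parse the input stream
BasicContentHandlerFactory factory = new BasicContentHandlerFactory(HANDLER_TYPE.HTML, 104857600);
RecursiveParserWrapperHandler handler = new RecursiveParserWrapperHandler(factory, -1);
Metadata metadata = new Metadata();
ParseContext context = new ParseContext();
try {
    forkParser.parse(inputStream, handler, metadata, context);
} catch (Exception e) {
    // any exception, including the fork parser timeout, is ignored here
}
{code}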
After the fork parser times out, I get the metadata list from handler.getMetadataList().
But handler.getMetadataList().get(0) is not the root metadata of the input stream; it is the metadata of an embedded document inside the input stream.
So I can't get the correct Content-Type for the input stream.
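The expectation described above, that the first entry of the list is the root document, can be expressed as a small check. This is only a sketch, not Tika code: it models each metadata entry as a plain Map instead of Tika's Metadata class, and it assumes that embedded documents carry the key "X-TIKA:embedded_resource_path" while the container document does not. The class name RootMetadataCheck and its methods are hypothetical. Whether a root entry is present in the list at all after a timeout is not established by this report.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class RootMetadataCheck {
    // Assumption: this key is present on embedded documents only.
    static final String EMBEDDED_PATH_KEY = "X-TIKA:embedded_resource_path";
    static final String CONTENT_TYPE_KEY = "Content-Type";

    // An entry is treated as the root document when it has no embedded resource path.
    static boolean isRoot(Map<String, String> metadata) {
        return !metadata.containsKey(EMBEDDED_PATH_KEY);
    }

    // Returns the Content-Type of the first entry only if that entry looks like
    // the root document; otherwise returns empty instead of an embedded type.
    static Optional<String> rootContentType(List<Map<String, String>> metadataList) {
        if (metadataList.isEmpty()) {
            return Optional.empty();
        }
        Map<String, String> first = metadataList.get(0);
        if (!isRoot(first)) {
            return Optional.empty();
        }
        return Optional.ofNullable(first.get(CONTENT_TYPE_KEY));
    }
}
```

With a check like this, a caller would see an empty result in the situation reported here, rather than silently reading the Content-Type of an embedded document.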
--
This message was sent by Atlassian Jira
(v8.3.4#803005)