Posted to notifications@james.apache.org by GitBox <gi...@apache.org> on 2022/03/03 04:18:48 UTC
[GitHub] [james-project] chibenwa opened a new pull request #902: JAMES-3719 Reactive textual content extraction with Apache Tika
chibenwa opened a new pull request #902:
URL: https://github.com/apache/james-project/pull/902
Tika was called from reactive code and was doing blocking HTTP calls from within
the MIME parsing code.
This causes:
- Unneeded thread consumption, as some threads sit waiting for the Tika
response.
- Potentially dangerous blocking calls: for instance, the InVM event bus was
making such calls on the parallel thread pool (where it is critical NOT to
block).
- Connections being opened on a per-call basis instead of being reused.
We introduce the following changes:
- Reactification of the TextExtractor API
- We re-implement the HTTP calls done by TikaTextExtractor with reactor-netty
which allows us to pool HTTP connections and do this in a non-blocking
reactive fashion.
- We provide a reactive cache using the Caffeine caching library. Guava
caches are blocking and thus not an option.
- We decouple text extraction from the MIME parsing phase by introducing
an intermediate POJO. Doing so requires a post-parsing copy of the
content. We only make that copy when necessary: we don't want to copy large
attachments for which no text is going to be extracted.
- Finally, we reactify index content generation in the ElasticSearch code.
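
To illustrate the non-blocking cache idea from the list above, here is a minimal sketch. It is NOT the code of this pull request: the PR relies on Reactor and Caffeine, whereas this sketch uses only the JDK (CompletableFuture and ConcurrentHashMap) so that it stays self-contained. The class name AsyncTextCache, the String content key and the extractor function are all hypothetical. The point it shows is that the cache stores the pending result rather than the value, so callers never block a thread while an extraction is in flight, and concurrent requests for the same content share a single backend call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical illustration only; this is not the James TextExtractor API.
final class AsyncTextCache {
    // The cache holds futures, so a lookup never waits for the extraction itself.
    private final ConcurrentHashMap<String, CompletableFuture<String>> cache =
        new ConcurrentHashMap<>();
    // Assumed to return immediately with a future (e.g. wrapping an async HTTP call).
    private final Function<String, CompletableFuture<String>> extractor;

    AsyncTextCache(Function<String, CompletableFuture<String>> extractor) {
        this.extractor = extractor;
    }

    CompletableFuture<String> extract(String contentKey) {
        // Concurrent callers asking for the same key share one pending extraction.
        CompletableFuture<String> pending = cache.computeIfAbsent(contentKey, extractor);
        // Registered outside computeIfAbsent to avoid modifying the map recursively.
        // Failed extractions are evicted so that a later call can retry.
        pending.whenComplete((text, error) -> {
            if (error != null) {
                cache.remove(contentKey, pending);
            }
        });
        return pending;
    }

    public static void main(String[] args) {
        AtomicInteger backendCalls = new AtomicInteger();
        AsyncTextCache cache = new AsyncTextCache(key ->
            CompletableFuture.supplyAsync(() -> {
                backendCalls.incrementAndGet(); // stands in for a call to Tika
                return "text of " + key;
            }));

        CompletableFuture<String> first = cache.extract("attachment-1");
        CompletableFuture<String> second = cache.extract("attachment-1");

        // join() is used here only to print the demo result; reactive callers
        // would chain thenApply/thenCompose instead of waiting.
        System.out.println(first.join() + " / same future: " + (first == second)
            + " / backend calls: " + backendCalls.get());
    }
}
```

An unbounded map is used here for brevity; a real cache would also need size and expiry policies, which is one reason to prefer a dedicated caching library.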
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org