Posted to server-dev@james.apache.org by "Benoit Tellier (Jira)" <se...@james.apache.org> on 2022/03/04 10:06:00 UTC

[jira] [Assigned] (JAMES-3719) Reactify Tika calls

     [ https://issues.apache.org/jira/browse/JAMES-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benoit Tellier reassigned JAMES-3719:
-------------------------------------

    Assignee: Benoit Tellier

> Reactify Tika calls
> -------------------
>
>                 Key: JAMES-3719
>                 URL: https://issues.apache.org/jira/browse/JAMES-3719
>             Project: James Server
>          Issue Type: Improvement
>          Components: elasticsearch
>    Affects Versions: 3.7.0
>            Reporter: Benoit Tellier
>            Assignee: Benoit Tellier
>            Priority: Major
>             Fix For: 3.8.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We rely on blocking HTTP calls to extract textual content with Tika.
> This means:
>  - Threads hang around while we do the requests.
>  - We are blocking in a parallel Reactor thread (cassandra-app), which is terrible for performance.
> We can improve this by using reactor-netty to query Tika.
> Caching layers need to be adapted too: Guava is blocking. The Caffeine library could be a good candidate for a reactive caching library.
> Also, we need to decouple MIME parsing from content extraction: the two are currently tightly coupled. I suggest extracting a POJO representation of the mail first, then extracting content only if needed, rather than doing both at the same time.
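
A minimal sketch of the shape described above, not the actual James code. All names here (ParsedAttachment, AsyncTextExtractor, CachingTextExtractor, the key function) are hypothetical. To stay self-contained it uses the JDK's CompletableFuture and a ConcurrentHashMap as stand-ins for Reactor types and for an asynchronous Caffeine cache; the real HTTP call to Tika is left behind the interface.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical POJO: the result of MIME parsing alone, with no extracted text yet.
final class ParsedAttachment {
    final String contentType;
    final byte[] body;

    ParsedAttachment(String contentType, byte[] body) {
        this.contentType = contentType;
        this.body = body;
    }
}

// Hypothetical non-blocking contract. A real implementation would issue the
// HTTP request to Tika asynchronously and complete the future on response.
interface AsyncTextExtractor {
    CompletableFuture<Optional<String>> extract(ParsedAttachment attachment);
}

// Caches the future itself rather than its value, so concurrent requests for
// the same key share one in-flight call and no caller thread has to block.
final class CachingTextExtractor implements AsyncTextExtractor {
    private final AsyncTextExtractor delegate;
    private final Function<ParsedAttachment, String> keyFunction;
    private final Map<String, CompletableFuture<Optional<String>>> cache =
        new ConcurrentHashMap<>();

    CachingTextExtractor(AsyncTextExtractor delegate,
                         Function<ParsedAttachment, String> keyFunction) {
        this.delegate = delegate;
        this.keyFunction = keyFunction;
    }

    @Override
    public CompletableFuture<Optional<String>> extract(ParsedAttachment attachment) {
        String key = keyFunction.apply(attachment);
        // The delegate must return quickly (it only starts the call).
        CompletableFuture<Optional<String>> future =
            cache.computeIfAbsent(key, k -> delegate.extract(attachment));
        // Evict failed extractions so that a later call can retry.
        future.whenComplete((result, error) -> {
            if (error != null) {
                cache.remove(key, future);
            }
        });
        return future;
    }
}
```

With this split, MIME parsing produces ParsedAttachment values up front, and extract is only invoked for the parts that actually need textual content. The map above is unbounded; a real cache would also need size or time based eviction.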



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org