Posted to dev@jackrabbit.apache.org by "Alex Parvulescu (Updated) (JIRA)" <ji...@apache.org> on 2011/11/14 15:08:52 UTC

[jira] [Updated] (JCR-3146) Text extraction may congest thread pool in the repository

     [ https://issues.apache.org/jira/browse/JCR-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Parvulescu updated JCR-3146:
---------------------------------

    Attachment: JCR-3146.patch

The solution is to define a second queue for tasks considered low priority, so that they don't fill the execution queue.
The executor then polls this queue for additional work items, depending on its current load.

The secondary queue is used only as needed, and the load threshold is configurable via the system property
"org.apache.jackrabbit.core.JackrabbitThreadPool.maxLoadForLowPriorityTasks".
The value is interpreted as a percentage: 0 means disabled, and the default is 75.
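
To illustrate the idea, here is a minimal sketch of a pool with a secondary low-priority queue. It is not the code from JCR-3146.patch: the class name LowPriorityAwarePool, the method executeLowPriority, and the helper belowThreshold are hypothetical, and so are two design choices. The sketch measures "load" as tasks in flight relative to pool size, and it treats a value of 0 as "bypass the secondary queue". Only the property name and the default of 75 come from the description above.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch only; not the class contained in JCR-3146.patch. */
class LowPriorityAwarePool extends ThreadPoolExecutor {

    // Property name as given in the description; everything else is assumed.
    static final String MAX_LOAD_PROPERTY =
        "org.apache.jackrabbit.core.JackrabbitThreadPool.maxLoadForLowPriorityTasks";

    private final Queue<Runnable> lowPriority = new ConcurrentLinkedQueue<>();
    private final AtomicInteger inFlight = new AtomicInteger();
    private final int poolSize;
    private final int maxLoadPercent;

    LowPriorityAwarePool(int threads) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        this.poolSize = threads;
        this.maxLoadPercent = Integer.getInteger(MAX_LOAD_PROPERTY, 75);
    }

    /** Assumed load measure: tasks in flight as a percentage of the pool size. */
    static boolean belowThreshold(int busy, int poolSize, int maxLoadPercent) {
        return busy * 100 < maxLoadPercent * poolSize;
    }

    @Override
    public void execute(Runnable task) {
        inFlight.incrementAndGet(); // queued or running on the main queue
        try {
            super.execute(task);
        } catch (RuntimeException e) {
            inFlight.decrementAndGet(); // rejected, so it never ran
            throw e;
        }
    }

    /** Parks the task in the secondary queue until the load permits it. */
    public void executeLowPriority(Runnable task) {
        if (maxLoadPercent <= 0) {
            execute(task); // assumption: 0 bypasses the secondary queue
            return;
        }
        lowPriority.add(task);
        drainWhileLoadPermits();
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        inFlight.decrementAndGet();
        drainWhileLoadPermits(); // a slot was freed, so look for parked work
    }

    // Concurrent callers may overshoot the threshold slightly; acceptable here.
    private void drainWhileLoadPermits() {
        while (belowThreshold(inFlight.get(), poolSize, maxLoadPercent)) {
            Runnable next = lowPriority.poll();
            if (next == null) {
                return;
            }
            execute(next);
        }
    }
}
```

With four threads and the default of 75, this sketch lets low-priority work occupy at most three slots, so one slot stays free for tasks such as those submitted by the index merger. Like any Java system property, the threshold could be set on the JVM command line with -D followed by the property name and a value.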

There are some timing issues with the indexing tests on account of this new async text extraction. I've tried to fix all of them, but there may be more.

I haven't yet touched the Tika extraction that happens in a separate process. I think that will need some minor refactoring as well.

Attaching proposed patch.


                
> Text extraction may congest thread pool in the repository
> ---------------------------------------------------------
>
>                 Key: JCR-3146
>                 URL: https://issues.apache.org/jira/browse/JCR-3146
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core
>            Reporter: Alex Parvulescu
>            Priority: Minor
>         Attachments: JCR-3146.patch
>
>
> Text extraction congests the thread pool in the repository when, for example, many PDFs are loaded into the workspace. Tasks submitted by the index merger are delayed as a result, which leads to many index segment folders.
