You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tika.apache.org by "Bob Paulin (JIRA)" <ji...@apache.org> on 2015/10/16 00:45:05 UTC

[jira] [Comment Edited] (TIKA-1762) Create Executor Service from TikaConfig

    [ https://issues.apache.org/jira/browse/TIKA-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959775#comment-14959775 ] 

Bob Paulin edited comment on TIKA-1762 at 10/15/15 10:44 PM:
-------------------------------------------------------------

Hey Nick,

Just so we're clear on what's being proposed are you thinking something like this with respect to the ExternalParser:

{code}
public class ExternalParser extends AbstractParser {
    
    private final ExecutorService executorService;
    
    public ExternalParser() {
        this.executorService = Executors.newSingleThreadExecutor();
    }
    
    public ExternalParser(TikaConfig config)
    {
        this.executorService = config.getExecutorService();
    }   
    
    public ExternalParser(ExecutorService executorService)
    {
        this.executorService = executorService
    }
...
{code}

This does at least provide a default.  For example the following:
{code}
private void sendInput(final Process process, final InputStream stream) {
        Thread t = new Thread() {
            public void run() {
                OutputStream stdin = process.getOutputStream();
                try {
                    IOUtils.copy(stream, stdin);
                } catch (IOException e) {
                }
            }
        };
        t.start();
        try{
     	   t.join();
        }
        catch(InterruptedException ignore){}        
    }
{code}

Could be replaced with:
{code}
private void sendInput(final Process process, final InputStream stream) {
        Future result = executorService.submit(new Runnable() {
            public void run() {
                OutputStream stdin = process.getOutputStream();
                try {
                    IOUtils.copy(stream, stdin);
                } catch (IOException e) {
                }
            }
        });
        try{
     	   result.get();
        }
        catch(InterruptedException ignore){}        
    }
{code}

The approach I was originally thinking of would not require an executor service to be set as a default but would replace replace the Thread with:
{code}
private void sendInput(final Process process, final InputStream stream, final ParseContext context) {
        Future result = ConcurrentUtils.execute(context, new Runnable(){
            public void run() {
                OutputStream stdin = process.getOutputStream();
                try {
                    IOUtils.copy(stream, stdin);
                } catch (IOException e) {
                }
            }
        });
        try{
     	   result.get();
        }
        catch(InterruptedException ignore){}        
    }
{code}
The code above is not as elegant but doesn't require a default thread pool at instantiation.  The ConcurrentUtil method could also be modified to take the ExecutorService directly so it could come from the Parser's field instead of only the ParseContext.  Then a null ExecutorService could fall back to just creating threads.  Any opinions on which is worse: Spinning up a number of threads that are infrequently used in a Pool or creating threads on the fly the way we do now? 


was (Author: bobpaulin):
Hey Nick,

Just so we're clear on what's being proposed are you thinking something like this with respect to the ExternalParser:

{code}
public class ExternalParser extends AbstractParser {
    
    private final ExecutorService executorService;
    
    public ExternalParser() {
        this.executorService = Executors.newSingleThreadExecutor();
    }
    
    public ExternalParser(TikaConfig config)
    {
        this.executorService = config.getExecutorService();
    }   
    
    public ExternalParser(ExecutorService executorService)
    {
        this.executorService = executorService
    }
...
{code}

This does at least provide a default.  For example the following:
{code}
private void sendInput(final Process process, final InputStream stream) {
        Thread t = new Thread() {
            public void run() {
                OutputStream stdin = process.getOutputStream();
                try {
                    IOUtils.copy(stream, stdin);
                } catch (IOException e) {
                }
            }
        };
        t.start();
        try{
     	   t.join();
        }
        catch(InterruptedException ignore){}        
    }
{code}

Could be replaced with:
{code}
private void sendInput(final Process process, final InputStream stream) {
        Future result = executorService.submit(new Runnable() {
            public void run() {
                OutputStream stdin = process.getOutputStream();
                try {
                    IOUtils.copy(stream, stdin);
                } catch (IOException e) {
                }
            }
        });
        try{
     	   result.get();
        }
        catch(InterruptedException ignore){}        
    }
{code}

The approach I was originally thinking of would not require an executor service to be set as a default but would replace replace the Thread with:
{code}
private void sendInput(final Process process, final InputStream stream) {
        Future result = ConcurrentUtils.execute(new Runnable(){
            public void run() {
                OutputStream stdin = process.getOutputStream();
                try {
                    IOUtils.copy(stream, stdin);
                } catch (IOException e) {
                }
            }
        });
        try{
     	   result.get();
        }
        catch(InterruptedException ignore){}        
    }
{code}
The code above is not as elegant but doesn't require a default thread pool at instantiation.  The ConcurrentUtil method could also be modified to take the ExecutorService directly so it could come from the Parser's field instead of only the ParseContext.  Then a null ExecutorService could fall back to just creating threads.  Any opinions on which is worse: Spinning up a number of threads that are infrequently used in a Pool or creating threads on the fly the way we do now? 

> Create Executor Service from TikaConfig
> ---------------------------------------
>
>                 Key: TIKA-1762
>                 URL: https://issues.apache.org/jira/browse/TIKA-1762
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Bob Paulin
>             Fix For: 1.11
>
>
> Create a configurable executor service that is configurable from the TikaConfig.
>  Konstantin Gribov added a comment - 23/Sep/15 09:55
> Bob Paulin, I have two ideas on the issue:
>     by default use common thread pool, configured via and contained in TikaConfig as Tyler Palsulich suggested,
>     you can pass thread pool for parser invocation via ParserContext with fallback to default if now thread pool/executor service in context.
> Also o.a.tika.Tika#parse(InputStream, Metadate) produces o.a.tika.parser.ParsingReader and anonymous Executor with unbounded daemon thread creation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)