Posted to dev@tika.apache.org by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov> on 2015/05/25 20:47:51 UTC

[DISCUSS] Thinking about completely refactoring the ExternalParser and using commons-exec

Hey Everyone,

ExternalParser is way broke. I have some patches that somewhat fix it, but in doing so, I realized, why not just use commons-exec? I realize that this is another dependency into core, but commons-exec simplifies a lot of the stuff that's broke with ExternalParser (reading its streams, for one).

Thoughts? Objections?

Note this is in reference to fixing FFMPEG parsing, which I've nearly done.

Cheers,
Chris
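The stream-reading problem Chris mentions is most likely the classic one with running external tools. If a child process's stdout and stderr are not drained at the same time, the child can fill one pipe buffer and block forever. commons-exec packages the fix as `PumpStreamHandler` (background threads that copy each stream) and `ExecuteWatchdog` (a timeout), both driven by `DefaultExecutor`. The sketch below shows the same pattern using only the JDK's `ProcessBuilder`, so it runs without extra dependencies. It is a hypothetical illustration, not Tika's `ExternalParser` code and not the commons-exec API; the class and method names (`ExecSketch`, `run`, `pump`) are invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs a command and drains stdout/stderr concurrently (JDK only). */
public class ExecSketch {

    /** Captured result of one external command run. */
    public static final class Result {
        public final int exitValue;
        public final String stdout;
        public final String stderr;

        Result(int exitValue, String stdout, String stderr) {
            this.exitValue = exitValue;
            this.stdout = stdout;
            this.stderr = stderr;
        }
    }

    /** Copies a stream to a sink on a daemon thread (what a stream pumper does). */
    private static Thread pump(InputStream in, OutputStream out) {
        Thread t = new Thread(() -> {
            try (InputStream src = in) {
                src.transferTo(out);
            } catch (IOException ignored) {
                // Pipe closed, e.g. because the process was destroyed; stop pumping.
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Runs the command, capturing both streams, killing it after timeoutMillis. */
    public static Result run(List<String> command, long timeoutMillis)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).start();
        process.getOutputStream().close(); // the child gets no stdin input

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteArrayOutputStream err = new ByteArrayOutputStream();
        // Drain both pipes at once: reading only one lets the other fill up
        // and block the child process.
        Thread outPump = pump(process.getInputStream(), out);
        Thread errPump = pump(process.getErrorStream(), err);

        if (!process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
            process.destroyForcibly(); // watchdog-style timeout
            process.waitFor();
        }
        outPump.join(); // join() makes the pumped bytes visible to this thread
        errPump.join();
        return new Result(process.exitValue(),
                out.toString(StandardCharsets.UTF_8),
                err.toString(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        String java = System.getProperty("java.home") + "/bin/java";
        Result r = run(List.of(java, "-version"), 30_000);
        System.out.println("exit=" + r.exitValue);
        System.out.print(r.stderr); // `java -version` writes to stderr
    }
}
```

Running `java -version` is a convenient check because that tool prints to stderr, so the output only shows up if the error stream is actually pumped. Doing this by hand means managing threads, timeouts and exit codes in every caller, which is roughly the bookkeeping commons-exec would take over.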

Re: [DISCUSS] Thinking about completely refactoring the ExternalParser and using commons-exec

Posted by Nick Burch <ap...@gagravarr.org>.
On Mon, 25 May 2015, Tyler Palsulich wrote:
>> Maybe we could push some or all of external parser into the 
>> tika-parsers module, so we don't have to add more dependencies into 
>> core?
>
> What is the argument for having ExternalParser in core? Provide an 
> easy-to-extend class for downstream users to create their own external 
> parser?

I can't be certain, as svn blame suggests it was done over 4 years ago, 
but I've got a feeling we said something like "the other parser abstract 
and base classes are in core, and it has no dependencies, so why not put 
it in core too". Quite possible there's a comment around that in the list 
archives or jira, if someone fancies a few minutes with google to verify!

Nick

Re: [DISCUSS] Thinking about completely refactoring the ExternalParser and using commons-exec

Posted by Tyler Palsulich <tp...@gmail.com>.
On Mon, May 25, 2015 at 4:05 PM, Nick Burch <ap...@gagravarr.org> wrote:

> On Mon, 25 May 2015, Mattmann, Chris A (3980) wrote:
>
>> ExternalParser is way broke. I have some patches that somewhat fix it,
>> but in doing so, I realized, why not just use commons-exec? I realize that
>> this is another dependency into core, but commons-exec simplifies a lot of
>> the stuff that's broke with ExternalParser (reading its streams, for one).
>>
>
> Maybe we could push some or all of external parser into the tika-parsers
> module, so we don't have to add more dependencies into core?


What is the argument for having ExternalParser in core? Provide an
easy-to-extend class for downstream users to create their own external
parser?

Tyler

Re: [DISCUSS] Thinking about completely refactoring the ExternalParser and using commons-exec

Posted by Nick Burch <ap...@gagravarr.org>.
On Mon, 25 May 2015, Mattmann, Chris A (3980) wrote:
> ExternalParser is way broke. I have some patches that somewhat fix it, 
> but in doing so, I realized, why not just use commons-exec? I realize 
> that this is another dependency into core, but commons-exec simplifies a 
> lot of the stuff that's broke with ExternalParser (reading its streams, 
> for one).

Maybe we could push some or all of external parser into the tika-parsers 
module, so we don't have to add more dependencies into core?

Nick