Posted to solr-user@lucene.apache.org by "Allison, Timothy B." <ta...@mitre.org> on 2017/02/16 13:17:08 UTC

Testing an ingest framework that uses Apache Tika

All,

I finally got around to documenting Apache Tika's MockParser[1].  As of Tika 1.15 (unreleased), add tika-core-tests.jar to your class path, and you can simulate:

1. Regular catchable exceptions
2. OOMs
3. Permanent hangs

This will allow you to determine if your ingest framework is robust against these issues.

As always, we fix Tika when we can, but if history is any indicator, you'll want to make sure your ingest code can handle these issues if you are handling millions/billions of files from the wild.

Cheers,

            Tim


[1] https://wiki.apache.org/tika/MockParser
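
For readers who want to see what "robust against these issues" can look like on the ingest side, here is a minimal sketch. It does not use Tika or MockParser at all: GuardedIngest, Outcome and parseWithGuard are hypothetical names made up for illustration, and the three failure modes from the list above are simulated with plain lambdas standing in for a parser call. The only technique shown is running each parse on a worker thread with a timeout and classifying what comes back.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical harness, not part of Tika: classifies the result of one parse attempt.
final class GuardedIngest {

    /** What the ingest loop learns about a single document. */
    enum Outcome { OK, EXCEPTION, OOM, TIMEOUT }

    /**
     * Runs parseTask on a daemon worker thread and waits at most timeoutMillis.
     * parseTask is an assumption standing in for "hand this file to the parser".
     */
    static Outcome parseWithGuard(Callable<String> parseTask, long timeoutMillis) {
        // Daemon thread: a parse that never returns cannot keep the JVM alive.
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-worker");
            t.setDaemon(true);
            return t;
        });
        Future<String> future = pool.submit(parseTask);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.OK;
        } catch (TimeoutException e) {
            // Best effort only: a thread that ignores interrupts keeps running.
            future.cancel(true);
            return Outcome.TIMEOUT;
        } catch (ExecutionException e) {
            // Errors thrown by the task (including OutOfMemoryError) arrive wrapped here.
            return (e.getCause() instanceof OutOfMemoryError) ? Outcome.OOM : Outcome.EXCEPTION;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Outcome.EXCEPTION;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // 1. Regular catchable exception
        System.out.println(parseWithGuard(() -> { throw new IOException("simulated"); }, 1_000));
        // 2. OOM (simulated by throwing the error, not by exhausting the heap)
        System.out.println(parseWithGuard(() -> { throw new OutOfMemoryError("simulated"); }, 1_000));
        // 3. Permanent hang that swallows interrupts
        System.out.println(parseWithGuard(() -> {
            while (true) {
                try { Thread.sleep(1_000); } catch (InterruptedException ignored) { }
            }
        }, 200));
        // Healthy document
        System.out.println(parseWithGuard(() -> "extracted text", 1_000));
    }
}
```

Note that a thread stuck in a permanent hang cannot be forcibly stopped from inside the same JVM, and a real OOM can leave the whole process unstable, so this sketch only detects the problem. Running the parser in a separate process that the ingest code can kill and restart is the sturdier design when those two failure modes matter.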

RE: Testing an ingest framework that uses Apache Tika

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Thank you, Chris, Luís and Konstantin!



-----Original Message-----
From: Mattmann, Chris A (3010) [mailto:chris.a.mattmann@jpl.nasa.gov] 
Sent: Thursday, February 16, 2017 10:18 AM
To: dev@tika.apache.org; lfcnassif@gmail.com
Cc: solr-user@lucene.apache.org
Subject: Re: Testing an ingest framework that uses Apache Tika

++1 awesome job

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-503
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 

On 2/16/17, 5:28 AM, "Luís Filipe Nassif" <lf...@gmail.com> wrote:

    Excellent, Tim! Thank you for all your great work on Apache Tika!
    
    2017-02-16 11:23 GMT-02:00 Konstantin Gribov <gr...@gmail.com>:
    
    > Tim,
    >
    > it's an awesome feature for downstream projects' integration tests. Thanks
    > for implementing it!
    >
    > Thu, 16 Feb 2017 at 16:17, Allison, Timothy B. <ta...@mitre.org>:
    >
    > > All,
    > >
    > > I finally got around to documenting Apache Tika's MockParser[1].  As of
    > > Tika 1.15 (unreleased), add tika-core-tests.jar to your class path, and you
    > > can simulate:
    > >
    > > 1. Regular catchable exceptions
    > > 2. OOMs
    > > 3. Permanent hangs
    > >
    > > This will allow you to determine if your ingest framework is robust
    > > against these issues.
    > >
    > > As always, we fix Tika when we can, but if history is any indicator,
    > > you'll want to make sure your ingest code can handle these issues if you
    > > are handling millions/billions of files from the wild.
    > >
    > > Cheers,
    > >
    > >             Tim
    > >
    > >
    > > [1] https://wiki.apache.org/tika/MockParser
    > >
    > --
    >
    > Best regards,
    > Konstantin Gribov
    >
    


Re: Testing an ingest framework that uses Apache Tika

Posted by "Mattmann, Chris A (3010)" <ch...@jpl.nasa.gov>.
++1 awesome job

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-503
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 

On 2/16/17, 5:28 AM, "Luís Filipe Nassif" <lf...@gmail.com> wrote:

    Excellent, Tim! Thank you for all your great work on Apache Tika!
    
    2017-02-16 11:23 GMT-02:00 Konstantin Gribov <gr...@gmail.com>:
    
    > Tim,
    >
    > it's an awesome feature for downstream projects' integration tests. Thanks
    > for implementing it!
    >
    > Thu, 16 Feb 2017 at 16:17, Allison, Timothy B. <ta...@mitre.org>:
    >
    > > All,
    > >
    > > I finally got around to documenting Apache Tika's MockParser[1].  As of
    > > Tika 1.15 (unreleased), add tika-core-tests.jar to your class path, and you
    > > can simulate:
    > >
    > > 1. Regular catchable exceptions
    > > 2. OOMs
    > > 3. Permanent hangs
    > >
    > > This will allow you to determine if your ingest framework is robust
    > > against these issues.
    > >
    > > As always, we fix Tika when we can, but if history is any indicator,
    > > you'll want to make sure your ingest code can handle these issues if you
    > > are handling millions/billions of files from the wild.
    > >
    > > Cheers,
    > >
    > >             Tim
    > >
    > >
    > > [1] https://wiki.apache.org/tika/MockParser
    > >
    > --
    >
    > Best regards,
    > Konstantin Gribov
    >
    


Re: Testing an ingest framework that uses Apache Tika

Posted by Luís Filipe Nassif <lf...@gmail.com>.
Excellent, Tim! Thank you for all your great work on Apache Tika!

2017-02-16 11:23 GMT-02:00 Konstantin Gribov <gr...@gmail.com>:

> Tim,
>
> it's an awesome feature for downstream projects' integration tests. Thanks
> for implementing it!
>
> Thu, 16 Feb 2017 at 16:17, Allison, Timothy B. <ta...@mitre.org>:
>
> > All,
> >
> > I finally got around to documenting Apache Tika's MockParser[1].  As of
> > Tika 1.15 (unreleased), add tika-core-tests.jar to your class path, and you
> > can simulate:
> >
> > 1. Regular catchable exceptions
> > 2. OOMs
> > 3. Permanent hangs
> >
> > This will allow you to determine if your ingest framework is robust
> > against these issues.
> >
> > As always, we fix Tika when we can, but if history is any indicator,
> > you'll want to make sure your ingest code can handle these issues if you
> > are handling millions/billions of files from the wild.
> >
> > Cheers,
> >
> >             Tim
> >
> >
> > [1] https://wiki.apache.org/tika/MockParser
> >
> --
>
> Best regards,
> Konstantin Gribov
>

Re: Testing an ingest framework that uses Apache Tika

Posted by Konstantin Gribov <gr...@gmail.com>.
Tim,

it's an awesome feature for downstream projects' integration tests. Thanks
for implementing it!

Thu, 16 Feb 2017 at 16:17, Allison, Timothy B. <ta...@mitre.org>:

> All,
>
> I finally got around to documenting Apache Tika's MockParser[1].  As of
> Tika 1.15 (unreleased), add tika-core-tests.jar to your class path, and you
> can simulate:
>
> 1. Regular catchable exceptions
> 2. OOMs
> 3. Permanent hangs
>
> This will allow you to determine if your ingest framework is robust
> against these issues.
>
> As always, we fix Tika when we can, but if history is any indicator,
> you'll want to make sure your ingest code can handle these issues if you
> are handling millions/billions of files from the wild.
>
> Cheers,
>
>             Tim
>
>
> [1] https://wiki.apache.org/tika/MockParser
>
-- 

Best regards,
Konstantin Gribov
