Posted to dev@pdfbox.apache.org by "Allison, Timothy B." <ta...@mitre.org> on 2016/04/06 19:18:28 UTC

FW: Apache Tika used to parse the Panama papers!

Looks like quite a few PDFs [0]...

Couldn't have done it without you! 

Cheers,

           Tim

P.S. Tip of the hat to Andreas for retweeting the link!

[0] https://twitter.com/bigdata/status/717346207312392192 

-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov] 
Sent: Tuesday, April 05, 2016 6:47 PM
To: dev@tika.apache.org
Cc: press@apache.org
Subject: Apache Tika used to parse the Panama papers!

FYI:
http://www.forbes.com/sites/thomasbrewster/2016/04/05/panama-papers-amazon-encryption-epic-leak/?utm_campaign=ForbesTech&utm_source=TWITTER&utm_medium=social&utm_channel=Technology&linkId=23087770#709893771df5


BTW I know Thomas and am in touch. He wrote an article about MEMEX last year.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS) Adjunct Associate Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






Re: FW: Apache Tika used to parse the Panama papers!

Posted by Tilman Hausherr <TH...@t-online.de>.
Yes I read about that too :-)

It would be interesting to hear whether they had any problems, whether 
they made any support requests, and whether these were answered 
successfully. Were there any files that failed or did poorly? Or was 
everything so good that no help was needed at all?

I'm delighted that a Java product was used, even though native-code 
products would likely have been faster.

Tilman (I'm slightly skeptical about the ICIJ because of the funding and 
the suspicious lack of US data, but as a huge data archeology project, I 
love it!)

On 06.04.2016 at 19:18, Allison, Timothy B. wrote:
> Looks like quite a few PDFs [0]...
>
> Couldn't have done it without you!
>
> Cheers,
>
>             Tim
>
> P.S. Tip of the hat to Andreas for rt the link!
>
> [0] https://twitter.com/bigdata/status/717346207312392192
>
> -----Original Message-----
> From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
> Sent: Tuesday, April 05, 2016 6:47 PM
> To: dev@tika.apache.org
> Cc: press@apache.org
> Subject: Apache Tika used to parse the Panama papers!
>
> FYI:
> http://www.forbes.com/sites/thomasbrewster/2016/04/05/panama-papers-amazon-encryption-epic-leak/?utm_campaign=ForbesTech&utm_source=TWITTER&utm_medium=social&utm_channel=Technology&linkId=23087770#709893771df5
>
>
> BTW I know Thomas and am in touch..he wrote an article about MEMEX last year.
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattmann@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Director, Information Retrieval and Data Science Group (IRDS) Adjunct Associate Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA
> WWW: http://irds.usc.edu/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: dev-help@pdfbox.apache.org
>


