Posted to dev@tika.apache.org by Nick Burch <ni...@apache.org> on 2017/05/14 15:34:21 UTC

Tika talk next week - help needed!

Hi All

Last year in Seville, I gave a talk on Tika entitled "Apache Tika - What’s 
new with 2.0?". For ApacheCon Miami next week, I've been roped into giving 
an updated version...
https://apachecon2017.sched.com/event/9zvD/apache-tika-whats-new-with-20-nick-burch-apache-software-foundation

My slides from Seville are available at:
http://events.linuxfoundation.org/sites/events/files/slides/WhatsNewWithApacheTika_1.pdf

Beyond updating the list of releases and parsers, and the slide 
background, what should I change?

Maybe some more on Tika eval? More details on some of the NLP / Entity 
Recognition / Image Recognition stuff? Some screenshots of that stuff? 
More on translation? Something else?

Ideas greatly appreciated! Good screenshots even more so :)

Cheers
Nick

Re: Tika talk next week - help needed!

Posted by Bob Paulin <bo...@bobpaulin.com>.
Here is a quick slide on camel-tika:

https://docs.google.com/presentation/d/1OUORiDwB4d0FkLZ0HIlQDLE30vvTniawdyzhQmLj1xE/edit?usp=sharing



Re: Tika talk next week - help needed!

Posted by Nick Burch <ap...@gagravarr.org>.
On Tue, 16 May 2017, Eric Pugh wrote:
> It was great to read through 
> http://events.linuxfoundation.org/sites/events/files/slides/WhatsNewWithApacheTika_1.pdf… 
> Wow there is a lot in Tika.
>
> And I think that might be the one challenge with the talk structure, 
> there is SOO much information.

The plan is to have the material for a talk of about twice the length of 
the slot! I don't know how much people will know who come, so I'm aiming 
to do a show of hands at the start, then decide which slides/sections to 
skip based on experience of the audience. Extra details available in the 
slides for anyone interested :)

Cheers
Nick

Re: Tika talk next week - help needed!

Posted by Eric Pugh <ep...@opensourceconnections.com>.
Nick,

It was great to read through http://events.linuxfoundation.org/sites/events/files/slides/WhatsNewWithApacheTika_1.pdf … Wow, there is a lot in Tika.

And I think that might be the one challenge with the talk structure: there is SOO much information.

I think I’d like to see “How is Tika actually architected” to support so many amazing use cases. If this talk is meant for folks who don’t already know a lot about the project, then they might get overwhelmed by the long lists, such as all the file types it can handle. Maybe change some of them to “here is an eye chart of logos, don’t actually read it” and consolidate some pages.




Eric



_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | My Free/Busy <http://tinyurl.com/eric-cal>  
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>	


Re: Tika talk next week - help needed!

Posted by Thamme Gowda <th...@apache.org>.
Nick,
Here are some pointers:
1. Image recognition using TensorFlow:
https://wiki.apache.org/tika/TikaAndVision
Link to paper: https://memex.jpl.nasa.gov/MFSEC17.pdf
2. Image recognition using Deeplearning4j:
https://wiki.apache.org/tika/TikaAndVisionDL4J
3. Sentiment analysis using OpenNLP: https://github.com/apache/tika/pull/169
4. Video labeling using TensorFlow image recognition:
https://wiki.apache.org/tika/TikaAndVisionVideo
5. Named entity extraction using OpenNLP and CoreNLP:
https://wiki.apache.org/tika/TikaAndNER

*Coming soon (work in progress):*
6. Image captioning (image-to-text): https://github.com/apache/tika/pull/180

Cheers,
-Thamme

*--*
*Thamme Gowda*
TG | @thammegowda <https://twitter.com/thammegowda>
~Sent via somebody's Webmail server!


Re: Tika talk next week - help needed!

Posted by Chris Mattmann <ma...@apache.org>.
Yep, literally take a look at the Tika wiki – there are examples aplenty and even 
screenshots. Further, if you look at the MEMEX site under our new publications
section, there are a few examples (like the ICMR paper on forensics) that show it
in action.

http://memex.jpl.nasa.gov/#publications 






Re: Tika talk next week - help needed!

Posted by Konstantin Gribov <gr...@gmail.com>.
IIRC, basic support for image and video labeling was added (Chris & Thamme,
could you elaborate on that, please?), as was TSD support (TIKA-2309, time
stamped data envelope format); the slf4j migration is ongoing on the 2.x branch.

-- 

Best regards,
Konstantin Gribov

RE: Tika talk next week - help needed!

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Doh! Sorry for the delay... You might add configuration of EncodingDetectors, but that's probably too far into the weeds?


RE: Tika talk next week - help needed!

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Will send some ideas in the next few hours. 
