Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2008/10/07 01:42:44 UTC

[jira] Commented: (TIKA-167) Tika presentation @ ApacheConUs 2008: review

    [ https://issues.apache.org/jira/browse/TIKA-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637302#action_12637302 ] 

Jukka Zitting commented on TIKA-167:
------------------------------------

The presentation looks great!

> What does TIKA mean? (literally)

"Tika" comes from the name of the son of Jérôme Charron, who first proposed the project in 2006.

> How many registered media types; glob/magic header (slide 7)

We currently have 77 registered mime types (plus 36 aliases), 179 glob patterns, and 18 magic patterns.

> How many supported media types (slide 7)

Depends on what you mean by "supported". We currently have 15 parser classes configured for a total of 66 mime types.

> How many committers? (slide 7)

Six, plus Dave Meikle who was just voted in.

> Do we have a download/checkout history for Tika? (eventually slide 10)

Not really. I could try to dig something up if you want, though I expect the numbers are still fairly low, as we've kept a relatively low profile so far.

> Future goals; to be completed? (slide 31)

Main goals, off the top of my head (see the issue tracker for more):

- Improved metadata handling, perhaps with XMP support
- Better configurability of Tika
- Improved media type registry
- More parser implementations

> Next parsers to be implemented? (slide 32)

- Office Open XML based on a POI upgrade
- Structural parsers (i.e. more than just a flat text stream) for PDF, Word, OpenDocument, etc. 
- More multimedia formats: image, audio, video

> Who uses Tika? projects using Tika (slide 33)
> Integration scenarios with other Lucene projects (slide 34)

Not that many, since we're still incubating. Beyond Nutch, we have at least Apache Jackrabbit, which has a sandbox component with Tika support; the Droids lab (to be incubated), which is currently adding Tika integration; and the UIMA project (incubating), which has a proposed patch with Tika support.

> Related projects: others? (slide 34)

Aperture (http://aperture.sourceforge.net/) is another project with similar (though wider) goals.



> Tika presentation @ ApacheConUs 2008: review
> --------------------------------------------
>
>                 Key: TIKA-167
>                 URL: https://issues.apache.org/jira/browse/TIKA-167
>             Project: Tika
>          Issue Type: Task
>          Components: documentation
>    Affects Versions: 0.2-incubating
>            Reporter: Paolo Mottadelli
>         Attachments: ApacheConUS2008_Tika_PaoloMottadelli.pdf
>
>
> As I have not been involved in the development process, it would be great if someone could review the Tika part of my presentation. I am attaching a rough version of my slides concerning the Tika presentation and listing some *** Open Points ***. Please, let me know if I am out of scope in some parts and if I can get better anyhow.
> *** Open Points: ***
>    * What does TIKA mean? (literally)
>    * How many registered media types; glob/magic header (slide 7)
>    * How many supported media types (slide 7)
>    * How many committers? (slide 7)
>    * Do we have a download/checkout history for Tika? (eventually slide 10)
>    * Future goals; to be completed?  (slide 31)
>    * Next parsers to be implemented? (slide 32)
>    * Who uses Tika? projects using Tika  (slide 33)
>    * Integration scenarios with other Lucene projects (slide 34)
>    * Related projects: others? (slide 34)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.