Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2008/10/07 01:42:44 UTC
[jira] Commented: (TIKA-167) Tika presentation @ ApacheConUs 2008: review
[ https://issues.apache.org/jira/browse/TIKA-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637302#action_12637302 ]
Jukka Zitting commented on TIKA-167:
------------------------------------
The presentation looks great!
> What does TIKA mean? (literally)
"Tika" comes from the name of the son of Jérôme Charron, who first proposed the project in 2006.
> How many registered media types; glob/magic header (slide 7)
We currently have 77 registered mime types (plus 36 aliases), 179 glob patterns, and 18 magic patterns.
> How many supported media types (slide 7)
Depends on what you mean by "supported". We currently have 15 parser classes configured for a total of 66 mime types.
> How many committers? (slide 7)
Six, plus Dave Meikle who was just voted in.
> Do we have a download/checkout history for Tika? (eventually slide 10)
Not really. I could try to dig something up if you want, though I expect the numbers to still be fairly low, as we've kept a relatively low profile so far.
> Future goals; to be completed? (slide 31)
Main goals off the top of my head (see the issue tracker for more):
- Improved metadata handling, perhaps with XMP support
- Better configurability of Tika
- Improved media type registry
- More parser implementations
> Next parsers to be implemented? (slide 32)
- Office Open XML based on a POI upgrade
- Structural parsers (i.e. more than just a flat text stream) for PDF, Word, OpenDocument, etc.
- More multimedia formats: image, audio, video
> Who uses Tika? projects using Tika (slide 33)
> Integration scenarios with other Lucene projects (slide 34)
Not that many yet, since we're still incubating. Beyond Nutch, we have at least Apache Jackrabbit, which has a sandbox component with Tika support; the Droids lab (to be incubated), which is currently adding Tika integration; and the UIMA project (incubating), which has a proposed patch with Tika support.
> Related projects: others? (slide 34)
Aperture (http://aperture.sourceforge.net/) is another project with similar (though wider) goals.
> Tika presentation @ ApacheConUs 2008: review
> --------------------------------------------
>
> Key: TIKA-167
> URL: https://issues.apache.org/jira/browse/TIKA-167
> Project: Tika
> Issue Type: Task
> Components: documentation
> Affects Versions: 0.2-incubating
> Reporter: Paolo Mottadelli
> Attachments: ApacheConUS2008_Tika_PaoloMottadelli.pdf
>
>
> As I have not been involved in the development process, it would be great if someone could review the Tika part of my presentation. I am attaching a rough version of my slides concerning the Tika presentation and listing some *** Open Points ***. Please, let me know if I am out of scope in some parts and if I can get better anyhow.
> *** Open Points: ***
> * What does TIKA mean? (literally)
> * How many registered media types; glob/magic header (slide 7)
> * How many supported media types (slide 7)
> * How many committers? (slide 7)
> * Do we have a download/checkout history for Tika? (eventually slide 10)
> * Future goals; to be completed? (slide 31)
> * Next parsers to be implemented? (slide 32)
> * Who uses Tika? projects using Tika (slide 33)
> * Integration scenarios with other Lucene projects (slide 34)
> * Related projects: others? (slide 34)
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.