Posted to commits@tika.apache.org by dm...@apache.org on 2008/11/17 23:24:46 UTC

svn commit: r718414 - /lucene/tika/trunk/src/site/apt/formats.apt

Author: dmeikle
Date: Mon Nov 17 14:24:46 2008
New Revision: 718414

URL: http://svn.apache.org/viewvc?rev=718414&view=rev
Log:
Updated formats page to finish some todos on supported formats

Modified:
    lucene/tika/trunk/src/site/apt/formats.apt

Modified: lucene/tika/trunk/src/site/apt/formats.apt
URL: http://svn.apache.org/viewvc/lucene/tika/trunk/src/site/apt/formats.apt?rev=718414&r1=718413&r2=718414&view=diff
==============================================================================
--- lucene/tika/trunk/src/site/apt/formats.apt (original)
+++ lucene/tika/trunk/src/site/apt/formats.apt Mon Nov 17 14:24:46 2008
@@ -192,22 +192,35 @@
 * Other supported formats
 
    [Extensible Markup Language (application/xml)]
-    TODO
+    Tika uses the <<<javax.xml>>> classes to parse Extensible Markup Language files.
+    Support for Extensible Markup Language files was added in Tika 0.1.
 
    [HyperText Markup Language (text/html)]
-    TODO
+    Tika uses the {{{http://sourceforge.net/projects/nekohtml}CyberNeko}} library to parse HyperText Markup Language files.
+    Support for HyperText Markup Language files was added in Tika 0.1.
 
    [Images (image/*)]
-    TODO
+    Tika uses the <<<javax.imageio>>> classes to extract metadata from image files.
+    Support for image files was added in Tika 0.2.
 
    [Java class files]
-    TODO
+    The parsing of Java class files is based on the ASM library and on work by Dave Brosius in JCR-1522.
+    Support for Java class files was added in Tika 0.2.
 
    [Java jar archives]
-    TODO
+    The parsing of Java JAR archives is performed using a combination of the ZIP and Java class file parsers.
+    Support for Java JAR archives was added in Tika 0.2.
 
    [MP3 Audio (audio/mp3)]
-    TODO
+    The parsing of {{{http://www.id3.org/ID3v1}ID3v1}} tags from MP3 files was added in Tika version 0.2.
+    If found, the following metadata is extracted and set:
+
+      * <<<TITLE>>> Title
+
+      * <<<SUBJECT>>> Subject
+
+    The above information, as well as the <<<Album>>>, <<<Track>>>, <<<Year>>>, <<<Genre>>>
+    and any additional <<<Comment>>>, is extracted when set in the file.
 
    [OpenDocument (application/vnd.oasis.opendocument.*)]
     TODO
@@ -256,4 +269,5 @@
     Support for tar archives was added in Tika 0.2.
 
    [ZIP archive (application/zip)]
-    TODO
+    Tika uses Java's built-in Zip classes to parse ZIP files.
+    Support for ZIP was added in Tika 0.2.
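
As an illustration of the ZIP entry above, the following is a minimal sketch of reading an archive with Java's built-in java.util.zip classes. It is not Tika's parser code: the class name ZipListingSketch and the helper methods listEntries and sampleZip are hypothetical names chosen for this example, and the sketch only lists entry names rather than extracting text or metadata.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical example class; not part of Tika.
public class ZipListingSketch {

    /** Returns the names of all entries in the given ZIP stream, in archive order. */
    static List<String> listEntries(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            // getNextEntry() returns null once the archive is exhausted.
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
                zip.closeEntry();
            }
        }
        return names;
    }

    /** Builds a small in-memory ZIP so the sketch needs no files on disk. */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("readme.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("docs/notes.txt"));
            zip.write("notes".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Prints: [readme.txt, docs/notes.txt]
        System.out.println(listEntries(new ByteArrayInputStream(sampleZip())));
    }
}
```

A parser built this way would presumably hand each entry's bytes to a format-specific parser instead of only collecting names, which matches the description of JAR parsing as a combination of the ZIP and Java class file parsers.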