Posted to commits@tika.apache.org by ju...@apache.org on 2009/04/28 00:28:00 UTC

svn commit: r769189 - /lucene/tika/trunk/src/site/apt/formats.apt

Author: jukka
Date: Mon Apr 27 22:28:00 2009
New Revision: 769189

URL: http://svn.apache.org/viewvc?rev=769189&view=rev
Log:
Improved documentation about support for audio formats

Modified:
    lucene/tika/trunk/src/site/apt/formats.apt

Modified: lucene/tika/trunk/src/site/apt/formats.apt
URL: http://svn.apache.org/viewvc/lucene/tika/trunk/src/site/apt/formats.apt?rev=769189&r1=769188&r2=769189&view=diff
==============================================================================
--- lucene/tika/trunk/src/site/apt/formats.apt (original)
+++ lucene/tika/trunk/src/site/apt/formats.apt Mon Apr 27 22:28:00 2009
@@ -189,6 +189,39 @@
     and <<<*.bz2>>>, and they will respectively be replaced with <<<*.tar>>>,
     <<<*.tar>>>, <<<*>>> and <<<*>>> as described above.
 
+* Audio formats
+
+   Tika can detect several common audio formats and extract metadata
+   from them. Text extraction is supported for some MIDI-based karaoke
+   formats that contain the lyrics of the encoded audio.
+
+   See {{{https://issues.apache.org/jira/browse/TIKA-94}TIKA-94}} for
+   an effort to integrate speech recognition support into Tika.
+
+   [MP3 Audio (audio/mpeg)]
+    The parsing of {{{http://www.id3.org/ID3v1}ID3v1}} tags from MP3 files
+    was added in Tika version 0.2. If found, the following metadata is
+    extracted and set:
+
+      * <<<TITLE>>> Title
+
+      * <<<SUBJECT>>> Subject
+
+    The above information, as well as the <<<Album>>>, <<<Track>>>,
+    <<<Year>>>, <<<Genre>>> and additional <<<Comment>>> are extracted
+    when set in the file.
+
+   [MIDI audio (audio/midi)]
+    Tika uses the MIDI support in <<<javax.sound.midi>>> to parse MIDI
+    sequence files. Many karaoke file formats are based on MIDI, and
+    contain lyrics as embedded text tracks that Tika knows how to extract.
+    Support for MIDI files was added in Tika 0.3.
+
+   [Wave audio (audio/basic)]
+    Tika supports sampled wave audio (.wav files, etc.) using the
+    <<<javax.sound.sampled>>> package. Only sampling metadata is extracted.
+    Support for sampled wave audio was added in Tika 0.3.
+
 * Other supported formats
 
    [Extensible Markup Language (application/xml)]
@@ -211,17 +244,6 @@
     The parsing of Java JAR archives is performed using a combination of the ZIP and Java class file parsers.
     Support for Java JAR archives was added in Tika 0.2.
 
-   [MP3 Audio (audio/mp3)]
-    The parsing of {{{http://www.id3.org/ID3v1}ID3v1}} tags from MP3 files was added in Tika version 0.2.
-    If found the following metadata is extracted and set:
-
-      * <<<TITLE>>> Title
-
-      * <<<SUBJECT>>> Subject
-
-    The above information, as well as the <<<Album>>>, <<<Track>>>, <<<Year>>>, <<<Genre>>>
-    and additional <<<Comment>>> are extracted when set in the file.
-
    [OpenDocument (application/vnd.oasis.opendocument.*)]
     TODO