Posted to commits@tika.apache.org by ju...@apache.org on 2009/04/28 00:28:00 UTC
svn commit: r769189 - /lucene/tika/trunk/src/site/apt/formats.apt
Author: jukka
Date: Mon Apr 27 22:28:00 2009
New Revision: 769189
URL: http://svn.apache.org/viewvc?rev=769189&view=rev
Log:
Improved documentation about support for audio formats
Modified:
lucene/tika/trunk/src/site/apt/formats.apt
Modified: lucene/tika/trunk/src/site/apt/formats.apt
URL: http://svn.apache.org/viewvc/lucene/tika/trunk/src/site/apt/formats.apt?rev=769189&r1=769188&r2=769189&view=diff
==============================================================================
--- lucene/tika/trunk/src/site/apt/formats.apt (original)
+++ lucene/tika/trunk/src/site/apt/formats.apt Mon Apr 27 22:28:00 2009
@@ -189,6 +189,39 @@
and <<<*.bz2>>>, and they will respectively be replaced with <<<*.tar>>>,
<<<*.tar>>>, <<<*>>> and <<<*>>> as described above.
+* Audio formats
+
+ Tika can detect several common audio formats and extract metadata
+ from them. Text extraction is supported for some MIDI-based karaoke
+ formats that contain the lyrics of the encoded audio.
+
+ See {{{https://issues.apache.org/jira/browse/TIKA-94}TIKA-94}} for
+ an effort to integrate speech recognition support into Tika.
+
+ [MP3 Audio (audio/mpeg)]
+ The parsing of {{{http://www.id3.org/ID3v1}ID3v1}} tags from MP3 files
+ was added in Tika version 0.2. If found, the following metadata is
+ extracted and set:
+
+ * <<<TITLE>>> Title
+
+ * <<<SUBJECT>>> Subject
+
+ The above information, as well as the <<<Album>>>, <<<Track>>>,
+ <<<Year>>>, <<<Genre>>> and additional <<<Comment>>> are extracted
+ when set in the file.
+
+ [MIDI audio (audio/midi)]
+ Tika uses the MIDI support in <<<javax.sound.midi>>> to parse MIDI
+ sequence files. Many karaoke file formats are based on MIDI, and
+ contain lyrics as embedded text tracks that Tika knows how to extract.
+ Support for MIDI files was added in Tika 0.3.
+
+ [Wave audio (audio/basic)]
+ Tika supports sampled wave audio (.wav files, etc.) using the
+ <<<javax.sound.sampled>>> package. Only sampling metadata is extracted.
+ Support for sampled wave audio was added in Tika 0.3.
+
* Other supported formats
[Extensible Markup Language (application/xml)]
@@ -211,17 +244,6 @@
The parsing of Java JAR archives is performed using a combination of the ZIP and Java class file parsers.
Support for Java JAR archives was added in Tika 0.2.
- [MP3 Audio (audio/mp3)]
- The parsing of {{{http://www.id3.org/ID3v1}ID3v1}} tags from MP3 files was added in Tika version 0.2.
- If found the following metadata is extracted and set:
-
- * <<<TITLE>>> Title
-
- * <<<SUBJECT>>> Subject
-
- The above information, as well as the <<<Album>>>, <<<Track>>>, <<<Year>>>, <<<Genre>>>
- and additional <<<Comment>>> are extracted when set in the file.
-
[OpenDocument (application/vnd.oasis.opendocument.*)]
TODO
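
Note on the MIDI audio entry added in this commit: the entry says karaoke files carry lyrics as embedded text tracks. The sketch below shows one way such text could be read with the standard javax.sound.midi API. It is not taken from Tika's parser. The class name MidiLyricsSketch, the helper demoSequence, the choice to read only text (0x01) and lyric (0x05) meta events, and the ISO-8859-1 decoding are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import javax.sound.midi.InvalidMidiDataException;
import javax.sound.midi.MetaMessage;
import javax.sound.midi.MidiEvent;
import javax.sound.midi.MidiMessage;
import javax.sound.midi.Sequence;
import javax.sound.midi.Track;

// Hypothetical illustration; not Tika's actual MIDI parser.
public class MidiLyricsSketch {

    private static final int META_TEXT = 0x01;   // Standard MIDI File text event
    private static final int META_LYRIC = 0x05;  // Standard MIDI File lyric event

    /** Concatenates the payloads of text and lyric meta events in track order. */
    static String extractText(Sequence sequence) {
        StringBuilder text = new StringBuilder();
        for (Track track : sequence.getTracks()) {
            for (int i = 0; i < track.size(); i++) {
                MidiMessage message = track.get(i).getMessage();
                if (message instanceof MetaMessage) {
                    MetaMessage meta = (MetaMessage) message;
                    int type = meta.getType();
                    if (type == META_TEXT || type == META_LYRIC) {
                        // Assumption: the payload is ISO-8859-1 encoded text.
                        text.append(new String(meta.getData(), StandardCharsets.ISO_8859_1));
                    }
                }
            }
        }
        return text.toString();
    }

    /** Builds a small in-memory sequence with two lyric events for demonstration. */
    static Sequence demoSequence() throws InvalidMidiDataException {
        Sequence sequence = new Sequence(Sequence.PPQ, 480);
        Track track = sequence.createTrack();
        byte[] first = "Hello ".getBytes(StandardCharsets.ISO_8859_1);
        byte[] second = "world".getBytes(StandardCharsets.ISO_8859_1);
        track.add(new MidiEvent(new MetaMessage(META_LYRIC, first, first.length), 0));
        track.add(new MidiEvent(new MetaMessage(META_LYRIC, second, second.length), 480));
        return sequence;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractText(demoSequence()));
    }
}
```

To read a file on disk instead of the in-memory demo, the sequence can be obtained with MidiSystem.getSequence(File) from the same package.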