Posted to commits@tika.apache.org by ni...@apache.org on 2014/09/02 14:06:02 UTC
svn commit: r1621968 - in /tika/site/src/site/apt: 1.6/formats.apt 1.7/formats.apt
Author: nick
Date: Tue Sep 2 12:06:01 2014
New Revision: 1621968
URL: http://svn.apache.org/r1621968
Log:
Update the supported formats documentation
Modified:
tika/site/src/site/apt/1.6/formats.apt
tika/site/src/site/apt/1.7/formats.apt
Modified: tika/site/src/site/apt/1.6/formats.apt
URL: http://svn.apache.org/viewvc/tika/site/src/site/apt/1.6/formats.apt?rev=1621968&r1=1621967&r2=1621968&view=diff
==============================================================================
--- tika/site/src/site/apt/1.6/formats.apt (original)
+++ tika/site/src/site/apt/1.6/formats.apt Tue Sep 2 12:06:01 2014
@@ -86,6 +86,9 @@ Supported Document Formats
supports the Electronic Publication Format (EPUB) used for many digital
books.
+ The {{{./api/org/apache/tika/parser/xml/FictionBookParser.html}FictionBookParser}} class
+ supports the XML-based Fiction Book publishing format.
+
* {Rich Text Format}
The {{{./api/org/apache/tika/parser/rtf/RTFParser.html}RTFParser}} class
@@ -115,6 +118,9 @@ Supported Document Formats
The {{{./api/org/apache/tika/parser/feed/FeedParser.html}FeedParser}} class
supports the RSS and Atom feed syndication formats.
+ The {{{./api/org/apache/tika/parser/iptc/IptcAnpaParser.html}IptcAnpaParser}} class
+ supports the IPTC ANPA News Wire feed format.
+
* {Help formats}
The {{{./api/org/apache/tika/parser/chm/ChmParser.html}ChmParser}} class
@@ -188,6 +194,10 @@ Supported Document Formats
extract email messages from the mbox format used by many email archives
and Unix-style mailboxes.
+ The {{{./api/org/apache/tika/parser/mail/RFC822Parser.html}RFC822Parser}} can
+ process single email messages in the RFC 822 format used by many email clients
+ in their archives / exports.
+
The {{{./api/org/apache/tika/parser/mbox/PSTParser.html}PSTParser}} can
extract email messages from the Microsoft Outlook PST email format.
@@ -203,6 +213,17 @@ Supported Document Formats
The {{{./api/org/apache/tika/parser/font/AdobeFontMetricParser.html}AdobeFontMetricParser}}
class does something similar for Adobe Font Metrics files.
+* {Scientific formats}
+
+ The {{{./api/org/apache/tika/parser/hdf/HDFParser.html}HDFParser}}
+ is able to extract attribute metadata from the HDF scientific file format.
+
+ The {{{./api/org/apache/tika/parser/netcdf/NetCDFParser.html}NetCDFParser}}
+ is able to extract attribute metadata from the NetCDF scientific file format.
+
+ The {{{./api/org/apache/tika/parser/mat/MatParser.html}MatParser}}
+ is able to extract attribute metadata from the Matlab scientific file format.
+
* {Executable programs and libraries}
The {{{./api/org/apache/tika/parser/executable/ExecutableParser.html}ExecutableParser}} can
@@ -210,6 +231,12 @@ Supported Document Formats
of executable formats and libraries, such as Windows Executables and Linux / BSD
programs and libraries.
+* {Crypto formats}
+
+ The {{{./api/org/apache/tika/parser/crypto/Pkcs7Parser.html}Pkcs7Parser}} is able to
+ parse the contents of PKCS7 signed messages, but doesn't include any information from
+ the outer PKCS7 wrapper.
+
Full list of supported formats:
* org.apache.tika.parser.asm.{{{./api/org/apache/tika/parser/asm/ClassParser}ClassParser}}
@@ -350,6 +377,10 @@ Full list of supported formats:
* message/rfc822
+ * org.apache.tika.parser.mat.{{{./api/org/apache/tika/parser/mat/MatParser}MatParser}}
+
+ * application/x-matlab-data
+
* org.apache.tika.parser.mbox.{{{./api/org/apache/tika/parser/mbox/MboxParser}MboxParser}}
* application/mbox
Modified: tika/site/src/site/apt/1.7/formats.apt
URL: http://svn.apache.org/viewvc/tika/site/src/site/apt/1.7/formats.apt?rev=1621968&r1=1621967&r2=1621968&view=diff
==============================================================================
--- tika/site/src/site/apt/1.7/formats.apt (original)
+++ tika/site/src/site/apt/1.7/formats.apt Tue Sep 2 12:06:01 2014
@@ -86,6 +86,9 @@ Supported Document Formats
supports the Electronic Publication Format (EPUB) used for many digital
books.
+ The {{{./api/org/apache/tika/parser/xml/FictionBookParser.html}FictionBookParser}} class
+ supports the XML-based Fiction Book publishing format.
+
* {Rich Text Format}
The {{{./api/org/apache/tika/parser/rtf/RTFParser.html}RTFParser}} class
@@ -115,6 +118,9 @@ Supported Document Formats
The {{{./api/org/apache/tika/parser/feed/FeedParser.html}FeedParser}} class
supports the RSS and Atom feed syndication formats.
+ The {{{./api/org/apache/tika/parser/iptc/IptcAnpaParser.html}IptcAnpaParser}} class
+ supports the IPTC ANPA News Wire feed format.
+
* {Help formats}
The {{{./api/org/apache/tika/parser/chm/ChmParser.html}ChmParser}} class
@@ -188,6 +194,10 @@ Supported Document Formats
extract email messages from the mbox format used by many email archives
and Unix-style mailboxes.
+ The {{{./api/org/apache/tika/parser/mail/RFC822Parser.html}RFC822Parser}} can
+ process single email messages in the RFC 822 format used by many email clients
+ in their archives / exports.
+
The {{{./api/org/apache/tika/parser/mbox/PSTParser.html}PSTParser}} can
extract email messages from the Microsoft Outlook PST email format.
@@ -203,6 +213,17 @@ Supported Document Formats
The {{{./api/org/apache/tika/parser/font/AdobeFontMetricParser.html}AdobeFontMetricParser}}
class does something similar for Adobe Font Metrics files.
+* {Scientific formats}
+
+ The {{{./api/org/apache/tika/parser/hdf/HDFParser.html}HDFParser}}
+ is able to extract attribute metadata from the HDF scientific file format.
+
+ The {{{./api/org/apache/tika/parser/netcdf/NetCDFParser.html}NetCDFParser}}
+ is able to extract attribute metadata from the NetCDF scientific file format.
+
+ The {{{./api/org/apache/tika/parser/mat/MatParser.html}MatParser}}
+ is able to extract attribute metadata from the Matlab scientific file format.
+
* {Executable programs and libraries}
The {{{./api/org/apache/tika/parser/executable/ExecutableParser.html}ExecutableParser}} can
@@ -210,6 +231,12 @@ Supported Document Formats
of executable formats and libraries, such as Windows Executables and Linux / BSD
programs and libraries.
+* {Crypto formats}
+
+ The {{{./api/org/apache/tika/parser/crypto/Pkcs7Parser.html}Pkcs7Parser}} is able to
+ parse the contents of PKCS7 signed messages, but doesn't include any information from
+ the outer PKCS7 wrapper.
+
Full list of supported formats:
TODO Populate this at release time
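
The "Full list of supported formats" in the 1.6 file pairs each parser class with the MIME types it handles. A minimal sketch of that pairing as a lookup table follows, using only the two entries visible in the diff above. The class name ParserLookup and the method parserFor are hypothetical, chosen for illustration; this is not Tika's own registry or API, just plain Java collections.

```java
import java.util.Map;
import java.util.Optional;

public class ParserLookup {
    // MIME type -> parser class name, copied from the two entries shown in
    // the 1.6 formats.apt diff. This is an illustrative table, not Tika's
    // real parser registry.
    private static final Map<String, String> PARSERS = Map.of(
        "application/x-matlab-data", "org.apache.tika.parser.mat.MatParser",
        "application/mbox", "org.apache.tika.parser.mbox.MboxParser");

    // Returns the listed parser class name, or empty if the type is not in the table.
    static Optional<String> parserFor(String mimeType) {
        return Optional.ofNullable(PARSERS.get(mimeType));
    }

    public static void main(String[] args) {
        String[] types = {"application/x-matlab-data", "application/mbox", "text/plain"};
        for (String type : types) {
            System.out.println(type + " -> " + parserFor(type).orElse("(not listed)"));
        }
    }
}
```

A type missing from the table yields an empty Optional rather than null, so callers have to handle the "no parser listed" case explicitly.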