Posted to commits@tika.apache.org by ni...@apache.org on 2015/08/01 23:07:08 UTC

svn commit: r1693764 - in /tika/site: publish/1.10/formats.html src/site/apt/1.10/formats.apt

Author: nick
Date: Sat Aug  1 21:07:08 2015
New Revision: 1693764

URL: http://svn.apache.org/r1693764
Log:
List more supported parsers

Modified:
    tika/site/publish/1.10/formats.html
    tika/site/src/site/apt/1.10/formats.apt

Modified: tika/site/publish/1.10/formats.html
URL: http://svn.apache.org/viewvc/tika/site/publish/1.10/formats.html?rev=1693764&r1=1693763&r2=1693764&view=diff
==============================================================================
--- tika/site/publish/1.10/formats.html (original)
+++ tika/site/publish/1.10/formats.html Sat Aug  1 21:07:08 2015
@@ -113,7 +113,8 @@
 <li><a href="#Font_formats">Font formats</a></li>
 <li><a href="#Scientific_formats">Scientific formats</a></li>
 <li><a href="#Executable_programs_and_libraries">Executable programs and libraries</a></li>
-<li><a href="#Crypto_formats">Crypto formats</a></li></ul></li></ul>
+<li><a href="#Crypto_formats">Crypto formats</a></li>
+<li><a href="#Database_formats">Database formats</a></li></ul></li></ul>
 <div class="section">
 <h3><a name="HyperText_Markup_Language">HyperText Markup Language</a></h3>
 <p>The HyperText Markup Language (HTML) is the lingua franca of the web. Tika uses the <a class="externalLink" href="http://home.ccil.org/~cowan/XML/tagsoup/">TagSoup</a> library to support virtually any kind of HTML found on the web. The output from the <a href="./api/org/apache/tika/parser/html/HtmlParser.html">HtmlParser</a> class is guaranteed to be well-formed and valid XHTML, and various heuristics are used to prevent things like inline scripts from cluttering the extracted text content.</p></div>
@@ -200,7 +201,11 @@
 <p>The <a href="./api/org/apache/tika/parser/executable/ExecutableParser.html">ExecutableParser</a> can extract metadata information on platforms, architectures and types from a range of executable formats and libraries, such as Windows Executables and Linux / BSD programs and libraries.</p></div>
 <div class="section">
 <h3><a name="Crypto_formats">Crypto formats</a></h3>
-<p>The <a href="./api/org/apache/tika/parser/crypto/Pkcs7Parser.html">Pkcs7Parser</a> is able to parse the contents of PKCS7 signed messages, but doesn't include any information from the outer PKCS7 wrapper.</p></div></div>
+<p>The <a href="./api/org/apache/tika/parser/crypto/Pkcs7Parser.html">Pkcs7Parser</a> is able to parse the contents of PKCS7 signed messages, but doesn't include any information from the outer PKCS7 wrapper.</p></div>
+<div class="section">
+<h3><a name="Database_formats">Database formats</a></h3>
+<p>The <a href="./api/org/apache/tika/parser/jdbc/SQLite3Parser.html">SQLite3Parser</a> is able to extract content from SQLite3 files, in a tabular form. However, it requires that the <a href="#org.xerial_sqlite-jdbc_jar"></a> is manually added to the classpath first, as that binary jar isn't shipped as standard.</p>
+<p>The <a href="./api/org/apache/tika/parser/microsoft/JackcessParser.html">JackcessParser</a> is able to extract metadata and content in a tabular form, from Microsoft Access database files.</p></div></div>
 <div class="section">
 <h2>Full list of supported formats:<a name="Full_list_of_supported_formats:"></a></h2>
 <p>TODO Populate this at release time</p></div>

Modified: tika/site/src/site/apt/1.10/formats.apt
URL: http://svn.apache.org/viewvc/tika/site/src/site/apt/1.10/formats.apt?rev=1693764&r1=1693763&r2=1693764&view=diff
==============================================================================
--- tika/site/src/site/apt/1.10/formats.apt (original)
+++ tika/site/src/site/apt/1.10/formats.apt Sat Aug  1 21:07:08 2015
@@ -286,6 +286,17 @@ Supported Document Formats
    parse the contents of PKCS7 signed messages, but doesn't include any information from
    the outer PKCS7 wrapper.
 
+* {Database formats}
+
+   The {{{./api/org/apache/tika/parser/jdbc/SQLite3Parser.html}SQLite3Parser}} is able to
+   extract content from SQLite3 files, in a tabular form. However, it requires that the
+   {{{org.xerial sqlite-jdbc jar}}} is manually added to the classpath first, as that
+   binary jar isn't shipped as standard.
+
+   The {{{./api/org/apache/tika/parser/microsoft/JackcessParser.html}JackcessParser}} is 
+   able to extract metadata and content in a tabular form, from Microsoft Access 
+   database files.
+
 Full list of supported formats:
 
    TODO Populate this at release time