Posted to commits@tika.apache.org by ta...@apache.org on 2021/11/22 19:42:47 UTC

svn commit: r1895260 [25/25] - in /tika/site: publish/1.10/ publish/1.11/ publish/1.12/ publish/1.13/ publish/1.14/ publish/1.15/ publish/1.16/ publish/1.17/ publish/1.18/ publish/1.19.1/ publish/1.19/ publish/1.20/ publish/1.21/ publish/1.22/ publish/...

Modified: tika/site/publish/2.1.0/examples.html
URL: http://svn.apache.org/viewvc/tika/site/publish/2.1.0/examples.html?rev=1895260&r1=1895259&r2=1895260&view=diff
==============================================================================
--- tika/site/publish/2.1.0/examples.html (original)
+++ tika/site/publish/2.1.0/examples.html Mon Nov 22 19:42:46 2021
@@ -116,23 +116,23 @@
 <p>The <a href="./api/org/apache/tika/Tika.html">Tika facade</a>, provides a number of very quick and easy ways to have your content parsed by Tika, and return the resulting plain text</p><style type="text/css">
    @import url('attached-includes/css/shCoreDefault.css');
 </style>
-<div id="highlighter_49141" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number54 index0 alt1"><code class="java keyword">public</code> <code class="java plain">String parseToStringExample() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number55 index1 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Tika tika = </code><code class="java keyword">new</code> <code class="java plain">Tika();</code></div><div class="line number56 index2 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">(InputStream stream = ParsingExample.</code><code class="java keyword">class</code><code class="java plain">.getResourceAsStream(</code><code class="java string">"test.doc"</code><co
 de class="java plain">)) {</code></div><div class="line number57 index3 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">tika.parseToString(stream);</code></div><div class="line number58 index4 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number59 index5 alt2"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
+<div id="highlighter_186641" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number54 index0 alt1"><code class="java keyword">public</code> <code class="java plain">String parseToStringExample() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number55 index1 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Tika tika = </code><code class="java keyword">new</code> <code class="java plain">Tika();</code></div><div class="line number56 index2 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">(InputStream stream = ParsingExample.</code><code class="java keyword">class</code><code class="java plain">.getResourceAsStream(</code><code class="java string">"test.doc"</code><c
 ode class="java plain">)) {</code></div><div class="line number57 index3 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">tika.parseToString(stream);</code></div><div class="line number58 index4 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number59 index5 alt2"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
 <div class="section">
 <h4><a name="Parsing_using_the_Auto-Detect_Parser">Parsing using the Auto-Detect Parser</a></h4>
-<p>For more control, you can call the <a href="./api/org/apache/tika/parser/Parser.html">Tika Parsers</a> directly. Most likely, you'll want to start out using the <a href="./api/org/apache/tika/parser/AutoDetectParser.html">Auto-Detect Parser</a>, which automatically figures out what kind of content you have, then calls the appropriate parser for you.</p><div id="highlighter_985755" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number85 index0 alt2"><code class="java keyword">public</code> <code class="java plain">String parseExample() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number86 index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">AutoDetectParser parser = </code><code class="java keyword">new</code> <code class="java pla
 in">AutoDetectParser();</code></div><div class="line number87 index2 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">BodyContentHandler handler = </code><code class="java keyword">new</code> <code class="java plain">BodyContentHandler();</code></div><div class="line number88 index3 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata metadata = </code><code class="java keyword">new</code> <code class="java plain">Metadata();</code></div><div class="line number89 index4 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">(InputStream stream = ParsingExample.</code><code class="java keyword">class</code><code class="java plain">.getResourceAsStream(</code><code class="java string">"test.doc"</code><code class="java plain">)) {</code></div><div class="line number90 index5 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nb
 sp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">parser.parse(stream, handler, metadata);</code></div><div class="line number91 index6 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">handler.toString();</code></div><div class="line number92 index7 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number93 index8 alt2"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div></div>
+<p>For more control, you can call the <a href="./api/org/apache/tika/parser/Parser.html">Tika Parsers</a> directly. Most likely, you'll want to start out using the <a href="./api/org/apache/tika/parser/AutoDetectParser.html">Auto-Detect Parser</a>, which automatically figures out what kind of content you have, then calls the appropriate parser for you.</p><div id="highlighter_995619" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number85 index0 alt2"><code class="java keyword">public</code> <code class="java plain">String parseExample() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number86 index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">AutoDetectParser parser = </code><code class="java keyword">new</code> <code class="java pla
 in">AutoDetectParser();</code></div><div class="line number87 index2 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">BodyContentHandler handler = </code><code class="java keyword">new</code> <code class="java plain">BodyContentHandler();</code></div><div class="line number88 index3 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata metadata = </code><code class="java keyword">new</code> <code class="java plain">Metadata();</code></div><div class="line number89 index4 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">(InputStream stream = ParsingExample.</code><code class="java keyword">class</code><code class="java plain">.getResourceAsStream(</code><code class="java string">"test.doc"</code><code class="java plain">)) {</code></div><div class="line number90 index5 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nb
 sp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">parser.parse(stream, handler, metadata);</code></div><div class="line number91 index6 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">handler.toString();</code></div><div class="line number92 index7 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number93 index8 alt2"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div></div>
 <div class="section">
 <h3><a name="Picking_different_output_formats">Picking different output formats</a></h3>
 <p>With Tika, you can get the textual content of your files returned in a number of different formats. These can be plain text, html, xhtml, xhtml of one part of the file etc. This is controlled based on the <a class="externalLink" href="http://docs.oracle.com/javase/7/docs/api/org/xml/sax/ContentHandler.html">ContentHandler</a> you supply to the Parser.</p>
 <div class="section">
 <h4><a name="Parsing_to_Plain_Text">Parsing to Plain Text</a></h4>
-<p>By using the <a href="./api/org/apache/tika/sax/BodyContentHandler.html">BodyContentHandler</a>, you can request that Tika return only the content of the document's body as a plain-text string.</p><div id="highlighter_512554" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number47 index0 alt2"><code class="java keyword">public</code> <code class="java plain">String parseToPlainText() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number48 index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">BodyContentHandler handler = </code><code class="java keyword">new</code> <code class="java plain">BodyContentHandler();</code></div><div class="line number49 index2 alt2">&nbsp;</div><div class="line number50 index3 alt1"><code class="java space
 s">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">AutoDetectParser parser = </code><code class="java keyword">new</code> <code class="java plain">AutoDetectParser();</code></div><div class="line number51 index4 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata metadata = </code><code class="java keyword">new</code> <code class="java plain">Metadata();</code></div><div class="line number52 index5 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">(InputStream stream = ContentHandlerExample.</code><code class="java keyword">class</code><code class="java plain">.getResourceAsStream(</code><code class="java string">"test.doc"</code><code class="java plain">)) {</code></div><div class="line number53 index6 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">parser.parse(stream, handler, metadata);</c
 ode></div><div class="line number54 index7 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">handler.toString();</code></div><div class="line number55 index8 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number56 index9 alt1"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
+<p>By using the <a href="./api/org/apache/tika/sax/BodyContentHandler.html">BodyContentHandler</a>, you can request that Tika return only the content of the document's body as a plain-text string.</p><div id="highlighter_826156" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number47 index0 alt2"><code class="java keyword">public</code> <code class="java plain">String parseToPlainText() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number48 index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">BodyContentHandler handler = </code><code class="java keyword">new</code> <code class="java plain">BodyContentHandler();</code></div><div class="line number49 index2 alt2">&nbsp;</div><div class="line number50 index3 alt1"><code class="java space
 s">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">AutoDetectParser parser = </code><code class="java keyword">new</code> <code class="java plain">AutoDetectParser();</code></div><div class="line number51 index4 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata metadata = </code><code class="java keyword">new</code> <code class="java plain">Metadata();</code></div><div class="line number52 index5 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">(InputStream stream = ContentHandlerExample.</code><code class="java keyword">class</code><code class="java plain">.getResourceAsStream(</code><code class="java string">"test.doc"</code><code class="java plain">)) {</code></div><div class="line number53 index6 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">parser.parse(stream, handler, metadata);</c
 ode></div><div class="line number54 index7 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">handler.toString();</code></div><div class="line number55 index8 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number56 index9 alt1"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
 <div class="section">
 <h4><a name="Parsing_to_XHTML">Parsing to XHTML</a></h4>
-<p>By using the <a href="./api/org/apache/tika/sax/ToXMLContentHandler.html">ToXMLContentHandler</a>, you can get the XHTML content of the whole document as a string.</p><div id="highlighter_271587" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number61 index0 alt2"><code class="java keyword">public</code> <code class="java plain">String parseToHTML() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number62 index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandler handler = </code><code class="java keyword">new</code> <code class="java plain">ToXMLContentHandler();</code></div><div class="line number63 index2 alt2">&nbsp;</div><div class="line number64 index3 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><cod
 e class="java plain">AutoDetectParser parser = </code><code class="java keyword">new</code> <code class="java plain">AutoDetectParser();</code></div><div class="line number65 index4 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata metadata = </code><code class="java keyword">new</code> <code class="java plain">Metadata();</code></div><div class="line number66 index5 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">(InputStream stream = ContentHandlerExample.</code><code class="java keyword">class</code><code class="java plain">.getResourceAsStream(</code><code class="java string">"test.doc"</code><code class="java plain">)) {</code></div><div class="line number67 index6 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">parser.parse(stream, handler, metadata);</code></div><div class="line number68 in
 dex7 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">handler.toString();</code></div><div class="line number69 index8 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number70 index9 alt1"><code class="java plain">}</code></div></div></td></tr></tbody></table></div>
-<p>If you just want the body of the xhtml document, without the header, you can chain together a <a href="./api/org/apache/tika/sax/BodyContentHandler.html">BodyContentHandler</a> and a <a href="./api/org/apache/tika/sax/ToXMLContentHandler.html">ToXMLContentHandler</a> as shown:</p><div id="highlighter_961138" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number76 index0 alt1"><code class="java keyword">public</code> <code class="java plain">String parseBodyToHTML() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number77 index1 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandler handler = </code><code class="java keyword">new</code> <code class="java plain">BodyContentHandler(</code></div><div class="line number78 index2 alt
 1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">new</code> <code class="java plain">ToXMLContentHandler());</code></div><div class="line number79 index3 alt2">&nbsp;</div><div class="line number80 index4 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">AutoDetectParser parser = </code><code class="java keyword">new</code> <code class="java plain">AutoDetectParser();</code></div><div class="line number81 index5 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata metadata = </code><code class="java keyword">new</code> <code class="java plain">Metadata();</code></div><div class="line number82 index6 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">(InputStream stream = ContentHandlerExample.</code><code class="java keyword">class</code><code 
 class="java plain">.getResourceAsStream(</code><code class="java string">"test.doc"</code><code class="java plain">)) {</code></div><div class="line number83 index7 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">parser.parse(stream, handler, metadata);</code></div><div class="line number84 index8 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">handler.toString();</code></div><div class="line number85 index9 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number86 index10 alt1"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
+<p>By using the <a href="./api/org/apache/tika/sax/ToXMLContentHandler.html">ToXMLContentHandler</a>, you can get the XHTML content of the whole document as a string.</p><div id="highlighter_227566" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number61 index0 alt2"><code class="java keyword">public</code> <code class="java plain">String parseToHTML() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number62 index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandler handler = </code><code class="java keyword">new</code> <code class="java plain">ToXMLContentHandler();</code></div><div class="line number63 index2 alt2">&nbsp;</div><div class="line number64 index3 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><cod
 e class="java plain">AutoDetectParser parser = </code><code class="java keyword">new</code> <code class="java plain">AutoDetectParser();</code></div><div class="line number65 index4 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata metadata = </code><code class="java keyword">new</code> <code class="java plain">Metadata();</code></div><div class="line number66 index5 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">(InputStream stream = ContentHandlerExample.</code><code class="java keyword">class</code><code class="java plain">.getResourceAsStream(</code><code class="java string">"test.doc"</code><code class="java plain">)) {</code></div><div class="line number67 index6 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">parser.parse(stream, handler, metadata);</code></div><div class="line number68 in
 dex7 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">handler.toString();</code></div><div class="line number69 index8 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number70 index9 alt1"><code class="java plain">}</code></div></div></td></tr></tbody></table></div>
+<p>If you just want the body of the xhtml document, without the header, you can chain together a <a href="./api/org/apache/tika/sax/BodyContentHandler.html">BodyContentHandler</a> and a <a href="./api/org/apache/tika/sax/ToXMLContentHandler.html">ToXMLContentHandler</a> as shown:</p><div id="highlighter_155846" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number76 index0 alt1"><code class="java keyword">public</code> <code class="java plain">String parseBodyToHTML() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number77 index1 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandler handler = </code><code class="java keyword">new</code> <code class="java plain">BodyContentHandler(</code></div><div class="line number78 index2 alt
 1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">new</code> <code class="java plain">ToXMLContentHandler());</code></div><div class="line number79 index3 alt2">&nbsp;</div><div class="line number80 index4 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">AutoDetectParser parser = </code><code class="java keyword">new</code> <code class="java plain">AutoDetectParser();</code></div><div class="line number81 index5 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata metadata = </code><code class="java keyword">new</code> <code class="java plain">Metadata();</code></div><div class="line number82 index6 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">(InputStream stream = ContentHandlerExample.</code><code class="java keyword">class</code><code 
 class="java plain">.getResourceAsStream(</code><code class="java string">"test.doc"</code><code class="java plain">)) {</code></div><div class="line number83 index7 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">parser.parse(stream, handler, metadata);</code></div><div class="line number84 index8 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">handler.toString();</code></div><div class="line number85 index9 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number86 index10 alt1"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
 <div class="section">
 <h4><a name="Fetching_just_certain_bits_of_the_XHTML">Fetching just certain bits of the XHTML</a></h4>
-<p>It possible to execute XPath queries on the parse results, to fetch only certain bits of the XHTML. </p><div id="highlighter_317301" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number92 index0 alt1"><code class="java keyword">public</code> <code class="java plain">String parseOnePartToHTML() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number93 index1 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java comments">// Only get things under html -> body -> div (class=header)</code></div><div class="line number94 index2 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">XPathParser xhtmlParser = </code><code class="java keyword">new</code> <code class="java plain">XPathParser(</code><code class="java string">
 "xhtml"</code><code class="java plain">, XHTMLContentHandler.XHTML);</code></div><div class="line number95 index3 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Matcher divContentMatcher = xhtmlParser.parse(</code><code class="java string">"/xhtml:html/xhtml:body/xhtml:div/descendant::node()"</code><code class="java plain">);</code></div><div class="line number96 index4 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandler handler = </code><code class="java keyword">new</code> <code class="java plain">MatchingContentHandler(</code></div><div class="line number97 index5 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">new</code> <code class="java plain">ToXMLContentHandler(), divContentMatcher);</code></div><div class="line number98 index6 alt1">&nbsp;</div><div class="line number99 index7 alt2"><code class=
 "java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">AutoDetectParser parser = </code><code class="java keyword">new</code> <code class="java plain">AutoDetectParser();</code></div><div class="line number100 index8 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata metadata = </code><code class="java keyword">new</code> <code class="java plain">Metadata();</code></div><div class="line number101 index9 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">(InputStream stream = ContentHandlerExample.</code><code class="java keyword">class</code><code class="java plain">.getResourceAsStream(</code><code class="java string">"test2.doc"</code><code class="java plain">)) {</code></div><div class="line number102 index10 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">parser.parse(stream, handle
 r, metadata);</code></div><div class="line number103 index11 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">handler.toString();</code></div><div class="line number104 index12 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number105 index13 alt2"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div></div>
+<p>It possible to execute XPath queries on the parse results, to fetch only certain bits of the XHTML. </p><div id="highlighter_411972" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number92 index0 alt1"><code class="java keyword">public</code> <code class="java plain">String parseOnePartToHTML() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number93 index1 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java comments">// Only get things under html -> body -> div (class=header)</code></div><div class="line number94 index2 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">XPathParser xhtmlParser = </code><code class="java keyword">new</code> <code class="java plain">XPathParser(</code><code class="java string">
 "xhtml"</code><code class="java plain">, XHTMLContentHandler.XHTML);</code></div><div class="line number95 index3 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Matcher divContentMatcher = xhtmlParser.parse(</code><code class="java string">"/xhtml:html/xhtml:body/xhtml:div/descendant::node()"</code><code class="java plain">);</code></div><div class="line number96 index4 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandler handler = </code><code class="java keyword">new</code> <code class="java plain">MatchingContentHandler(</code></div><div class="line number97 index5 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">new</code> <code class="java plain">ToXMLContentHandler(), divContentMatcher);</code></div><div class="line number98 index6 alt1">&nbsp;</div><div class="line number99 index7 alt2"><code class=
 "java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">AutoDetectParser parser = </code><code class="java keyword">new</code> <code class="java plain">AutoDetectParser();</code></div><div class="line number100 index8 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata metadata = </code><code class="java keyword">new</code> <code class="java plain">Metadata();</code></div><div class="line number101 index9 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">(InputStream stream = ContentHandlerExample.</code><code class="java keyword">class</code><code class="java plain">.getResourceAsStream(</code><code class="java string">"test2.doc"</code><code class="java plain">)) {</code></div><div class="line number102 index10 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">parser.parse(stream, handle
 r, metadata);</code></div><div class="line number103 index11 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">handler.toString();</code></div><div class="line number104 index12 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number105 index13 alt2"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div></div>
 <div class="section">
 <h3><a name="Custom_Content_Handlers">Custom Content Handlers</a></h3>
 <p>The textual output of parsing a file with Tika is returned via the SAX <a class="externalLink" href="http://docs.oracle.com/javase/7/docs/api/org/xml/sax/ContentHandler.html">ContentHandler</a> you pass to the parse method. It is possible to customise your parsing by supplying your own ContentHandler which does special things.</p>
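
Editorial sketch for the XPath hunk above: the path expression used there can be tried out with the JDK's own `javax.xml.xpath` engine on a small in-memory XHTML document. This is not Tika's `XPathParser` (a streaming matcher that supports only a simple XPath subset); it is an analogy that shows what the expression selects. The class name `XhtmlXPathSketch`, the method name, and the use of `descendant::text()` instead of `descendant::node()` are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: evaluates the XPath from the example on a DOM tree.
final class XhtmlXPathSketch {
    // Same namespace URI that the "xhtml" prefix is bound to in the example.
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Returns the text nodes found under html -> body -> div, in document order.
    static List<String> textUnderBodyDivs(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for prefixed XPath steps
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "xhtml".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) {
                    return XHTML_NS.equals(uri) ? "xhtml" : null;
                }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return XHTML_NS.equals(uri)
                            ? Collections.singletonList("xhtml").iterator()
                            : Collections.<String>emptyIterator();
                }
            });

            NodeList nodes = (NodeList) xpath.evaluate(
                    "/xhtml:html/xhtml:body/xhtml:div/descendant::text()",
                    doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getNodeValue());
            }
            return out;
        } catch (Exception e) {
            // Parsing and XPath errors are wrapped to keep the sketch short.
            throw new IllegalArgumentException("could not evaluate XPath", e);
        }
    }
}
```

Content outside a `div` that sits directly under `body` is not selected. Note also that the expression matches every such `div`, not only one with `class=header`.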
@@ -141,16 +141,16 @@
 <p>By using the <a href="./api/org/apache/tika/sax/PhoneExtractingContentHandler.html">PhoneExtractingContentHandler</a>, you can have any phone numbers found in the textual content of the document extracted and placed into the Metadata object for you.</p></div>
 <div class="section">
 <h4><a name="Streaming_the_plain_text_in_chunks">Streaming the plain text in chunks</a></h4>
-<p>Sometimes, you want to chunk the resulting text up, perhaps to output as you go minimising memory use, perhaps to output to HDFS files, or any other reason! With a small custom content handler, you can do that.</p><div id="highlighter_554019" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number113 index0 alt2"><code class="java keyword">public</code> <code class="java plain">List&lt;String> parseToPlainTextChunks() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number114 index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">final</code> <code class="java plain">List&lt;String> chunks = </code><code class="java keyword">new</code> <code class="java plain">ArrayList&lt;>();</code></div><div class="line number115 index2 alt2"><code c
 lass="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">chunks.add(</code><code class="java string">""</code><code class="java plain">);</code></div><div class="line number116 index3 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandlerDecorator handler = </code><code class="java keyword">new</code> <code class="java plain">ContentHandlerDecorator() {</code></div><div class="line number117 index4 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java color1">@Override</code></div><div class="line number118 index5 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">public</code> <code class="java keyword">void</code> <code class="java plain">characters(</code><code class="java keyword">char</code><code class="java plain">[] ch, </code><code class="java keyword">int</code> <code class="java plain">start, </c
 ode><code class="java keyword">int</code> <code class="java plain">length) {</code></div><div class="line number119 index6 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">String lastChunk = chunks.get(chunks.size() - </code><code class="java value">1</code><code class="java plain">);</code></div><div class="line number120 index7 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">String thisStr = </code><code class="java keyword">new</code> <code class="java plain">String(ch, start, length);</code></div><div class="line number121 index8 alt2">&nbsp;</div><div class="line number122 index9 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">if</code> <code class="java plain">(lastChunk.length() + length > MAXIMUM_TEXT_CHUNK_SIZE) {
 </code></div><div class="line number123 index10 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">chunks.add(thisStr);</code></div><div class="line number124 index11 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">} </code><code class="java keyword">else</code> <code class="java plain">{</code></div><div class="line number125 index12 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">chunks.set(chunks.size() - </code><code class="java value">1</code><code class="java plain">, lastChunk + thisStr);</code></div><div class="line number126 index13 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</c
 ode></div><div class="line number127 index14 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number128 index15 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">};</code></div><div class="line number129 index16 alt2">&nbsp;</div><div class="line number130 index17 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">AutoDetectParser parser = </code><code class="java keyword">new</code> <code class="java plain">AutoDetectParser();</code></div><div class="line number131 index18 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata metadata = </code><code class="java keyword">new</code> <code class="java plain">Metadata();</code></div><div class="line number132 index19 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class
 ="java plain">(InputStream stream = ContentHandlerExample.</code><code class="java keyword">class</code><code class="java plain">.getResourceAsStream(</code><code class="java string">"test2.doc"</code><code class="java plain">)) {</code></div><div class="line number133 index20 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">parser.parse(stream, handler, metadata);</code></div><div class="line number134 index21 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">chunks;</code></div><div class="line number135 index22 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number136 index23 alt1"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div></div>
+<p>Sometimes, you want to chunk the resulting text up, perhaps to output as you go minimising memory use, perhaps to output to HDFS files, or any other reason! With a small custom content handler, you can do that.</p><div id="highlighter_826360" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number113 index0 alt2"><code class="java keyword">public</code> <code class="java plain">List&lt;String> parseToPlainTextChunks() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number114 index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">final</code> <code class="java plain">List&lt;String> chunks = </code><code class="java keyword">new</code> <code class="java plain">ArrayList&lt;>();</code></div><div class="line number115 index2 alt2"><code c
 lass="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">chunks.add(</code><code class="java string">""</code><code class="java plain">);</code></div><div class="line number116 index3 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandlerDecorator handler = </code><code class="java keyword">new</code> <code class="java plain">ContentHandlerDecorator() {</code></div><div class="line number117 index4 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java color1">@Override</code></div><div class="line number118 index5 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">public</code> <code class="java keyword">void</code> <code class="java plain">characters(</code><code class="java keyword">char</code><code class="java plain">[] ch, </code><code class="java keyword">int</code> <code class="java plain">start, </c
 ode><code class="java keyword">int</code> <code class="java plain">length) {</code></div><div class="line number119 index6 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">String lastChunk = chunks.get(chunks.size() - </code><code class="java value">1</code><code class="java plain">);</code></div><div class="line number120 index7 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">String thisStr = </code><code class="java keyword">new</code> <code class="java plain">String(ch, start, length);</code></div><div class="line number121 index8 alt2">&nbsp;</div><div class="line number122 index9 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">if</code> <code class="java plain">(lastChunk.length() + length > MAXIMUM_TEXT_CHUNK_SIZE) {
 </code></div><div class="line number123 index10 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">chunks.add(thisStr);</code></div><div class="line number124 index11 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">} </code><code class="java keyword">else</code> <code class="java plain">{</code></div><div class="line number125 index12 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">chunks.set(chunks.size() - </code><code class="java value">1</code><code class="java plain">, lastChunk + thisStr);</code></div><div class="line number126 index13 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</c
 ode></div><div class="line number127 index14 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number128 index15 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">};</code></div><div class="line number129 index16 alt2">&nbsp;</div><div class="line number130 index17 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">AutoDetectParser parser = </code><code class="java keyword">new</code> <code class="java plain">AutoDetectParser();</code></div><div class="line number131 index18 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata metadata = </code><code class="java keyword">new</code> <code class="java plain">Metadata();</code></div><div class="line number132 index19 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class
 ="java plain">(InputStream stream = ContentHandlerExample.</code><code class="java keyword">class</code><code class="java plain">.getResourceAsStream(</code><code class="java string">"test2.doc"</code><code class="java plain">)) {</code></div><div class="line number133 index20 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">parser.parse(stream, handler, metadata);</code></div><div class="line number134 index21 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">chunks;</code></div><div class="line number135 index22 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number136 index23 alt1"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div></div>
 <div class="section">
 <h3><a name="Translation">Translation</a></h3>
 <p>Tika provides a pluggable Translation system, which allows you to send the results of parsing off to an external system or program to have the text translated into another language.</p>
 <div class="section">
 <h4><a name="Translation_using_the_Microsoft_Translation_API">Translation using the Microsoft Translation API</a></h4>
-<p>In order to use the Microsoft Translation API, you need to sign up for a Microsoft account, get an API key, then pass the key to Tika before translating.</p><div id="highlighter_685646" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number23 index0 alt2"><code class="java keyword">public</code> <code class="java plain">String microsoftTranslateToFrench(String text) {</code></div><div class="line number24 index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">MicrosoftTranslator translator = </code><code class="java keyword">new</code> <code class="java plain">MicrosoftTranslator();</code></div><div class="line number25 index2 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java comments">// Change the id and secret! See <a href="http://msdn.microsoft.com/en-us/library/hh454950.aspx.">http://msdn.microso
 ft.com/en-us/library/hh454950.aspx.</a></code></div><div class="line number26 index3 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">translator.setId(</code><code class="java string">"dummy-id"</code><code class="java plain">);</code></div><div class="line number27 index4 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">translator.setSecret(</code><code class="java string">"dummy-secret"</code><code class="java plain">);</code></div><div class="line number28 index5 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">{</code></div><div class="line number29 index6 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">translator.translate(text, </code><code class="java string">"fr"</code><code class="java plain">);</code></div><div class=
 "line number30 index7 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">} </code><code class="java keyword">catch</code> <code class="java plain">(Exception e) {</code></div><div class="line number31 index8 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java string">"Error while translating."</code><code class="java plain">;</code></div><div class="line number32 index9 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number33 index10 alt2"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div></div>
+<p>In order to use the Microsoft Translation API, you need to sign up for a Microsoft account, get an API key, then pass the key to Tika before translating.</p><div id="highlighter_646746" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number23 index0 alt2"><code class="java keyword">public</code> <code class="java plain">String microsoftTranslateToFrench(String text) {</code></div><div class="line number24 index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">MicrosoftTranslator translator = </code><code class="java keyword">new</code> <code class="java plain">MicrosoftTranslator();</code></div><div class="line number25 index2 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java comments">// Change the id and secret! See <a href="http://msdn.microsoft.com/en-us/library/hh454950.aspx.">http://msdn.microso
 ft.com/en-us/library/hh454950.aspx.</a></code></div><div class="line number26 index3 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">translator.setId(</code><code class="java string">"dummy-id"</code><code class="java plain">);</code></div><div class="line number27 index4 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">translator.setSecret(</code><code class="java string">"dummy-secret"</code><code class="java plain">);</code></div><div class="line number28 index5 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">{</code></div><div class="line number29 index6 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">translator.translate(text, </code><code class="java string">"fr"</code><code class="java plain">);</code></div><div class=
 "line number30 index7 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">} </code><code class="java keyword">catch</code> <code class="java plain">(Exception e) {</code></div><div class="line number31 index8 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java string">"Error while translating."</code><code class="java plain">;</code></div><div class="line number32 index9 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number33 index10 alt2"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div></div>
 <div class="section">
 <h3><a name="Language_Identification">Language Identification</a></h3>
-<p>Tika provides support for identifying the language of text, through the <a href="./api/org/apache/tika/language/LanguageIdentifier.html">LanguageIdentifier</a> class.</p><div id="highlighter_228729" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number23 index0 alt2"><code class="java keyword">public</code> <code class="java plain">String identifyLanguage(String text) {</code></div><div class="line number24 index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">LanguageIdentifier identifier = </code><code class="java keyword">new</code> <code class="java plain">LanguageIdentifier(text);</code></div><div class="line number25 index2 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">identifier.getLanguage();</code></div><div class="line number26 index3 alt
 1"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
+<p>Tika provides support for identifying the language of text, through the <a href="./api/org/apache/tika/language/LanguageIdentifier.html">LanguageIdentifier</a> class.</p><div id="highlighter_206803" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number23 index0 alt2"><code class="java keyword">public</code> <code class="java plain">String identifyLanguage(String text) {</code></div><div class="line number24 index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">LanguageIdentifier identifier = </code><code class="java keyword">new</code> <code class="java plain">LanguageIdentifier(text);</code></div><div class="line number25 index2 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">identifier.getLanguage();</code></div><div class="line number26 index3 alt
 1"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
 <div class="section">
 <h3><a name="Additional_Examples">Additional Examples</a></h3>
 <p>A number of other examples are also available, including all of the examples from the <a class="externalLink" href="http://manning.com/mattmann/">Tika In Action book</a>. These can all be found in the <a class="externalLink" href="https://svn.apache.org/repos/asf/tika/trunk/tika-example">Tika Example module</a> in SVN.</p></div></div>
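
Editorial sketch for the "Streaming the plain text in chunks" hunk above: the chunking rule from that example, pulled out of the content handler so it can be run without Tika on the classpath. The class name `TextChunker` and the constructor parameter are hypothetical; the body of `characters` follows the logic shown in the diff, with the limit passed in rather than read from `MAXIMUM_TEXT_CHUNK_SIZE`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone version of the chunking logic in the example.
final class TextChunker {
    private final int maxChunkSize;
    private final List<String> chunks = new ArrayList<>();

    TextChunker(int maxChunkSize) {
        this.maxChunkSize = maxChunkSize;
        chunks.add(""); // start with one empty chunk, as the example does
    }

    // Same parameter shape as the SAX characters() callback.
    void characters(char[] ch, int start, int length) {
        String lastChunk = chunks.get(chunks.size() - 1);
        String thisStr = new String(ch, start, length);

        if (lastChunk.length() + length > maxChunkSize) {
            // Appending would exceed the limit, so start a new chunk.
            chunks.add(thisStr);
        } else {
            // Still room: extend the current chunk.
            chunks.set(chunks.size() - 1, lastChunk + thisStr);
        }
    }

    List<String> chunks() {
        return chunks;
    }
}
```

With a limit of 10, feeding "hello ", "world" and "!!" in turn gives the chunks "hello " and "world!!". One property of this rule worth knowing: text is never split inside a single `characters` call, so a call that delivers more than the limit produces one oversized chunk.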

Modified: tika/site/publish/2.1.0/gettingstarted.html
URL: http://svn.apache.org/viewvc/tika/site/publish/2.1.0/gettingstarted.html?rev=1895260&r1=1895259&r2=1895260&view=diff
==============================================================================
--- tika/site/publish/2.1.0/gettingstarted.html (original)
+++ tika/site/publish/2.1.0/gettingstarted.html Mon Nov 22 19:42:46 2021
@@ -101,48 +101,48 @@
 <dl>
 <dt>tika-core/target/tika-core-*.jar</dt>
 <dd> Tika core library. Contains the core interfaces and classes of Tika, but none of the parser implementations.</dd>
-<dt>tika-parsers/target/tika-parsers-*.jar</dt>
-<dd> Tika parsers. Collection of classes that implement the Tika Parser interface based on various external parser libraries.</dd>
+<dt>tika-parsers/tika-parsers-standard/tika-parsers-standard-package/target/tika-parsers-standard-package-*.jar</dt>
+<dd> Tika parsers. Collection of classes that implement the Tika Parser interface based on various external parser libraries. This includes the most commonly used parsers. Users may want to add <tt>tika-parser-sqlite3-package</tt> and <tt>tika-parser-scientific-package</tt> or other parser modules.</dd>
 <dt>tika-app/target/tika-app-*.jar</dt>
-<dd> Tika application. Combines the above components and all the external parser libraries into a single runnable jar with a GUI and a command line interface.</dd>
-<dt>tika-server/target/tika-server-*.jar</dt>
-<dd> Tika JAX-RS REST application. This is a Jetty web server running Tika REST services as described in <a class="externalLink" href="https://cwiki.apache.org/confluence/display/TIKA/TikaServer">this page</a>.</dd>
-<dt>tika-bundle/target/tika-bundle-*.jar</dt>
+<dd> Tika application. Combines the above components and the standard parser libraries into a single runnable jar with a GUI and a command line interface.</dd>
+<dt>tika-server/tika-server-standard/target/tika-server-standard-*.jar</dt>
+<dd> Tika JAX-RS REST application. This is a Jetty web server running Tika REST services with the parsers in tika-parsers-standard-package as described in <a class="externalLink" href="https://cwiki.apache.org/confluence/display/TIKA/TikaServer">this page</a>.</dd>
+<dt>tika-bundles/tika-bundle-standard/target/tika-bundle-standard-*.jar</dt>
 <dd> Tika bundle. An OSGi bundle that combines tika-parsers with non-OSGified parser libraries to make them easy to deploy in an OSGi environment.</dd>
-<dt>tika-eval/target/tika-eval-*.jar</dt>
+<dt>tika-eval/tika-eval-app/target/tika-eval-app-*.jar</dt>
 <dd> Tika eval module. Commandline tool to assess the output of Tika or compare the output of two different versions of Tika or other text extraction packages.</dd></dl></div>
 <div class="section">
 <h2><a name="Using_Tika_as_a_Maven_dependency"></a>Using Tika as a Maven dependency</h2>
-<p>The core library, <tt> tika-core </tt>, contains the key interfaces and classes of Tika and can be used by itself if you don't need the full set of parsers from the <tt> tika-parsers </tt> component. The tika-core dependency looks like this:</p>
+<p>The core library, <tt>tika-core</tt>, contains the key interfaces and classes of Tika and can be used by itself if you don't need the full set of parsers from the <tt> tika-parsers </tt> component. The tika-core dependency looks like this:</p>
 <div>
 <pre>  &lt;dependency&gt;
     &lt;groupId&gt;org.apache.tika&lt;/groupId&gt;
     &lt;artifactId&gt;tika-core&lt;/artifactId&gt;
-    &lt;version&gt;2.0.0&lt;/version&gt;
+    &lt;version&gt;2.1.0&lt;/version&gt;
   &lt;/dependency&gt;</pre></div>
-<p>If you want to use Tika to parse documents (instead of simply detecting document types, etc.), you'll want to depend on <tt> tika-parsers </tt> instead: </p>
+<p>If you want to use Tika to parse documents (instead of simply detecting document types, etc.), you'll want to depend on <tt> tika-parsers-standard-package </tt> instead:</p>
 <div>
 <pre>  &lt;dependency&gt;
     &lt;groupId&gt;org.apache.tika&lt;/groupId&gt;
-    &lt;artifactId&gt;tika-parsers&lt;/artifactId&gt;
-    &lt;version&gt;2.0.0&lt;/version&gt;
+    &lt;artifactId&gt;tika-parsers-standard-package&lt;/artifactId&gt;
+    &lt;version&gt;2.1.0&lt;/version&gt;
   &lt;/dependency&gt;</pre></div>
 <p>Note that adding this dependency will introduce a number of transitive dependencies to your project, including one on tika-core. You need to make sure that these dependencies won't conflict with your existing project dependencies. You can use the following command in the tika-parsers directory to get a full listing of all the dependencies.</p>
 <div>
 <pre>$ mvn dependency:tree | grep :compile</pre></div></div>
 <div class="section">
 <h2><a name="Using_Tika_in_a_Gradle-built_project"></a>Using Tika in a Gradle-built project</h2>
-<p>To add a dependency on Apache Tika to your Gradle built project, including the full set of parsers, you should depend on the <tt> tika-parsers </tt> artifact:</p>
+<p>To add a dependency on Apache Tika to your Gradle-built project, including the full set of parsers, you should depend on the <tt> tika-parsers-standard-package </tt> artifact:</p>
 <div>
 <pre>dependencies {
-    runtime 'org.apache.tika:tika-parsers:2.0.0'
+    runtime 'org.apache.tika:tika-parsers-standard-package:2.1.0'
 }</pre></div></div>
 <div class="section">
 <h2><a name="Using_Tika_in_an_Ant_project"></a>Using Tika in an Ant project</h2>
 <p>If you are using <a class="externalLink" href="http://ant.apache.org/ivy/">Apache Ivy</a> as your dependency manager tool with Ant, then to include Tika with the full set of parsers, you should depend on the <tt> tika-parsers </tt> artifact like this:</p>
 <div>
 <pre>    &lt;dependencies&gt;
-        &lt;dependency org=&quot;org.apache.tika&quot; name=&quot;tika-parsers&quot; rev=&quot;2.0.0&quot;/&gt;
+        &lt;dependency org=&quot;org.apache.tika&quot; name=&quot;tika-parsers-standard-package&quot; rev=&quot;2.1.0&quot;/&gt;
     &lt;/dependencies&gt;</pre></div>
 <p>Otherwise, probably the easiest way to use Tika is to include the full <tt> tika-app </tt> jar on your classpath. For just core functionality, you can add the <tt> tika-core </tt> jar instead, but be aware that the full set of parsers has a large number of dependencies that must all be included, which is very fiddly to do by hand with Ant! To include Tika in your Ant project, you should do something like:</p>
 <div>
@@ -236,12 +236,6 @@ Description:
     a normal file explorer to the GUI window to extract
     text content and metadata from the files.
 
-- Server mode
-
-    Use the &quot;--server&quot; (or &quot;-s&quot;) option to start the
-    Apache Tika server. The server will listen to the
-    ports you specify as one or more arguments.
-
 - Batch mode
 
     Simplest method.

Modified: tika/site/src/site/apt/2.1.0/gettingstarted.apt
URL: http://svn.apache.org/viewvc/tika/site/src/site/apt/2.1.0/gettingstarted.apt?rev=1895260&r1=1895259&r2=1895260&view=diff
==============================================================================
--- tika/site/src/site/apt/2.1.0/gettingstarted.apt (original)
+++ tika/site/src/site/apt/2.1.0/gettingstarted.apt Mon Nov 22 19:42:46 2021
@@ -52,24 +52,27 @@ Build artifacts
   Tika core library. Contains the core interfaces and classes of Tika,
   but none of the parser implementations.
 
- [tika-parsers/target/tika-parsers-*.jar]
+ [tika-parsers/tika-parsers-standard/tika-parsers-standard-package/target/tika-parsers-standard-package-*.jar]
   Tika parsers. Collection of classes that implement the Tika Parser
-  interface based on various external parser libraries.
+  interface based on various external parser libraries. This includes
+  the most commonly used parsers.  Users may want to add <<<tika-parser-sqlite3-package>>>
+  and <<<tika-parser-scientific-package>>> or other parser modules.
 
  [tika-app/target/tika-app-*.jar]
-  Tika application. Combines the above components and all the external
+  Tika application. Combines the above components and the standard
   parser libraries into a single runnable jar with a GUI and a command
   line interface.
 
- [tika-server/target/tika-server-*.jar]
+ [tika-server/tika-server-standard/target/tika-server-standard-*.jar]
   Tika JAX-RS REST application. This is a Jetty web server running Tika
-  REST services as described in {{{https://cwiki.apache.org/confluence/display/TIKA/TikaServer}this page}}.
+  REST services with the parsers in tika-parsers-standard-package
+  as described in {{{https://cwiki.apache.org/confluence/display/TIKA/TikaServer}this page}}.
 
- [tika-bundle/target/tika-bundle-*.jar]
+ [tika-bundles/tika-bundle-standard/target/tika-bundle-standard-*.jar]
   Tika bundle. An OSGi bundle that combines tika-parsers with non-OSGified
   parser libraries to make them easy to deploy in an OSGi environment.
 
- [tika-eval/target/tika-eval-*.jar]
+ [tika-eval/tika-eval-app/target/tika-eval-app-*.jar]
   Tika eval module. Commandline tool to assess the output of Tika
   or compare the output of two different versions of Tika or
   other text extraction packages.
@@ -78,7 +81,7 @@ Build artifacts
 
 Using Tika as a Maven dependency
 
- The core library, <<< tika-core >>>, contains the key interfaces and classes
+ The core library, <<<tika-core>>>, contains the key interfaces and classes
  of Tika and can be used by itself if you don't need the full set of parsers 
  from the <<< tika-parsers >>> component. The tika-core dependency looks like 
  this:
@@ -87,18 +90,18 @@ Using Tika as a Maven dependency
   <dependency>
     <groupId>org.apache.tika</groupId>
     <artifactId>tika-core</artifactId>
-    <version>2.0.0</version>
+    <version>2.1.0</version>
   </dependency>
 ---
 
  If you want to use Tika to parse documents (instead  of simply detecting
- document types, etc.), you'll want to depend on <<< tika-parsers >>> instead: 
+ document types, etc.), you'll want to depend on <<< tika-parsers-standard-package >>> instead:
 
 ---
   <dependency>
     <groupId>org.apache.tika</groupId>
-    <artifactId>tika-parsers</artifactId>
-    <version>2.0.0</version>
+    <artifactId>tika-parsers-standard-package</artifactId>
+    <version>2.1.0</version>
   </dependency>
 ---
 
@@ -116,11 +119,11 @@ Using Tika in a Gradle-built project
 
 To add a dependency on Apache Tika to your Gradle-built project,
  including the full set of parsers, you should depend on the
- <<< tika-parsers >>> artifact:
+ <<< tika-parsers-standard-package >>> artifact:
 
 ---
 dependencies {
-    runtime 'org.apache.tika:tika-parsers:2.0.0'
+    runtime 'org.apache.tika:tika-parsers-standard-package:2.1.0'
 }
 ---
 
@@ -132,7 +135,7 @@ Using Tika in an Ant project
 
 ---
     <dependencies>
-        <dependency org="org.apache.tika" name="tika-parsers" rev="2.0.0"/>
+        <dependency org="org.apache.tika" name="tika-parsers-standard-package" rev="2.1.0"/>
     </dependencies>
 ---
 
@@ -241,12 +244,6 @@ Description:
     a normal file explorer to the GUI window to extract
     text content and metadata from the files.
 
-- Server mode
-
-    Use the "--server" (or "-s") option to start the
-    Apache Tika server. The server will listen to the
-    ports you specify as one or more arguments.
-
 - Batch mode
 
     Simplest method.