Posted to commits@tika.apache.org by gr...@apache.org on 2019/04/20 21:04:19 UTC

svn commit: r1857884 [19/20] - in /tika/site/publish: ./ 0.10/ 0.5/ 0.6/ 0.7/ 0.8/ 0.9/ 1.0/ 1.1/ 1.10/ 1.11/ 1.12/ 1.13/ 1.14/ 1.15/ 1.16/ 1.17/ 1.18/ 1.19.1/ 1.19/ 1.2/ 1.20/ 1.3/ 1.4/ 1.5/ 1.6/ 1.7/ 1.8/ 1.9/

Modified: tika/site/publish/1.9/examples.html
URL: http://svn.apache.org/viewvc/tika/site/publish/1.9/examples.html?rev=1857884&r1=1857883&r2=1857884&view=diff
==============================================================================
--- tika/site/publish/1.9/examples.html (original)
+++ tika/site/publish/1.9/examples.html Sat Apr 20 21:04:17 2019
@@ -10,7 +10,7 @@
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at
 
-    http://www.apache.org/licenses/LICENSE-2.0
+    https://www.apache.org/licenses/LICENSE-2.0
  
   Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
@@ -75,12 +75,12 @@
   <body onLoad="initProvider();">
     <div id="body">
       <div id="banner">
-        <a href="http://tika.apache.org" id="bannerLeft" title="Apache Tika"
-          ><img src="http://tika.apache.org/tika.png" alt="Apache Tika"
+        <a href="https://tika.apache.org" id="bannerLeft" title="Apache Tika"
+          ><img src="https://tika.apache.org/tika.png" alt="Apache Tika"
                 width="292" height="100"/></a>
-        <a href="http://www.apache.org/" id="bannerRight"
+        <a href="https://www.apache.org/" id="bannerRight"
            title="The Apache Software Foundation"
-          ><img src="http://tika.apache.org/asf-logo.gif" alt="The Apache Software Foundation"
+          ><img src="https://tika.apache.org/asf-logo.gif" alt="The Apache Software Foundation"
                 width="387" height="100"/></a>
       </div>
       <div id="content">
@@ -116,23 +116,23 @@
 <p>The <a href="./api/org/apache/tika/Tika.html">Tika facade</a>, provides a number of very quick and easy ways to have your content parsed by Tika, and return the resulting plain text</p><style type="text/css">
    @import url('attached-includes/css/shCoreDefault.css');
 </style>
-<div id="highlighter_214712" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number54 index0 alt1"><code class="java keyword">public</code> <code class="java plain">String parseToStringExample() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number55 index1 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Tika tika = </code><code class="java keyword">new</code> <code class="java plain">Tika();</code></div><div class="line number56 index2 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">(InputStream stream = ParsingExample.</code><code class="java keyword">class</code><code class="java plain">.getResourceAsStream(</code><code class="java string">"test.doc"</code><c
 ode class="java plain">)) {</code></div><div class="line number57 index3 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">tika.parseToString(stream);</code></div><div class="line number58 index4 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number59 index5 alt2"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
+<div id="highlighter_818269" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number54 index0 alt1"><code class="java keyword">public</code> <code class="java plain">String parseToStringExample() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number55 index1 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Tika tika = </code><code class="java keyword">new</code> <code class="java plain">Tika();</code></div><div class="line number56 index2 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">(InputStream stream = ParsingExample.</code><code class="java keyword">class</code><code class="java plain">.getResourceAsStream(</code><code class="java string">"test.doc"</code><c
 ode class="java plain">)) {</code></div><div class="line number57 index3 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">tika.parseToString(stream);</code></div><div class="line number58 index4 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number59 index5 alt2"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
 <div class="section">
 <h4><a name="Parsing_using_the_Auto-Detect_Parser">Parsing using the Auto-Detect Parser</a></h4>
-<p>For more control, you can call the <a href="./api/org/apache/tika/parser/Parser.html">Tika Parsers</a> directly. Most likely, you'll want to start out using the <a href="./api/org/apache/tika/parser/AutoDetectParser.html">Auto-Detect Parser</a>, which automatically figures out what kind of content you have, then calls the appropriate parser for you.</p><div id="highlighter_544221" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number85 index0 alt2"><code class="java keyword">public</code> <code class="java plain">String parseExample() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number86 index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">AutoDetectParser parser = </code><code class="java keyword">new</code> <code class="java pla
 in">AutoDetectParser();</code></div><div class="line number87 index2 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">BodyContentHandler handler = </code><code class="java keyword">new</code> <code class="java plain">BodyContentHandler();</code></div><div class="line number88 index3 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata metadata = </code><code class="java keyword">new</code> <code class="java plain">Metadata();</code></div><div class="line number89 index4 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">(InputStream stream = ParsingExample.</code><code class="java keyword">class</code><code class="java plain">.getResourceAsStream(</code><code class="java string">"test.doc"</code><code class="java plain">)) {</code></div><div class="line number90 index5 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nb
 sp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">parser.parse(stream, handler, metadata);</code></div><div class="line number91 index6 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">handler.toString();</code></div><div class="line number92 index7 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number93 index8 alt2"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div></div>
+<p>For more control, you can call the <a href="./api/org/apache/tika/parser/Parser.html">Tika Parsers</a> directly. Most likely, you'll want to start out using the <a href="./api/org/apache/tika/parser/AutoDetectParser.html">Auto-Detect Parser</a>, which automatically figures out what kind of content you have, then calls the appropriate parser for you.</p><div id="highlighter_634468" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number85 index0 alt2"><code class="java keyword">public</code> <code class="java plain">String parseExample() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number86 index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">AutoDetectParser parser = </code><code class="java keyword">new</code> <code class="java pla
 in">AutoDetectParser();</code></div><div class="line number87 index2 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">BodyContentHandler handler = </code><code class="java keyword">new</code> <code class="java plain">BodyContentHandler();</code></div><div class="line number88 index3 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata metadata = </code><code class="java keyword">new</code> <code class="java plain">Metadata();</code></div><div class="line number89 index4 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">(InputStream stream = ParsingExample.</code><code class="java keyword">class</code><code class="java plain">.getResourceAsStream(</code><code class="java string">"test.doc"</code><code class="java plain">)) {</code></div><div class="line number90 index5 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nb
 sp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">parser.parse(stream, handler, metadata);</code></div><div class="line number91 index6 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">handler.toString();</code></div><div class="line number92 index7 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number93 index8 alt2"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div></div>
 <div class="section">
 <h3><a name="Picking_different_output_formats">Picking different output formats</a></h3>
 <p>With Tika, you can get the textual content of your files returned in a number of different formats. These can be plain text, html, xhtml, xhtml of one part of the file etc. This is controlled based on the <a class="externalLink" href="http://docs.oracle.com/javase/7/docs/api/org/xml/sax/ContentHandler.html">ContentHandler</a> you supply to the Parser.</p>
 <div class="section">
 <h4><a name="Parsing_to_Plain_Text">Parsing to Plain Text</a></h4>
-<p>By using the <a href="./api/org/apache/tika/sax/BodyContentHandler.html">BodyContentHandler</a>, you can request that Tika return only the content of the document's body as a plain-text string.</p><div id="highlighter_700541" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number47 index0 alt2"><code class="java keyword">public</code> <code class="java plain">String parseToPlainText() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number48 index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">BodyContentHandler handler = </code><code class="java keyword">new</code> <code class="java plain">BodyContentHandler();</code></div><div class="line number49 index2 alt2">&nbsp;</div><div class="line number50 index3 alt1"><code class="java space
 s">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">AutoDetectParser parser = </code><code class="java keyword">new</code> <code class="java plain">AutoDetectParser();</code></div><div class="line number51 index4 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata metadata = </code><code class="java keyword">new</code> <code class="java plain">Metadata();</code></div><div class="line number52 index5 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">(InputStream stream = ContentHandlerExample.</code><code class="java keyword">class</code><code class="java plain">.getResourceAsStream(</code><code class="java string">"test.doc"</code><code class="java plain">)) {</code></div><div class="line number53 index6 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">parser.parse(stream, handler, metadata);</c
 ode></div><div class="line number54 index7 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">handler.toString();</code></div><div class="line number55 index8 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number56 index9 alt1"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
+<p>By using the <a href="./api/org/apache/tika/sax/BodyContentHandler.html">BodyContentHandler</a>, you can request that Tika return only the content of the document's body as a plain-text string.</p><div id="highlighter_973120" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number47 index0 alt2"><code class="java keyword">public</code> <code class="java plain">String parseToPlainText() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number48 index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">BodyContentHandler handler = </code><code class="java keyword">new</code> <code class="java plain">BodyContentHandler();</code></div><div class="line number49 index2 alt2">&nbsp;</div><div class="line number50 index3 alt1"><code class="java space
 s">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">AutoDetectParser parser = </code><code class="java keyword">new</code> <code class="java plain">AutoDetectParser();</code></div><div class="line number51 index4 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata metadata = </code><code class="java keyword">new</code> <code class="java plain">Metadata();</code></div><div class="line number52 index5 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">(InputStream stream = ContentHandlerExample.</code><code class="java keyword">class</code><code class="java plain">.getResourceAsStream(</code><code class="java string">"test.doc"</code><code class="java plain">)) {</code></div><div class="line number53 index6 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">parser.parse(stream, handler, metadata);</c
 ode></div><div class="line number54 index7 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">handler.toString();</code></div><div class="line number55 index8 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number56 index9 alt1"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
 <div class="section">
 <h4><a name="Parsing_to_XHTML">Parsing to XHTML</a></h4>
-<p>By using the <a href="./api/org/apache/tika/sax/ToXMLContentHandler.html">ToXMLContentHandler</a>, you can get the XHTML content of the whole document as a string.</p><div id="highlighter_736820" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number61 index0 alt2"><code class="java keyword">public</code> <code class="java plain">String parseToHTML() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number62 index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandler handler = </code><code class="java keyword">new</code> <code class="java plain">ToXMLContentHandler();</code></div><div class="line number63 index2 alt2">&nbsp;</div><div class="line number64 index3 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><cod
 e class="java plain">AutoDetectParser parser = </code><code class="java keyword">new</code> <code class="java plain">AutoDetectParser();</code></div><div class="line number65 index4 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata metadata = </code><code class="java keyword">new</code> <code class="java plain">Metadata();</code></div><div class="line number66 index5 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">(InputStream stream = ContentHandlerExample.</code><code class="java keyword">class</code><code class="java plain">.getResourceAsStream(</code><code class="java string">"test.doc"</code><code class="java plain">)) {</code></div><div class="line number67 index6 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">parser.parse(stream, handler, metadata);</code></div><div class="line number68 in
 dex7 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">handler.toString();</code></div><div class="line number69 index8 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number70 index9 alt1"><code class="java plain">}</code></div></div></td></tr></tbody></table></div>
-<p>If you just want the body of the xhtml document, without the header, you can chain together a <a href="./api/org/apache/tika/sax/BodyContentHandler.html">BodyContentHandler</a> and a <a href="./api/org/apache/tika/sax/ToXMLContentHandler.html">ToXMLContentHandler</a> as shown:</p><div id="highlighter_690267" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number76 index0 alt1"><code class="java keyword">public</code> <code class="java plain">String parseBodyToHTML() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number77 index1 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandler handler = </code><code class="java keyword">new</code> <code class="java plain">BodyContentHandler(</code></div><div class="line number78 index2 alt
 1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">new</code> <code class="java plain">ToXMLContentHandler());</code></div><div class="line number79 index3 alt2">&nbsp;</div><div class="line number80 index4 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">AutoDetectParser parser = </code><code class="java keyword">new</code> <code class="java plain">AutoDetectParser();</code></div><div class="line number81 index5 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata metadata = </code><code class="java keyword">new</code> <code class="java plain">Metadata();</code></div><div class="line number82 index6 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">(InputStream stream = ContentHandlerExample.</code><code class="java keyword">class</code><code 
 class="java plain">.getResourceAsStream(</code><code class="java string">"test.doc"</code><code class="java plain">)) {</code></div><div class="line number83 index7 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">parser.parse(stream, handler, metadata);</code></div><div class="line number84 index8 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">handler.toString();</code></div><div class="line number85 index9 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number86 index10 alt1"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
+<p>By using the <a href="./api/org/apache/tika/sax/ToXMLContentHandler.html">ToXMLContentHandler</a>, you can get the XHTML content of the whole document as a string.</p><div id="highlighter_899572" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number61 index0 alt2"><code class="java keyword">public</code> <code class="java plain">String parseToHTML() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number62 index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandler handler = </code><code class="java keyword">new</code> <code class="java plain">ToXMLContentHandler();</code></div><div class="line number63 index2 alt2">&nbsp;</div><div class="line number64 index3 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><cod
 e class="java plain">AutoDetectParser parser = </code><code class="java keyword">new</code> <code class="java plain">AutoDetectParser();</code></div><div class="line number65 index4 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata metadata = </code><code class="java keyword">new</code> <code class="java plain">Metadata();</code></div><div class="line number66 index5 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">(InputStream stream = ContentHandlerExample.</code><code class="java keyword">class</code><code class="java plain">.getResourceAsStream(</code><code class="java string">"test.doc"</code><code class="java plain">)) {</code></div><div class="line number67 index6 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">parser.parse(stream, handler, metadata);</code></div><div class="line number68 in
 dex7 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">handler.toString();</code></div><div class="line number69 index8 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number70 index9 alt1"><code class="java plain">}</code></div></div></td></tr></tbody></table></div>
+<p>If you just want the body of the xhtml document, without the header, you can chain together a <a href="./api/org/apache/tika/sax/BodyContentHandler.html">BodyContentHandler</a> and a <a href="./api/org/apache/tika/sax/ToXMLContentHandler.html">ToXMLContentHandler</a> as shown:</p><div id="highlighter_780612" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number76 index0 alt1"><code class="java keyword">public</code> <code class="java plain">String parseBodyToHTML() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number77 index1 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandler handler = </code><code class="java keyword">new</code> <code class="java plain">BodyContentHandler(</code></div><div class="line number78 index2 alt
 1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">new</code> <code class="java plain">ToXMLContentHandler());</code></div><div class="line number79 index3 alt2">&nbsp;</div><div class="line number80 index4 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">AutoDetectParser parser = </code><code class="java keyword">new</code> <code class="java plain">AutoDetectParser();</code></div><div class="line number81 index5 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata metadata = </code><code class="java keyword">new</code> <code class="java plain">Metadata();</code></div><div class="line number82 index6 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">(InputStream stream = ContentHandlerExample.</code><code class="java keyword">class</code><code 
 class="java plain">.getResourceAsStream(</code><code class="java string">"test.doc"</code><code class="java plain">)) {</code></div><div class="line number83 index7 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">parser.parse(stream, handler, metadata);</code></div><div class="line number84 index8 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">handler.toString();</code></div><div class="line number85 index9 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number86 index10 alt1"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
 <div class="section">
 <h4><a name="Fetching_just_certain_bits_of_the_XHTML">Fetching just certain bits of the XHTML</a></h4>
-<p>It possible to execute XPath queries on the parse results, to fetch only certain bits of the XHTML. </p><div id="highlighter_871146" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number92 index0 alt1"><code class="java keyword">public</code> <code class="java plain">String parseOnePartToHTML() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number93 index1 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java comments">// Only get things under html -> body -> div (class=header)</code></div><div class="line number94 index2 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">XPathParser xhtmlParser = </code><code class="java keyword">new</code> <code class="java plain">XPathParser(</code><code class="java string">
 "xhtml"</code><code class="java plain">, XHTMLContentHandler.XHTML);</code></div><div class="line number95 index3 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Matcher divContentMatcher = xhtmlParser.parse(</code><code class="java string">"/xhtml:html/xhtml:body/xhtml:div/descendant::node()"</code><code class="java plain">);</code></div><div class="line number96 index4 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandler handler = </code><code class="java keyword">new</code> <code class="java plain">MatchingContentHandler(</code></div><div class="line number97 index5 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">new</code> <code class="java plain">ToXMLContentHandler(), divContentMatcher);</code></div><div class="line number98 index6 alt1">&nbsp;</div><div class="line number99 index7 alt2"><code class=
 "java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">AutoDetectParser parser = </code><code class="java keyword">new</code> <code class="java plain">AutoDetectParser();</code></div><div class="line number100 index8 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata metadata = </code><code class="java keyword">new</code> <code class="java plain">Metadata();</code></div><div class="line number101 index9 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">(InputStream stream = ContentHandlerExample.</code><code class="java keyword">class</code><code class="java plain">.getResourceAsStream(</code><code class="java string">"test2.doc"</code><code class="java plain">)) {</code></div><div class="line number102 index10 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">parser.parse(stream, handle
 r, metadata);</code></div><div class="line number103 index11 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">handler.toString();</code></div><div class="line number104 index12 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number105 index13 alt2"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div></div>
+<p>It possible to execute XPath queries on the parse results, to fetch only certain bits of the XHTML. </p><div id="highlighter_330701" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number92 index0 alt1"><code class="java keyword">public</code> <code class="java plain">String parseOnePartToHTML() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number93 index1 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java comments">// Only get things under html -> body -> div (class=header)</code></div><div class="line number94 index2 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">XPathParser xhtmlParser = </code><code class="java keyword">new</code> <code class="java plain">XPathParser(</code><code class="java string">
 "xhtml"</code><code class="java plain">, XHTMLContentHandler.XHTML);</code></div><div class="line number95 index3 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Matcher divContentMatcher = xhtmlParser.parse(</code><code class="java string">"/xhtml:html/xhtml:body/xhtml:div/descendant::node()"</code><code class="java plain">);</code></div><div class="line number96 index4 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandler handler = </code><code class="java keyword">new</code> <code class="java plain">MatchingContentHandler(</code></div><div class="line number97 index5 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">new</code> <code class="java plain">ToXMLContentHandler(), divContentMatcher);</code></div><div class="line number98 index6 alt1">&nbsp;</div><div class="line number99 index7 alt2"><code class=
 "java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">AutoDetectParser parser = </code><code class="java keyword">new</code> <code class="java plain">AutoDetectParser();</code></div><div class="line number100 index8 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata metadata = </code><code class="java keyword">new</code> <code class="java plain">Metadata();</code></div><div class="line number101 index9 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">(InputStream stream = ContentHandlerExample.</code><code class="java keyword">class</code><code class="java plain">.getResourceAsStream(</code><code class="java string">"test2.doc"</code><code class="java plain">)) {</code></div><div class="line number102 index10 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">parser.parse(stream, handle
 r, metadata);</code></div><div class="line number103 index11 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">handler.toString();</code></div><div class="line number104 index12 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number105 index13 alt2"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div></div>
 <div class="section">
 <h3><a name="Custom_Content_Handlers">Custom Content Handlers</a></h3>
 <p>The textual output of parsing a file with Tika is returned via the SAX <a class="externalLink" href="http://docs.oracle.com/javase/7/docs/api/org/xml/sax/ContentHandler.html">ContentHandler</a> you pass to the parse method. It is possible to customise your parsing by supplying your own ContentHandler which does special things.</p>
@@ -141,16 +141,16 @@
 <p>By using the <a href="./api/org/apache/tika/sax/PhoneExtractingContentHandler.html">PhoneExtractingContentHandler</a>, you can have any phone numbers found in the textual content of the document extracted and placed into the Metadata object for you.</p></div>
 <div class="section">
 <h4><a name="Streaming_the_plain_text_in_chunks">Streaming the plain text in chunks</a></h4>
-<p>Sometimes, you want to chunk the resulting text up, perhaps to output as you go minimising memory use, perhaps to output to HDFS files, or any other reason! With a small custom content handler, you can do that.</p><div id="highlighter_946282" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number113 index0 alt2"><code class="java keyword">public</code> <code class="java plain">List&lt;String> parseToPlainTextChunks() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number114 index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">final</code> <code class="java plain">List&lt;String> chunks = </code><code class="java keyword">new</code> <code class="java plain">ArrayList&lt;>();</code></div><div class="line number115 index2 alt2"><code c
 lass="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">chunks.add(</code><code class="java string">""</code><code class="java plain">);</code></div><div class="line number116 index3 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandlerDecorator handler = </code><code class="java keyword">new</code> <code class="java plain">ContentHandlerDecorator() {</code></div><div class="line number117 index4 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java color1">@Override</code></div><div class="line number118 index5 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">public</code> <code class="java keyword">void</code> <code class="java plain">characters(</code><code class="java keyword">char</code><code class="java plain">[] ch, </code><code class="java keyword">int</code> <code class="java plain">start, </c
 ode><code class="java keyword">int</code> <code class="java plain">length) {</code></div><div class="line number119 index6 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">String lastChunk = chunks.get(chunks.size() - </code><code class="java value">1</code><code class="java plain">);</code></div><div class="line number120 index7 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">String thisStr = </code><code class="java keyword">new</code> <code class="java plain">String(ch, start, length);</code></div><div class="line number121 index8 alt2">&nbsp;</div><div class="line number122 index9 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">if</code> <code class="java plain">(lastChunk.length() + length > MAXIMUM_TEXT_CHUNK_SIZE) {
 </code></div><div class="line number123 index10 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">chunks.add(thisStr);</code></div><div class="line number124 index11 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">} </code><code class="java keyword">else</code> <code class="java plain">{</code></div><div class="line number125 index12 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">chunks.set(chunks.size() - </code><code class="java value">1</code><code class="java plain">, lastChunk + thisStr);</code></div><div class="line number126 index13 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</c
 ode></div><div class="line number127 index14 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number128 index15 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">};</code></div><div class="line number129 index16 alt2">&nbsp;</div><div class="line number130 index17 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">AutoDetectParser parser = </code><code class="java keyword">new</code> <code class="java plain">AutoDetectParser();</code></div><div class="line number131 index18 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata metadata = </code><code class="java keyword">new</code> <code class="java plain">Metadata();</code></div><div class="line number132 index19 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class
 ="java plain">(InputStream stream = ContentHandlerExample.</code><code class="java keyword">class</code><code class="java plain">.getResourceAsStream(</code><code class="java string">"test2.doc"</code><code class="java plain">)) {</code></div><div class="line number133 index20 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">parser.parse(stream, handler, metadata);</code></div><div class="line number134 index21 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">chunks;</code></div><div class="line number135 index22 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number136 index23 alt1"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div></div>
+<p>Sometimes, you want to chunk the resulting text up, perhaps to output as you go minimising memory use, perhaps to output to HDFS files, or any other reason! With a small custom content handler, you can do that.</p><div id="highlighter_172670" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number113 index0 alt2"><code class="java keyword">public</code> <code class="java plain">List&lt;String> parseToPlainTextChunks() </code><code class="java keyword">throws</code> <code class="java plain">IOException, SAXException, TikaException {</code></div><div class="line number114 index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">final</code> <code class="java plain">List&lt;String> chunks = </code><code class="java keyword">new</code> <code class="java plain">ArrayList&lt;>();</code></div><div class="line number115 index2 alt2"><code c
 lass="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">chunks.add(</code><code class="java string">""</code><code class="java plain">);</code></div><div class="line number116 index3 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">ContentHandlerDecorator handler = </code><code class="java keyword">new</code> <code class="java plain">ContentHandlerDecorator() {</code></div><div class="line number117 index4 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java color1">@Override</code></div><div class="line number118 index5 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">public</code> <code class="java keyword">void</code> <code class="java plain">characters(</code><code class="java keyword">char</code><code class="java plain">[] ch, </code><code class="java keyword">int</code> <code class="java plain">start, </c
 ode><code class="java keyword">int</code> <code class="java plain">length) {</code></div><div class="line number119 index6 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">String lastChunk = chunks.get(chunks.size() - </code><code class="java value">1</code><code class="java plain">);</code></div><div class="line number120 index7 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">String thisStr = </code><code class="java keyword">new</code> <code class="java plain">String(ch, start, length);</code></div><div class="line number121 index8 alt2">&nbsp;</div><div class="line number122 index9 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">if</code> <code class="java plain">(lastChunk.length() + length > MAXIMUM_TEXT_CHUNK_SIZE) {
 </code></div><div class="line number123 index10 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">chunks.add(thisStr);</code></div><div class="line number124 index11 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">} </code><code class="java keyword">else</code> <code class="java plain">{</code></div><div class="line number125 index12 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">chunks.set(chunks.size() - </code><code class="java value">1</code><code class="java plain">, lastChunk + thisStr);</code></div><div class="line number126 index13 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</c
 ode></div><div class="line number127 index14 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number128 index15 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">};</code></div><div class="line number129 index16 alt2">&nbsp;</div><div class="line number130 index17 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">AutoDetectParser parser = </code><code class="java keyword">new</code> <code class="java plain">AutoDetectParser();</code></div><div class="line number131 index18 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">Metadata metadata = </code><code class="java keyword">new</code> <code class="java plain">Metadata();</code></div><div class="line number132 index19 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class
 ="java plain">(InputStream stream = ContentHandlerExample.</code><code class="java keyword">class</code><code class="java plain">.getResourceAsStream(</code><code class="java string">"test2.doc"</code><code class="java plain">)) {</code></div><div class="line number133 index20 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">parser.parse(stream, handler, metadata);</code></div><div class="line number134 index21 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">chunks;</code></div><div class="line number135 index22 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number136 index23 alt1"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div></div>
 <div class="section">
 <h3><a name="Translation">Translation</a></h3>
 <p>Tika provides a pluggable Translation system, which allows you to send the results of parsing off to an external system or program to have the text translated into another language.</p>
 <div class="section">
 <h4><a name="Translation_using_the_Microsoft_Translation_API">Translation using the Microsoft Translation API</a></h4>
-<p>In order to use the Microsoft Translation API, you need to sign up for a Microsoft account, get an API key, then pass the key to Tika before translating.</p><div id="highlighter_149004" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number23 index0 alt2"><code class="java keyword">public</code> <code class="java plain">String microsoftTranslateToFrench(String text) {</code></div><div class="line number24 index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">MicrosoftTranslator translator = </code><code class="java keyword">new</code> <code class="java plain">MicrosoftTranslator();</code></div><div class="line number25 index2 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java comments">// Change the id and secret! See <a href="http://msdn.microsoft.com/en-us/library/hh454950.aspx.">http://msdn.microso
 ft.com/en-us/library/hh454950.aspx.</a></code></div><div class="line number26 index3 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">translator.setId(</code><code class="java string">"dummy-id"</code><code class="java plain">);</code></div><div class="line number27 index4 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">translator.setSecret(</code><code class="java string">"dummy-secret"</code><code class="java plain">);</code></div><div class="line number28 index5 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">{</code></div><div class="line number29 index6 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">translator.translate(text, </code><code class="java string">"fr"</code><code class="java plain">);</code></div><div class=
 "line number30 index7 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">} </code><code class="java keyword">catch</code> <code class="java plain">(Exception e) {</code></div><div class="line number31 index8 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java string">"Error while translating."</code><code class="java plain">;</code></div><div class="line number32 index9 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number33 index10 alt2"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div></div>
+<p>In order to use the Microsoft Translation API, you need to sign up for a Microsoft account, get an API key, then pass the key to Tika before translating.</p><div id="highlighter_666145" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number23 index0 alt2"><code class="java keyword">public</code> <code class="java plain">String microsoftTranslateToFrench(String text) {</code></div><div class="line number24 index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">MicrosoftTranslator translator = </code><code class="java keyword">new</code> <code class="java plain">MicrosoftTranslator();</code></div><div class="line number25 index2 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java comments">// Change the id and secret! See <a href="http://msdn.microsoft.com/en-us/library/hh454950.aspx.">http://msdn.microso
 ft.com/en-us/library/hh454950.aspx.</a></code></div><div class="line number26 index3 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">translator.setId(</code><code class="java string">"dummy-id"</code><code class="java plain">);</code></div><div class="line number27 index4 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">translator.setSecret(</code><code class="java string">"dummy-secret"</code><code class="java plain">);</code></div><div class="line number28 index5 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">try</code> <code class="java plain">{</code></div><div class="line number29 index6 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">translator.translate(text, </code><code class="java string">"fr"</code><code class="java plain">);</code></div><div class=
 "line number30 index7 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">} </code><code class="java keyword">catch</code> <code class="java plain">(Exception e) {</code></div><div class="line number31 index8 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java string">"Error while translating."</code><code class="java plain">;</code></div><div class="line number32 index9 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">}</code></div><div class="line number33 index10 alt2"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div></div>
 <div class="section">
 <h3><a name="Language_Identification">Language Identification</a></h3>
-<p>Tika provides support for identifying the language of text, through the <a href="./api/org/apache/tika/language/LanguageIdentifier.html">LanguageIdentifier</a> class.</p><div id="highlighter_628638" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number23 index0 alt2"><code class="java keyword">public</code> <code class="java plain">String identifyLanguage(String text) {</code></div><div class="line number24 index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">LanguageIdentifier identifier = </code><code class="java keyword">new</code> <code class="java plain">LanguageIdentifier(text);</code></div><div class="line number25 index2 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">identifier.getLanguage();</code></div><div class="line number26 index3 alt
 1"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
+<p>Tika provides support for identifying the language of text, through the <a href="./api/org/apache/tika/language/LanguageIdentifier.html">LanguageIdentifier</a> class.</p><div id="highlighter_757815" class="syntaxhighlighter nogutter  java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div class="container"><div class="line number23 index0 alt2"><code class="java keyword">public</code> <code class="java plain">String identifyLanguage(String text) {</code></div><div class="line number24 index1 alt1"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java plain">LanguageIdentifier identifier = </code><code class="java keyword">new</code> <code class="java plain">LanguageIdentifier(text);</code></div><div class="line number25 index2 alt2"><code class="java spaces">&nbsp;&nbsp;&nbsp;&nbsp;</code><code class="java keyword">return</code> <code class="java plain">identifier.getLanguage();</code></div><div class="line number26 index3 alt
 1"><code class="java plain">}</code></div></div></td></tr></tbody></table></div></div>
 <div class="section">
 <h3><a name="Additional_Examples">Additional Examples</a></h3>
 <p>A number of other examples are also available, including all of the examples from the <a class="externalLink" href="http://manning.com/mattmann/">Tika In Action book</a>. These can all be found in the <a class="externalLink" href="https://svn.apache.org/repos/asf/tika/trunk/tika-example">Tika Example module</a> in SVN.</p></div></div>
@@ -177,7 +177,7 @@
           </li>
               
     <li class="none">
-                    <a href="http://wiki.apache.org/tika/" class="externalLink">Tika Wiki</a>
+                    <a href="https://cwiki.apache.org/confluence/display/tika" class="externalLink">Tika Wiki</a>
           </li>
               
     <li class="none">
@@ -413,23 +413,23 @@
             <ul>
               
     <li class="none">
-                    <a href="http://www.apache.org/foundation/" class="externalLink">About</a>
+                    <a href="https://www.apache.org/foundation/" class="externalLink">About</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/licenses/" class="externalLink">License</a>
+                    <a href="https://www.apache.org/licenses/" class="externalLink">License</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/security/" class="externalLink">Security</a>
+                    <a href="https://www.apache.org/security/" class="externalLink">Security</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/foundation/sponsorship.html" class="externalLink">Sponsorship</a>
+                    <a href="https://www.apache.org/foundation/sponsorship.html" class="externalLink">Sponsorship</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/foundation/thanks.html" class="externalLink">Thanks</a>
+                    <a href="https://www.apache.org/foundation/thanks.html" class="externalLink">Thanks</a>
           </li>
           </ul>
       
@@ -460,9 +460,9 @@
       </div>
       <div id="footer">
         <p>
-          Copyright &#169; 2018
-          <a href="http://www.apache.org/">The Apache Software Foundation</a>.
-          Site powered by <a href="http://maven.apache.org/">Apache Maven</a>. 
+          Copyright &#169; 2019
+          <a href="https://www.apache.org/">The Apache Software Foundation</a>.
+          Site powered by <a href="https://maven.apache.org/">Apache Maven</a>. 
           Search powered by
           <a href="http://www.lucidimagination.com">Lucid Imagination</a>
           and <a href="http://sematext.com">Sematext</a>.

Modified: tika/site/publish/1.9/formats.html
URL: http://svn.apache.org/viewvc/tika/site/publish/1.9/formats.html?rev=1857884&r1=1857883&r2=1857884&view=diff
==============================================================================
--- tika/site/publish/1.9/formats.html (original)
+++ tika/site/publish/1.9/formats.html Sat Apr 20 21:04:17 2019
@@ -10,7 +10,7 @@
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at
 
-    http://www.apache.org/licenses/LICENSE-2.0
+    https://www.apache.org/licenses/LICENSE-2.0
  
   Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
@@ -75,12 +75,12 @@
   <body onLoad="initProvider();">
     <div id="body">
       <div id="banner">
-        <a href="http://tika.apache.org" id="bannerLeft" title="Apache Tika"
-          ><img src="http://tika.apache.org/tika.png" alt="Apache Tika"
+        <a href="https://tika.apache.org" id="bannerLeft" title="Apache Tika"
+          ><img src="https://tika.apache.org/tika.png" alt="Apache Tika"
                 width="292" height="100"/></a>
-        <a href="http://www.apache.org/" id="bannerRight"
+        <a href="https://www.apache.org/" id="bannerRight"
            title="The Apache Software Foundation"
-          ><img src="http://tika.apache.org/asf-logo.gif" alt="The Apache Software Foundation"
+          ><img src="https://tika.apache.org/asf-logo.gif" alt="The Apache Software Foundation"
                 width="387" height="100"/></a>
       </div>
       <div id="content">
@@ -638,7 +638,7 @@
           </li>
               
     <li class="none">
-                    <a href="http://wiki.apache.org/tika/" class="externalLink">Tika Wiki</a>
+                    <a href="https://cwiki.apache.org/confluence/display/tika" class="externalLink">Tika Wiki</a>
           </li>
               
     <li class="none">
@@ -874,23 +874,23 @@
             <ul>
               
     <li class="none">
-                    <a href="http://www.apache.org/foundation/" class="externalLink">About</a>
+                    <a href="https://www.apache.org/foundation/" class="externalLink">About</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/licenses/" class="externalLink">License</a>
+                    <a href="https://www.apache.org/licenses/" class="externalLink">License</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/security/" class="externalLink">Security</a>
+                    <a href="https://www.apache.org/security/" class="externalLink">Security</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/foundation/sponsorship.html" class="externalLink">Sponsorship</a>
+                    <a href="https://www.apache.org/foundation/sponsorship.html" class="externalLink">Sponsorship</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/foundation/thanks.html" class="externalLink">Thanks</a>
+                    <a href="https://www.apache.org/foundation/thanks.html" class="externalLink">Thanks</a>
           </li>
           </ul>
       
@@ -921,9 +921,9 @@
       </div>
       <div id="footer">
         <p>
-          Copyright &#169; 2018
-          <a href="http://www.apache.org/">The Apache Software Foundation</a>.
-          Site powered by <a href="http://maven.apache.org/">Apache Maven</a>. 
+          Copyright &#169; 2019
+          <a href="https://www.apache.org/">The Apache Software Foundation</a>.
+          Site powered by <a href="https://maven.apache.org/">Apache Maven</a>. 
           Search powered by
           <a href="http://www.lucidimagination.com">Lucid Imagination</a>
           and <a href="http://sematext.com">Sematext</a>.

Modified: tika/site/publish/1.9/gettingstarted.html
URL: http://svn.apache.org/viewvc/tika/site/publish/1.9/gettingstarted.html?rev=1857884&r1=1857883&r2=1857884&view=diff
==============================================================================
--- tika/site/publish/1.9/gettingstarted.html (original)
+++ tika/site/publish/1.9/gettingstarted.html Sat Apr 20 21:04:17 2019
@@ -10,7 +10,7 @@
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at
 
-    http://www.apache.org/licenses/LICENSE-2.0
+    https://www.apache.org/licenses/LICENSE-2.0
  
   Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
@@ -75,12 +75,12 @@
   <body onLoad="initProvider();">
     <div id="body">
       <div id="banner">
-        <a href="http://tika.apache.org" id="bannerLeft" title="Apache Tika"
-          ><img src="http://tika.apache.org/tika.png" alt="Apache Tika"
+        <a href="https://tika.apache.org" id="bannerLeft" title="Apache Tika"
+          ><img src="https://tika.apache.org/tika.png" alt="Apache Tika"
                 width="292" height="100"/></a>
-        <a href="http://www.apache.org/" id="bannerRight"
+        <a href="https://www.apache.org/" id="bannerRight"
            title="The Apache Software Foundation"
-          ><img src="http://tika.apache.org/asf-logo.gif" alt="The Apache Software Foundation"
+          ><img src="https://tika.apache.org/asf-logo.gif" alt="The Apache Software Foundation"
                 width="387" height="100"/></a>
       </div>
       <div id="content">
@@ -243,7 +243,7 @@ curl http://.../document.doc \
           </li>
               
     <li class="none">
-                    <a href="http://wiki.apache.org/tika/" class="externalLink">Tika Wiki</a>
+                    <a href="https://cwiki.apache.org/confluence/display/tika" class="externalLink">Tika Wiki</a>
           </li>
               
     <li class="none">
@@ -479,23 +479,23 @@ curl http://.../document.doc \
             <ul>
               
     <li class="none">
-                    <a href="http://www.apache.org/foundation/" class="externalLink">About</a>
+                    <a href="https://www.apache.org/foundation/" class="externalLink">About</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/licenses/" class="externalLink">License</a>
+                    <a href="https://www.apache.org/licenses/" class="externalLink">License</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/security/" class="externalLink">Security</a>
+                    <a href="https://www.apache.org/security/" class="externalLink">Security</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/foundation/sponsorship.html" class="externalLink">Sponsorship</a>
+                    <a href="https://www.apache.org/foundation/sponsorship.html" class="externalLink">Sponsorship</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/foundation/thanks.html" class="externalLink">Thanks</a>
+                    <a href="https://www.apache.org/foundation/thanks.html" class="externalLink">Thanks</a>
           </li>
           </ul>
       
@@ -526,9 +526,9 @@ curl http://.../document.doc \
       </div>
       <div id="footer">
         <p>
-          Copyright &#169; 2018
-          <a href="http://www.apache.org/">The Apache Software Foundation</a>.
-          Site powered by <a href="http://maven.apache.org/">Apache Maven</a>. 
+          Copyright &#169; 2019
+          <a href="https://www.apache.org/">The Apache Software Foundation</a>.
+          Site powered by <a href="https://maven.apache.org/">Apache Maven</a>. 
           Search powered by
           <a href="http://www.lucidimagination.com">Lucid Imagination</a>
           and <a href="http://sematext.com">Sematext</a>.

Modified: tika/site/publish/1.9/index.html
URL: http://svn.apache.org/viewvc/tika/site/publish/1.9/index.html?rev=1857884&r1=1857883&r2=1857884&view=diff
==============================================================================
--- tika/site/publish/1.9/index.html (original)
+++ tika/site/publish/1.9/index.html Sat Apr 20 21:04:17 2019
@@ -10,7 +10,7 @@
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at
 
-    http://www.apache.org/licenses/LICENSE-2.0
+    https://www.apache.org/licenses/LICENSE-2.0
  
   Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
@@ -75,12 +75,12 @@
   <body onLoad="initProvider();">
     <div id="body">
       <div id="banner">
-        <a href="http://tika.apache.org" id="bannerLeft" title="Apache Tika"
-          ><img src="http://tika.apache.org/tika.png" alt="Apache Tika"
+        <a href="https://tika.apache.org" id="bannerLeft" title="Apache Tika"
+          ><img src="https://tika.apache.org/tika.png" alt="Apache Tika"
                 width="292" height="100"/></a>
-        <a href="http://www.apache.org/" id="bannerRight"
+        <a href="https://www.apache.org/" id="bannerRight"
            title="The Apache Software Foundation"
-          ><img src="http://tika.apache.org/asf-logo.gif" alt="The Apache Software Foundation"
+          ><img src="https://tika.apache.org/asf-logo.gif" alt="The Apache Software Foundation"
                 width="387" height="100"/></a>
       </div>
       <div id="content">
@@ -153,7 +153,7 @@
           </li>
               
     <li class="none">
-                    <a href="http://wiki.apache.org/tika/" class="externalLink">Tika Wiki</a>
+                    <a href="https://cwiki.apache.org/confluence/display/tika" class="externalLink">Tika Wiki</a>
           </li>
               
     <li class="none">
@@ -389,23 +389,23 @@
             <ul>
               
     <li class="none">
-                    <a href="http://www.apache.org/foundation/" class="externalLink">About</a>
+                    <a href="https://www.apache.org/foundation/" class="externalLink">About</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/licenses/" class="externalLink">License</a>
+                    <a href="https://www.apache.org/licenses/" class="externalLink">License</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/security/" class="externalLink">Security</a>
+                    <a href="https://www.apache.org/security/" class="externalLink">Security</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/foundation/sponsorship.html" class="externalLink">Sponsorship</a>
+                    <a href="https://www.apache.org/foundation/sponsorship.html" class="externalLink">Sponsorship</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/foundation/thanks.html" class="externalLink">Thanks</a>
+                    <a href="https://www.apache.org/foundation/thanks.html" class="externalLink">Thanks</a>
           </li>
           </ul>
       
@@ -436,9 +436,9 @@
       </div>
       <div id="footer">
         <p>
-          Copyright &#169; 2018
-          <a href="http://www.apache.org/">The Apache Software Foundation</a>.
-          Site powered by <a href="http://maven.apache.org/">Apache Maven</a>. 
+          Copyright &#169; 2019
+          <a href="https://www.apache.org/">The Apache Software Foundation</a>.
+          Site powered by <a href="https://maven.apache.org/">Apache Maven</a>. 
           Search powered by
           <a href="http://www.lucidimagination.com">Lucid Imagination</a>
           and <a href="http://sematext.com">Sematext</a>.

Modified: tika/site/publish/1.9/parser.html
URL: http://svn.apache.org/viewvc/tika/site/publish/1.9/parser.html?rev=1857884&r1=1857883&r2=1857884&view=diff
==============================================================================
--- tika/site/publish/1.9/parser.html (original)
+++ tika/site/publish/1.9/parser.html Sat Apr 20 21:04:17 2019
@@ -10,7 +10,7 @@
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at
 
-    http://www.apache.org/licenses/LICENSE-2.0
+    https://www.apache.org/licenses/LICENSE-2.0
  
   Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
@@ -75,12 +75,12 @@
   <body onLoad="initProvider();">
     <div id="body">
       <div id="banner">
-        <a href="http://tika.apache.org" id="bannerLeft" title="Apache Tika"
-          ><img src="http://tika.apache.org/tika.png" alt="Apache Tika"
+        <a href="https://tika.apache.org" id="bannerLeft" title="Apache Tika"
+          ><img src="https://tika.apache.org/tika.png" alt="Apache Tika"
                 width="292" height="100"/></a>
-        <a href="http://www.apache.org/" id="bannerRight"
+        <a href="https://www.apache.org/" id="bannerRight"
            title="The Apache Software Foundation"
-          ><img src="http://tika.apache.org/asf-logo.gif" alt="The Apache Software Foundation"
+          ><img src="https://tika.apache.org/asf-logo.gif" alt="The Apache Software Foundation"
                 width="387" height="100"/></a>
       </div>
       <div id="content">
@@ -200,7 +200,7 @@ try {
           </li>
               
     <li class="none">
-                    <a href="http://wiki.apache.org/tika/" class="externalLink">Tika Wiki</a>
+                    <a href="https://cwiki.apache.org/confluence/display/tika" class="externalLink">Tika Wiki</a>
           </li>
               
     <li class="none">
@@ -436,23 +436,23 @@ try {
             <ul>
               
     <li class="none">
-                    <a href="http://www.apache.org/foundation/" class="externalLink">About</a>
+                    <a href="https://www.apache.org/foundation/" class="externalLink">About</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/licenses/" class="externalLink">License</a>
+                    <a href="https://www.apache.org/licenses/" class="externalLink">License</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/security/" class="externalLink">Security</a>
+                    <a href="https://www.apache.org/security/" class="externalLink">Security</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/foundation/sponsorship.html" class="externalLink">Sponsorship</a>
+                    <a href="https://www.apache.org/foundation/sponsorship.html" class="externalLink">Sponsorship</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/foundation/thanks.html" class="externalLink">Thanks</a>
+                    <a href="https://www.apache.org/foundation/thanks.html" class="externalLink">Thanks</a>
           </li>
           </ul>
       
@@ -483,9 +483,9 @@ try {
       </div>
       <div id="footer">
         <p>
-          Copyright &#169; 2018
-          <a href="http://www.apache.org/">The Apache Software Foundation</a>.
-          Site powered by <a href="http://maven.apache.org/">Apache Maven</a>. 
+          Copyright &#169; 2019
+          <a href="https://www.apache.org/">The Apache Software Foundation</a>.
+          Site powered by <a href="https://maven.apache.org/">Apache Maven</a>. 
           Search powered by
           <a href="http://www.lucidimagination.com">Lucid Imagination</a>
           and <a href="http://sematext.com">Sematext</a>.

Modified: tika/site/publish/1.9/parser_guide.html
URL: http://svn.apache.org/viewvc/tika/site/publish/1.9/parser_guide.html?rev=1857884&r1=1857883&r2=1857884&view=diff
==============================================================================
--- tika/site/publish/1.9/parser_guide.html (original)
+++ tika/site/publish/1.9/parser_guide.html Sat Apr 20 21:04:17 2019
@@ -10,7 +10,7 @@
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at
 
-    http://www.apache.org/licenses/LICENSE-2.0
+    https://www.apache.org/licenses/LICENSE-2.0
  
   Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
@@ -75,12 +75,12 @@
   <body onLoad="initProvider();">
     <div id="body">
       <div id="banner">
-        <a href="http://tika.apache.org" id="bannerLeft" title="Apache Tika"
-          ><img src="http://tika.apache.org/tika.png" alt="Apache Tika"
+        <a href="https://tika.apache.org" id="bannerLeft" title="Apache Tika"
+          ><img src="https://tika.apache.org/tika.png" alt="Apache Tika"
                 width="292" height="100"/></a>
-        <a href="http://www.apache.org/" id="bannerRight"
+        <a href="https://www.apache.org/" id="bannerRight"
            title="The Apache Software Foundation"
-          ><img src="http://tika.apache.org/asf-logo.gif" alt="The Apache Software Foundation"
+          ><img src="https://tika.apache.org/asf-logo.gif" alt="The Apache Software Foundation"
                 width="387" height="100"/></a>
       </div>
       <div id="content">
@@ -198,7 +198,7 @@ public class HelloParser extends Abstrac
           </li>
               
     <li class="none">
-                    <a href="http://wiki.apache.org/tika/" class="externalLink">Tika Wiki</a>
+                    <a href="https://cwiki.apache.org/confluence/display/tika" class="externalLink">Tika Wiki</a>
           </li>
               
     <li class="none">
@@ -434,23 +434,23 @@ public class HelloParser extends Abstrac
             <ul>
               
     <li class="none">
-                    <a href="http://www.apache.org/foundation/" class="externalLink">About</a>
+                    <a href="https://www.apache.org/foundation/" class="externalLink">About</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/licenses/" class="externalLink">License</a>
+                    <a href="https://www.apache.org/licenses/" class="externalLink">License</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/security/" class="externalLink">Security</a>
+                    <a href="https://www.apache.org/security/" class="externalLink">Security</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/foundation/sponsorship.html" class="externalLink">Sponsorship</a>
+                    <a href="https://www.apache.org/foundation/sponsorship.html" class="externalLink">Sponsorship</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/foundation/thanks.html" class="externalLink">Thanks</a>
+                    <a href="https://www.apache.org/foundation/thanks.html" class="externalLink">Thanks</a>
           </li>
           </ul>
       
@@ -481,9 +481,9 @@ public class HelloParser extends Abstrac
       </div>
       <div id="footer">
         <p>
-          Copyright &#169; 2018
-          <a href="http://www.apache.org/">The Apache Software Foundation</a>.
-          Site powered by <a href="http://maven.apache.org/">Apache Maven</a>. 
+          Copyright &#169; 2019
+          <a href="https://www.apache.org/">The Apache Software Foundation</a>.
+          Site powered by <a href="https://maven.apache.org/">Apache Maven</a>. 
           Search powered by
           <a href="http://www.lucidimagination.com">Lucid Imagination</a>
           and <a href="http://sematext.com">Sematext</a>.

Modified: tika/site/publish/contribute.html
URL: http://svn.apache.org/viewvc/tika/site/publish/contribute.html?rev=1857884&r1=1857883&r2=1857884&view=diff
==============================================================================
--- tika/site/publish/contribute.html (original)
+++ tika/site/publish/contribute.html Sat Apr 20 21:04:17 2019
@@ -10,7 +10,7 @@
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at
 
-    http://www.apache.org/licenses/LICENSE-2.0
+    https://www.apache.org/licenses/LICENSE-2.0
  
   Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
@@ -75,16 +75,16 @@
   <body onLoad="initProvider();">
     <div id="body">
       <div id="banner">
-        <a href="http://tika.apache.org" id="bannerLeft" title="Apache Tika"
-          ><img src="http://tika.apache.org/tika.png" alt="Apache Tika"
+        <a href="https://tika.apache.org" id="bannerLeft" title="Apache Tika"
+          ><img src="https://tika.apache.org/tika.png" alt="Apache Tika"
                 width="292" height="100"/></a>
-        <a href="http://www.apache.org/" id="bannerRight"
+        <a href="https://www.apache.org/" id="bannerRight"
            title="The Apache Software Foundation"
-          ><img src="http://tika.apache.org/asf-logo.gif" alt="The Apache Software Foundation"
+          ><img src="https://tika.apache.org/asf-logo.gif" alt="The Apache Software Foundation"
                 width="387" height="100"/></a>
       </div>
       <div id="content">
-        <!-- Licensed to the Apache Software Foundation (ASF) under one or more --><!-- contributor license agreements.  See the NOTICE file distributed with --><!-- this work for additional information regarding copyright ownership. --><!-- The ASF licenses this file to You under the Apache License, Version 2.0 --><!-- (the "License"); you may not use this file except in compliance with --><!-- the License.  You may obtain a copy of the License at --><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- Unless required by applicable law or agreed to in writing, software --><!-- distributed under the License is distributed on an "AS IS" BASIS, --><!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --><!-- See the License for the specific language governing permissions and --><!-- limitations under the License. --><div class="section">
+        <!-- Licensed to the Apache Software Foundation (ASF) under one or more --><!-- contributor license agreements.  See the NOTICE file distributed with --><!-- this work for additional information regarding copyright ownership. --><!-- The ASF licenses this file to You under the Apache License, Version 2.0 --><!-- (the "License"); you may not use this file except in compliance with --><!-- the License.  You may obtain a copy of the License at --><!--  --><!-- https://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- Unless required by applicable law or agreed to in writing, software --><!-- distributed under the License is distributed on an "AS IS" BASIS, --><!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --><!-- See the License for the specific language governing permissions and --><!-- limitations under the License. --><div class="section">
 <h2><a name="Contribute_to_Apache_Tika"></a>Contribute to Apache Tika</h2>
 <p>Apache Tika is an Open Source project built and maintained by a diverse range of contributors. We welcome contributions of all types to the project - code, documentation, testing, bug triage, user support, and more! Send an email to the <a href="./mail-lists.html">Tika development list</a> if you're looking for somewhere to help.</p></div>
 <div class="section">
@@ -100,17 +100,17 @@
 <div class="section">
 <h2><a name="New_Parsers_Detectors_and_Mime_Types"></a>New Parsers, Detectors and Mime Types</h2>
 <p>The <a href="./1.20/parser_guide.html">Parser Quick Start Guide</a> provides instructions on adding new mime types and new parsers to Tika.</p>
-<p>If your new Parser or Detector depends on libraries which we cannot include in Tika for license reasons, you are encouraged to list it on the <a class="externalLink" href="http://wiki.apache.org/tika/3rd%20party%20parser%20plugins">3rd Party Parser Plugins</a> page on the Tika wiki.</p></div>
+<p>If your new Parser or Detector depends on libraries which we cannot include in Tika for license reasons, you are encouraged to list it on the <a class="externalLink" href="https://cwiki.apache.org/confluence/display/TIKA/3rd+party+parser+plugins">3rd Party Parser Plugins</a> page on the Tika wiki.</p></div>
 <div class="section">
 <h2><a name="Submitting_Enhancements_and_Fixes"></a>Submitting Enhancements and Fixes</h2>
 <p>All enhancements and fixes should have a <a class="externalLink" href="https://issues.apache.org/jira/browse/TIKA">JIRA Issue or Enhancement</a> opened for them. This should describe the problem and the proposed fix / new code. The JIRA can be used for discussions on the code, and provides a single identifier for the change.</p>
 <p>Git - Git users can run <tt>git diff --no-prefix</tt> to generate a patch of changed and new files, including binaries, which can then be attached to an issue.</p>
-<p>Github Pulls - If you are working from our <a class="externalLink" href="https://github.com/apache/tika/">GitHub mirror</a>, it is possible to open a pull request for your change. Please include the JIRA Issue number in the pull request, so it can be linked by the ASF GitHub bot. </p>
+<p>GitHub Pull Requests - If you are working from our <a class="externalLink" href="https://github.com/apache/tika/">GitHub mirror</a>, it is possible to open a pull request for your change. Please include the JIRA Issue number in the pull request, so it can be linked by the ASF GitHub bot. </p>
 <p>ReviewBoard - If you have a Work-In-Progress patch for which you would like feedback / review / assistance, you can use the <a class="externalLink" href="https://reviews.apache.org/dashboard/">Apache ReviewBoard Instance</a> to post your code. Please reference the JIRA Issue number from the review request, and add a link to it to the JIRA Issue.</p>
 <p>Unit tests, License Headers - Wherever possible, we like new functionality and fixes to include small-ish unit tests. Whenever you make changes, please re-run the unit test suite (<tt>mvn install</tt> is one way to trigger this), and ensure your changes don't break anything. If adding new files, please include the Apache License v2 license header at the top of the file.</p></div>
 <div class="section">
 <h2><a name="Dependencies"></a>Dependencies</h2>
-<p>Any new dependencies introduced must be under a suitable license. Broadly, they must be Open Source, and must not place restrictions on larger works they are incorporated within. A list of the allowed licenses is maintained by the <a class="externalLink" href="http://www.apache.org/legal/resolved.html">ASF Legal Affairs Committee</a>. If in doubt, check on the dev list.</p>
+<p>Any new dependencies introduced must be under a suitable license. Broadly, they must be Open Source, and must not place restrictions on larger works they are incorporated within. A list of the allowed licenses is maintained by the <a class="externalLink" href="https://www.apache.org/legal/resolved.html">ASF Legal Affairs Committee</a>. If in doubt, check on the dev list.</p>
 <p>All new and updated dependencies must be in Maven Central. (It is not possible for Apache releases to depend on additional repositories in their poms). If possible, the project producing the dependency should be asked to publish it to Central, such as through the <a class="externalLink" href="https://docs.sonatype.org/display/Repository/Sonatype+OSS+Maven+Repository+Usage+Guide">Sonatype OSS Maven Repo</a>. If that isn't possible, someone will need to upload it via the <a class="externalLink" href="https://docs.sonatype.org/display/Repository/Uploading+3rd-party+Artifacts+to+The+Central+Repository">Sonatype 3rd Party OSS Artifacts process</a>. This will need to be completed before any patches depending on the new library can be committed to Tika.</p></div>
 <div class="section">
 <h2><a name="Code_Formatting"></a>Code Formatting</h2>
@@ -120,8 +120,8 @@
 <div class="section">
 <h2><a name="Other_Resources"></a>Other Resources</h2>
 <ul>
-<li>The <a class="externalLink" href="http://community.apache.org/">Apache Community Development project (ComDev)</a> provide general advice on getting started with contributing to Apache projects</li>
-<li>The Apache Nutch project provide a comprehensive guide on <a class="externalLink" href="http://wiki.apache.org/nutch/Becoming_A_Nutch_Developer">becoming a Nutch Developer</a>, much of which applies equally for Apache Tika too</li>
+<li>The <a class="externalLink" href="https://community.apache.org/">Apache Community Development project (ComDev)</a> provide general advice on getting started with contributing to Apache projects</li>
+<li>The Apache Nutch project provide a comprehensive guide on <a class="externalLink" href="https://wiki.apache.org/nutch/Becoming_A_Nutch_Developer">becoming a Nutch Developer</a>, much of which applies equally for Apache Tika too</li>
 <li>The book <a class="externalLink" href="http://manning.com/mattmann/">Tika in Action</a> has a lot of great information on how Tika works, and how to extend it</li></ul></div>
       </div>
       <div id="sidebar">
@@ -146,7 +146,7 @@
           </li>
               
     <li class="none">
-                    <a href="http://wiki.apache.org/tika/" class="externalLink">Tika Wiki</a>
+                    <a href="https://cwiki.apache.org/confluence/display/tika" class="externalLink">Tika Wiki</a>
           </li>
               
     <li class="none">
@@ -382,23 +382,23 @@
             <ul>
               
     <li class="none">
-                    <a href="http://www.apache.org/foundation/" class="externalLink">About</a>
+                    <a href="https://www.apache.org/foundation/" class="externalLink">About</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/licenses/" class="externalLink">License</a>
+                    <a href="https://www.apache.org/licenses/" class="externalLink">License</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/security/" class="externalLink">Security</a>
+                    <a href="https://www.apache.org/security/" class="externalLink">Security</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/foundation/sponsorship.html" class="externalLink">Sponsorship</a>
+                    <a href="https://www.apache.org/foundation/sponsorship.html" class="externalLink">Sponsorship</a>
           </li>
               
     <li class="none">
-                    <a href="http://www.apache.org/foundation/thanks.html" class="externalLink">Thanks</a>
+                    <a href="https://www.apache.org/foundation/thanks.html" class="externalLink">Thanks</a>
           </li>
           </ul>
       
@@ -429,9 +429,9 @@
       </div>
       <div id="footer">
         <p>
-          Copyright &#169; 2018
-          <a href="http://www.apache.org/">The Apache Software Foundation</a>.
-          Site powered by <a href="http://maven.apache.org/">Apache Maven</a>. 
+          Copyright &#169; 2019
+          <a href="https://www.apache.org/">The Apache Software Foundation</a>.
+          Site powered by <a href="https://maven.apache.org/">Apache Maven</a>. 
           Search powered by
           <a href="http://www.lucidimagination.com">Lucid Imagination</a>
           and <a href="http://sematext.com">Sematext</a>.