Posted to commits@tika.apache.org by ta...@apache.org on 2016/02/19 21:25:06 UTC

[27/52] [partial] tika git commit: move test files to parser-modules

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testMHTMLFirefox.mhtml
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testMHTMLFirefox.mhtml b/tika-parsers/src/test/resources/test-documents/testMHTMLFirefox.mhtml
new file mode 100644
index 0000000..6322791
--- /dev/null
+++ b/tika-parsers/src/test/resources/test-documents/testMHTMLFirefox.mhtml
@@ -0,0 +1,455 @@
+From: <Saved by Mozilla 5.0 (Windows; en-US)>
+Subject: Aperture Framework
+Date: Fri Mar 10 2006 13:40:00 GMT+0100
+MIME-Version: 1.0
+Content-Location: http://aperture.sourceforge.net/
+Content-Type: multipart/related;
+	boundary="----=_NextPart_000_0000_B40804DE.BBCA09DC";
+	type="text/html"
+X-MAF: Produced By MAF MHT Archive Handler V0.4.1
+
+This is a multi-part message in MIME format.
+
+------=_NextPart_000_0000_B40804DE.BBCA09DC
+Content-Type: text/html
+Content-Transfer-Encoding: quoted-printable
+Content-Location: http://aperture.sourceforge.net/
+
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/=
+TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html><head><!-- This document is inspired by the content style at http://ww=
+w.csszengarden.com -->
+
+
+
+<meta http-equiv=3D"content-type" content=3D"text/html; charset=3Diso-8859-1=
+">
+<meta name=3D"author" content=3D"Leo Sauermann, Christiaan Fluit">
+<meta name=3D"keywords" content=3D"aperture, rdf, data"><title>Aperture Fram=
+ework</title>
+
+<script type=3D"text/javascript"></script>
+<link title=3D"Default" rel=3D"stylesheet" type=3D"text/css" href=3D"index_f=
+iles/frontpage.css" media=3D"screen">
+<link title=3D"Default" rel=3D"stylesheet" type=3D"text/css" href=3D"index_f=
+iles/print.css" media=3D"print">
+<link title=3D"Basic" rel=3D"alternate stylesheet" type=3D"text/css" href=3D=
+"index_files/all.css" media=3D"all"></head><body>
+
+<div id=3D"header">
+
+<h1>Aperture</h1>
+<h2>a Java framework for getting data and metadata</h2>
+
+</div>  <!-- header -->
+
+<div id=3D"content">
+
+<div id=3D"preamble">
+
+<p>
+<b>Project name</b>
+</p>
+
+<p>
+From <a class=3D"ext-link" title=3D"http://www.webster.com/" href=3D"http://=
+www.webster.com/">Merriam-Webster Online</a>:
+</p>
+
+<p>
+Main Entry: <strong>ap=B7er=B7ture</strong>
+(sounds like <a class=3D"ext-link" title=3D"http://cougar.eb.com/sound/a/ape=
+rtu01.wav" href=3D"http://cougar.eb.com/sound/a/apertu01.wav">this</a>)<br>
+Pronunciation: 'ap-&amp;(r)-"chur, -ch&amp;r, -"tyur, -"tur<br>
+Function: noun<br>
+Etymology: Middle English, from Latin apertura, from apertus, past
+participle of aperire to open<br>
+</p>
+
+<ol>
+<li>an opening or open space : HOLE</li>
+<li>a : the opening in a photographic lens that admits the light<br>
+b : the diameter of the stop in an optical system that determines the diamet=
+er
+of the bundle of rays traversing the instrument<br>
+c : the diameter of the objective lens or mirror of a telescope</li>
+</ol>
+
+</div> <!-- preamble -->
+
+<h2>News</h2>
+
+<p>
+<b>March 6, 2006:</b> <a href=3D"https://sourceforge.net/project/showfiles.p=
+hp?group_id=3D150969">Aperture
+2006.1 alpha 2</a> released!
+</p>
+
+<p>
+This release adds support for crawling file systems, web sites, IMAP and Out=
+look mail boxes.
+Furthermore, the number of supported file formats has increased significantl=
+y.
+</p>
+
+<h2>Features</h2>
+
+<ul>
+<li>Crawl information systems such as file systems, websites, mail boxes and=
+ mail servers</li>
+<li>Extract full-text and metadata from many common file formats</li>
+<li>View files in their native applications</li>
+<li>Ease of use: easy to learn, easy to code, easy to deploy in industrial p=
+rojects</li>
+<li>Flexible architecture: can be extended with custom file formats, data so=
+urces, etc.,
+    with support for deployment on OSGi platforms</li>
+<li>Data exchange based on Semantic Web standards (e.g. RDF, SPARQL, ...)</l=
+i>
+</ul>
+
+<h2>Supported File Formats</h2>
+
+<ul>
+<li>Plain text</li>
+<li>HTML, XHTML</li>
+<li>XML</li>
+<li>PDF (Portable Document Format)</li>
+<li>RTF (Rich Text Format)</li>
+<li>Microsoft Office: Word, Excel, Powerpoint, Visio, Publisher</li>
+<li>Microsoft Works</li>
+<li>OpenOffice 1.x: Writer, Calc, Impress, Draw</li>
+<li>StarOffice 6.x - 7.x+: Writer, Calc, Impress, Draw</li>
+<li>OpenDocument (OpenOffice 2.x, StarOffice 8.x)</li>
+<li>Corel WordPerfect, Quattro, Presentations</li>
+<li>Emails (.eml files)</li>
+</ul>
+
+<h2>Crawlers</h2>
+
+<p>
+Crawlers support the extraction of information from heterogenous data source=
+s.
+At the moment we support the following source types:</p>
+
+<ul>
+<li>File Systems (local, remote, removeable media)</li>
+<li>Websites and intranets</li>
+<li>IMAP e-mail servers</li>
+<li>Microsoft Outlook (alpha)</li>
+</ul>
+
+<h2><a name=3D"support"></a>Support</h2>
+
+<p>
+At this moment the project is still in alpha stage and we provide only limit=
+ed support.
+If you have any questions about the project, feel free to join the
+<a href=3D"https://sourceforge.net/mail/?group_id=3D150969">development mail=
+inglist</a> and ask us.
+</p>
+
+<h2><a name=3D"development"></a>Development</h2>
+
+<p>
+To use Aperture in your own projects, read the <a href=3D"http://aperture.so=
+urceforge.net/documentation.html">documentation</a>
+for information about requirements and code examples.
+</p>
+
+<p>
+If you are interested in contributing, feel free to contact the project admi=
+ns or join the
+<a href=3D"https://sourceforge.net/mail/?group_id=3D150969">development mail=
+inglist</a>.
+We are very interested in new extractors and other contributions including c=
+rawlers.
+</p>
+
+</div>  <!-- content -->
+
+<div id=3D"sideBar">
+
+<p>
+Aperture is a Java framework for extracting and querying full-text
+content and metadata from various information systems (e.g. file systems,
+web sites, mail boxes) and the file formats (e.g. documents, images)
+occurring in these systems.
+</p>
+
+<h2>Contents</h2>
+
+<ul>
+<li><a href=3D"http://aperture.sourceforge.net/index.html">Home</a></li>
+<li><a href=3D"https://sourceforge.net/project/showfiles.php?group_id=3D1509=
+69">Download</a></li>
+<li><a href=3D"http://aperture.sourceforge.net/doc/javadoc/index.html">Javad=
+oc</a></li>
+<li><a href=3D"http://aperture.sourceforge.net/documentation.html">Documenta=
+tion</a></li>
+<li><a href=3D"http://aperture.sourceforge.net/faq.html">FAQ</a></li>
+<li><a href=3D"http://aperture.sourceforge.net/index.html#support">Support</=
+a></li>
+<li><a href=3D"http://aperture.sourceforge.net/index.html#development">Devel=
+opment</a></li>
+<li><a href=3D"http://aperture.sourceforge.net/license.html">License</a></li=
+>
+</ul>
+
+<h2>Developed By</h2>
+
+<ul>
+<li><a href=3D"http://aduna.biz/">Aduna</a></li>
+<li><a href=3D"http://www.dfki.de/">DFKI</a></li>
+</ul>
+
+<h2>Site Info</h2>
+
+<p>
+Hosted by <a href=3D"http://sourceforge.net/">SourceForge.net</a>
+</p>
+
+<p>
+<a href=3D"http://sourceforge.net/"><img class=3D"logo" src=3D"index_files/s=
+flogo.png" alt=3D"SourceForge.net Logo" height=3D"37" width=3D"125"></a>
+</p>
+
+<p>
+<br>
+Graphical design by <a href=3D"http://www.pixul.net/">Pixul.net</a>. Used wi=
+th permission.
+</p>
+
+</div>  <!-- sideBar -->
+
+<div id=3D"footer">
+<a href=3D"http://validator.w3.org/check/referer" title=3D"Check the validit=
+y of this site&#8217;s XHTML">xhtml</a>
+=A0<a href=3D"http://jigsaw.w3.org/css-validator/check/referer" title=3D"Che=
+ck the validity of this site&#8217;s CSS">css</a>
+</div>  <!-- footer -->
+
+</body></html>
+
+
+------=_NextPart_000_0000_B40804DE.BBCA09DC
+Content-Type: text/css
+Content-Transfer-Encoding: quoted-printable
+Content-Location: index_files/all.css
+
+@import url(../w3-html40-recommended.css);
+
+img {
+=09border: 0;
+}
+
+
+
+------=_NextPart_000_0000_B40804DE.BBCA09DC
+Content-Type: text/css
+Content-Transfer-Encoding: quoted-printable
+Content-Location: index_files/frontpage.css
+
+/*
+ Parts of this style-sheet are copied from the=20
+ css Zen Garden submission 164 - 'Chien', by Alex Miller, http://www.pixul.n=
+et/=20
+ http://www.csszengarden.com/?cssfile=3D/164/164.css&page=3D2
+=20
+ css released under Creative Commons License - http://creativecommons.org/li=
+censes/by-nc-sa/1.0/=20
+*/
+
+@import url(../w3-html40-recommended.css);
+
+html, body, div, ul, ol, p, li {
+=09margin: 0;
+=09border: 0;
+=09padding: 0;
+}
+
+html {
+=09background-image: url(img/background.gif);
+=09font-family: verdana, arial, serif;
+=09font-size: 82%;
+=09line-height: 120%;
+=09color: #333;
+}
+
+body {
+=09background-image: url(img/containerbackground.gif);
+=09background-repeat: repeat-y;
+=09width: 590px;
+=09margin-left: auto;
+=09margin-right: auto;
+=09padding: 0 38px 0 37px;
+}
+
+ul, ol, p {
+=09padding: 0 12px 10px 12px;
+}
+
+ul, ol {
+=09list-style-position: outside;
+=09padding-left: 16px;
+=09margin-left: 0px;
+}
+
+li {
+=09margin-left: 15px;
+=09margin-bottom: 8px;
+}
+
+h2 {
+=09margin: 20px 0 15px 0;
+=09padding: 0;
+=09text-align: center;
+=09font-size: 130%;
+}
+
+img {
+=09border: 0;
+}
+
+a:link {
+=09text-decoration: none;
+=09color: #CC0000;
+}
+=09
+a:visited {
+=09text-decoration: none;
+=09color: #CC6666;
+}
+=09
+a:hover {
+=09text-decoration: underline;
+=09color: #CC0000;
+}
+
+#header {
+=09color: #d88;
+=09background-color: rgb(156,26,0);
+=09padding: 20px;
+=09margin-bottom: 20px;
+}
+
+#header h1 {
+ =09color: #eaa;
+}
+
+#content {
+=09float: left;
+=09width: 389px;
+}
+
+#content h2 {
+=09text-align:center;
+=09color: #ffffff;
+=09background-image: url(img/bgheader-content.png);
+=09background-position: left;
+=09height: 28px;
+=09padding-top: 6px;
+}
+
+#sideBar {
+=09float: right;
+=09width: 192px;
+}
+
+#sideBar h2 {
+=09background-color: #f7b356;
+=09color: #fff;
+=09background-image: url(img/bgheader-sidebar.png);
+=09background-position: left;
+=09height: 28px;
+=09padding-top: 6px;
+}
+
+#preamble {
+=09font-size: 82%;
+=09color: #996666;
+}
+
+#footer {
+=09clear: both;
+=09border-top: 1px solid #999;
+=09padding: 6px 0 6px 0;
+=09background-color: #FFF;
+=09font-weight: bold;
+=09text-align: center;
+}
+
+
+
+------=_NextPart_000_0000_B40804DE.BBCA09DC
+Content-Type: text/css
+Content-Transfer-Encoding: quoted-printable
+Content-Location: index_files/print.css
+
+html, body {
+=09color: #000;
+=09background: #fff;
+=09font-family: "Times New Roman", "Times", serif;
+=09font-size: 100%;
+=09line-height: 110%;
+}
+
+
+------=_NextPart_000_0000_B40804DE.BBCA09DC
+Content-Type: image/png
+Content-Transfer-Encoding: base64
+Content-Location: index_files/sflogo.png
+
+iVBORw0KGgoAAAANSUhEUgAAAH0AAAAlCAIAAADgP3HoAAAABGdBTUEAALGLDJGlHAAAACBjSFJN
+AABumgAAdA8AAPQkAACEzwAAbV8AAOhsAAA8iwAAG1jJR08cAAAK3ElEQVR4nGJgGAUDAQACiBGI
+////P9DOGFmAkZERIICGZ7j/f/+C4eRsxo9XGATEGeRMGLiFGDj5GTglGbjkGJjYSDLq6dOnT548
+effunZCQkKqqKpCk3HnAcAcIoCEW7kD/T548GUhilZ04cSKQ/HVo379dU9h1JBi96xh4JZAVfP/8
+cumKdZev3EDTyMnJqaKi4unpKS0tDRc8efLkjh07gHYBBXV1dYERcOfOHUi4Q9QfPHjw+/fvWF3i
+4eEBNC0xMfEBGKxfv97R0fH9+/cQWWC4AwTQEAt3COjq6gKGAgPMewzgVAmMj5SUFKnrT74vmMQb
+ZM0WXYxd8/9/c+bOvXz5CgMsng4cOAAMF4hkbm4uMECBjGXLlgHDHciIiooyNzeHyAJDGagSKA5U
+A1QJjAagpRApiFEMsJRhZmYGdNgBMHj48OEFMICHMzDcAQKICZffINGLKQ60+w4YkBZUhADEWEho
+EgTA5IYmAkySwDhguvnybeF0ZnlFnIEOBIxM0dExyMWFg4MDPJkDgwlIQgIXIgUPdIi9wGgApn0I
+FxJDaABosr29PYQtICCgoKAA5BoYGBQUFCArAwjARxnbMBACQZDAAZIzx9SAvhVISdD34w5ISWmG
+Hr4LSx6xMnoZ/V904pbjdhfdY72JBDjGLrPW1lpphLdmONFao6qHKYEJIQiPz4yVUlK19w6AEtdR
+BCbk0ANDH+89hyKvtpNGKQWMc46tSuk1QoNhzPyPa6DREd4fY555vxT9Jx+fkfHmCURkOUNCROoT
+U8FzxBjZMDf9mUTJNoIk5/yH+QrAVx3bMBDCUBhOky4NQpkiUhbIXvRslEFS0bLGDZBPedLpCnJU
+xgbr8RvDgjtGAcpW7ehLI+Ol4FlGOka999Ya3KJU7neBYep9FELKYdgUx/ZQwGpj0qrBrlVanlIK
+p+Rp2ITU8uS0kG3vz+11vz4fJ8sypD1yn3PGqLXmpl9+PbT8RTkD55+MMcaxS5bjK4CwlDNv376F
+VxdAOyDehiRDeKBDADBMGcAJH8IFxhamacg1FdxMYJQ8BQOgXqACeKBDADB6IMrQNALjEldiB8bl
+9sbpQAaLpBhWBZgAYhTQp8DUDUnsQA8CIxtehGKWZsQAoGmnTp0iqAwggLCkd0icA1McJMSB7gNy
+gR7DjGSgy4A5ACgLjCpcFmB1/Y8fPxjAgQX0MKaxkOIIUxekMYcssgMMoLqe/tZhYGA31sLlEkyQ
+n58PZwNNBuY/ZHuBZR15RuFKHMgAIACjZXACAAjDQOd3CkdxDX/O4cvDQCiiYt8iaZOmOcwdztkU
++oE3u8QNB/SkNZF/iPCEKNhErOb2redL9pIAJUxdmgjVeabmwvPR+j8SQojvEC0ALG7elhGxXwF2
+RTDKM7LThwpdUwBhKWcgMQ8JbmBRCy/v8AB4yxQ/ANafwCIb2EQDGo5WZOECQD/cBgP8hTswi0Q2
+FgIZf5+/JsZYBnBUMYDLLmDkQUSAQQYMa3g6QAs+iDJIWw6oF1IYogFI0U9MRgEIIOztSEgbBhg0
+wHwHDCZIsYA1GiFSkFRPEADdWl9fD0zpQKcDEy8kU8PrNKwA0p0BAqBjgJUhPp8IcLEbyP++8+Dv
+S8JBD3QAPDkD0zgke0Ga5/CmAVqjFtJXgrClwQCryUBlwKAnWAAABBCWcIdXLMAAghS+UlJSQPLu
+3buYioElEqQDDWSjZUNcAGgmUAswcQErEiAXay2ENXUTbCQI1wcDya9z5xF0w/bt25G58CQPzNxA
+t8GrXPyZDBcABjpyYwkrAAggLOEOTInwtACsNhnARTDQZUA3oXWXgCqBdkBah8B4RnYl3NGYXSFI
+Mx+oAGggMHGBmiKooYBpEdxMNJWQ3AYHPAGmPBEOv8+e/HNyDx4/79i+DV5nQESAMQpPv8AgA/Z7
+ITkAWDDiSbnIXoM7GGgmpDuCHwAEoLvsVRiEgTje0jhUqqtbwdG14OTWl/B1fJTOrqFbpDTg6iT4
+DGbooFMxYH9wLZRCB8NxufzvI3c5b8dXVdU3q23bYRiSJPHe13WdpmlRFJLRlKFSKo5j6XJN0zAR
+SGFGUcQ0IfXLca01pjvn4Idh2HUdNPMxNxQEAVu0BJhoAY2UF7vxxFrLKkO2oE3TJK8qmMj3fc9B
+MXUcx2VZSAswhbM/5+vDrfaiDs/t8fTr7urvN6Ov78vDTrSDCc0qlUeggS3LEr2M+DDneSYUCOMv
+xObzx0x82RIokgwBhI0xnCJfsyz7F3Sa3EsAYRmfARoBDBdgAcIALuiRczfQTUBxSDIBSgFNR254
+AcMLogsoCNQFaSRAuPCUBW8jAkUg6QUoAjEWogBoJlAELosGgJZiJkDMdue/q0cZj/Uw8rIxqLsw
+SBkycPAxMPxmYOb4/o/76asvyCqBGuEpHWgjcscFYhfQYcAkAixCgaUisBqDj0oiK8bqTjwjl8D0
+BxBAQ3JcjFjw5QXDt9cMLMwMzCwMnCIMbFQYwqUKAIY7QADhHBcbDoBHgkFMl0FIi4FfjZ6BDiyZ
+4SM8uABAAA3rcB8IAKwsHzx4QFAZQACNhjs1ATCZA+tMYlQCBNBouFMNbNiwAc84JRoACCAQ+j/C
+wPnz54F9VAUFBYj/gV1ooCCQhIsYGBgAW8yYGvfv3x8QEAAPN6Cy+fPnQ6SA2pGDFCjlAAMFBQVo
+5gAVAATQSEzvwPBFDuWPHz8aGhouXLgQGKbAMBIQELhw4QIw5QLTL7KuxMRER0dHfX19YLMSEnkQ
+QSCAm4k842EPA0AtmG4ACCAQomniGrQAmHjhIQDs/UFCEwiADGDQQ4ISrhiSnIHKkE0AqoREHiTH
+/EdK9UDD8VgNVAAQQKPhDkqbaFLwuVBI8N2/fx/CBaZxrCqB8QThEh/uAAGEZfx9WIGfnxle3/r/
+l4lBTJWRkwerEn9/fzQRfn5+ZO6CBQsgjA1ggCwFLJGA5IcPH4AtGbRZM/wAIICGdbifX8BwYzWw
+v8r4+evvx3/+qMZxRiWTYQyeWWxICc4ALpRIMhMggIZvuB+bxPD2CIOGPgMbL8Pb+6xM9/6cWfnx
+wQ/+qmyyjUQbQKQEAATQ8GzP/P/wguHaOgZhMQYZdwbFaAYxNQY+Xha+X993Hfh14QHZxhLTESUS
+AATQ8Az3f7cvMfz9z/D3N8Ov9ww/XzP8+c7w79//fwwMfxm+bDhDqmmQ5g0DbFUTVQBAAA3PcP/9
+9O3ft/8YXjxneLCL4fZShic3GN5++PWG889LJjKGXuEVL7CNj0sNqVkBIICGZ7gziip/vSPw9+57
+huvXGa5e/H/70Y87bN/ucP98zMgkwE2qacBmO6TaBKb3CRMmYCoAdp0gbR54zkCOBqxRAhBAwzPc
+WXW1v94Q+XBa9PNprq9n2D6eFvxwSuDrKZZ/HDz8CdCld5AmIBA8fPgQTTuwBwthwAuW9evXQ8K0
+sLAQWLsCG44QcSADGOhAoyANefgoAlAZRC+w3QnsDMPtggOAAAIhavVEBhX4sv7UXd7whzJhjxTD
+7ouE32aOuMEQ+WH+wf+w8RnkEICPtABJNCkgFyIF7D3BpYBxABl7gTDgfV2ICcjagQrgYzhwABQH
+CKDhPN/07cC19xN3QCpSdgN5sf5YLgfQajJgIsVMgApgAFmujlUKwgZqBCZkeHoHJnDIylNkAFQD
+6V4BAx1YRsELHzhgZGQECCCKPTcKyAIAAQYA/CfxcS2gFiUAAAAASUVORK5CYII=
+
+------=_NextPart_000_0000_B40804DE.BBCA09DC--

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testMKV.mkv
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testMKV.mkv b/tika-parsers/src/test/resources/test-documents/testMKV.mkv
new file mode 100644
index 0000000..90f15fb
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testMKV.mkv differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testMYSQL.MYD
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testMYSQL.MYD b/tika-parsers/src/test/resources/test-documents/testMYSQL.MYD
new file mode 100644
index 0000000..b2c5de4
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testMYSQL.MYD differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testMYSQL.MYI
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testMYSQL.MYI b/tika-parsers/src/test/resources/test-documents/testMYSQL.MYI
new file mode 100644
index 0000000..bc6bfcd
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testMYSQL.MYI differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testMYSQL.frm
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testMYSQL.frm b/tika-parsers/src/test/resources/test-documents/testMYSQL.frm
new file mode 100644
index 0000000..9bcdcfb
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testMYSQL.frm differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testOPUS.opus
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testOPUS.opus b/tika-parsers/src/test/resources/test-documents/testOPUS.opus
new file mode 100644
index 0000000..3f5f5af
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testOPUS.opus differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testOptionalHyphen.doc
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testOptionalHyphen.doc b/tika-parsers/src/test/resources/test-documents/testOptionalHyphen.doc
new file mode 100644
index 0000000..3519e75
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testOptionalHyphen.doc differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testOptionalHyphen.docx
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testOptionalHyphen.docx b/tika-parsers/src/test/resources/test-documents/testOptionalHyphen.docx
new file mode 100644
index 0000000..0fbaa76
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testOptionalHyphen.docx differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testOptionalHyphen.pdf
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testOptionalHyphen.pdf b/tika-parsers/src/test/resources/test-documents/testOptionalHyphen.pdf
new file mode 100644
index 0000000..68c0099
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testOptionalHyphen.pdf differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testOptionalHyphen.ppt
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testOptionalHyphen.ppt b/tika-parsers/src/test/resources/test-documents/testOptionalHyphen.ppt
new file mode 100644
index 0000000..ad1db44
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testOptionalHyphen.ppt differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testOptionalHyphen.pptx
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testOptionalHyphen.pptx b/tika-parsers/src/test/resources/test-documents/testOptionalHyphen.pptx
new file mode 100644
index 0000000..0506ae3
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testOptionalHyphen.pptx differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testOptionalHyphen.rtf
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testOptionalHyphen.rtf b/tika-parsers/src/test/resources/test-documents/testOptionalHyphen.rtf
new file mode 100644
index 0000000..1eb4691
--- /dev/null
+++ b/tika-parsers/src/test/resources/test-documents/testOptionalHyphen.rtf
@@ -0,0 +1,158 @@
+{\rtf1\adeflang1025\ansi\ansicpg1252\uc1\adeff0\deff0\stshfdbch37\stshfloch37\stshfhich37\stshfbi0\deflang1033\deflangfe1033\themelang1033\themelangfe0\themelangcs0{\fonttbl{\f0\fbidi \froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f34\fbidi \froman\fcharset0\fprq2{\*\panose 02040503050406030204}Cambria Math;}
+{\f37\fbidi \fswiss\fcharset0\fprq2{\*\panose 020f0502020204030204}Calibri;}{\f208\fbidi \froman\fcharset0\fprq0{\*\panose 00000000000000000000}CG Omega{\*\falt Times New Roman};}
+{\flomajor\f31500\fbidi \froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\fdbmajor\f31501\fbidi \froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}
+{\fhimajor\f31502\fbidi \froman\fcharset0\fprq2{\*\panose 02040503050406030204}Cambria;}{\fbimajor\f31503\fbidi \froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}
+{\flominor\f31504\fbidi \froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\fdbminor\f31505\fbidi \froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}
+{\fhiminor\f31506\fbidi \fswiss\fcharset0\fprq2{\*\panose 020f0502020204030204}Calibri;}{\fbiminor\f31507\fbidi \froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f210\fbidi \froman\fcharset238\fprq2 Times New Roman CE;}
+{\f211\fbidi \froman\fcharset204\fprq2 Times New Roman Cyr;}{\f213\fbidi \froman\fcharset161\fprq2 Times New Roman Greek;}{\f214\fbidi \froman\fcharset162\fprq2 Times New Roman Tur;}{\f215\fbidi \froman\fcharset177\fprq2 Times New Roman (Hebrew);}
+{\f216\fbidi \froman\fcharset178\fprq2 Times New Roman (Arabic);}{\f217\fbidi \froman\fcharset186\fprq2 Times New Roman Baltic;}{\f218\fbidi \froman\fcharset163\fprq2 Times New Roman (Vietnamese);}{\f550\fbidi \froman\fcharset238\fprq2 Cambria Math CE;}
+{\f551\fbidi \froman\fcharset204\fprq2 Cambria Math Cyr;}{\f553\fbidi \froman\fcharset161\fprq2 Cambria Math Greek;}{\f554\fbidi \froman\fcharset162\fprq2 Cambria Math Tur;}{\f557\fbidi \froman\fcharset186\fprq2 Cambria Math Baltic;}
+{\f580\fbidi \fswiss\fcharset238\fprq2 Calibri CE;}{\f581\fbidi \fswiss\fcharset204\fprq2 Calibri Cyr;}{\f583\fbidi \fswiss\fcharset161\fprq2 Calibri Greek;}{\f584\fbidi \fswiss\fcharset162\fprq2 Calibri Tur;}
+{\f587\fbidi \fswiss\fcharset186\fprq2 Calibri Baltic;}{\flomajor\f31508\fbidi \froman\fcharset238\fprq2 Times New Roman CE;}{\flomajor\f31509\fbidi \froman\fcharset204\fprq2 Times New Roman Cyr;}
+{\flomajor\f31511\fbidi \froman\fcharset161\fprq2 Times New Roman Greek;}{\flomajor\f31512\fbidi \froman\fcharset162\fprq2 Times New Roman Tur;}{\flomajor\f31513\fbidi \froman\fcharset177\fprq2 Times New Roman (Hebrew);}
+{\flomajor\f31514\fbidi \froman\fcharset178\fprq2 Times New Roman (Arabic);}{\flomajor\f31515\fbidi \froman\fcharset186\fprq2 Times New Roman Baltic;}{\flomajor\f31516\fbidi \froman\fcharset163\fprq2 Times New Roman (Vietnamese);}
+{\fdbmajor\f31518\fbidi \froman\fcharset238\fprq2 Times New Roman CE;}{\fdbmajor\f31519\fbidi \froman\fcharset204\fprq2 Times New Roman Cyr;}{\fdbmajor\f31521\fbidi \froman\fcharset161\fprq2 Times New Roman Greek;}
+{\fdbmajor\f31522\fbidi \froman\fcharset162\fprq2 Times New Roman Tur;}{\fdbmajor\f31523\fbidi \froman\fcharset177\fprq2 Times New Roman (Hebrew);}{\fdbmajor\f31524\fbidi \froman\fcharset178\fprq2 Times New Roman (Arabic);}
+{\fdbmajor\f31525\fbidi \froman\fcharset186\fprq2 Times New Roman Baltic;}{\fdbmajor\f31526\fbidi \froman\fcharset163\fprq2 Times New Roman (Vietnamese);}{\fhimajor\f31528\fbidi \froman\fcharset238\fprq2 Cambria CE;}
+{\fhimajor\f31529\fbidi \froman\fcharset204\fprq2 Cambria Cyr;}{\fhimajor\f31531\fbidi \froman\fcharset161\fprq2 Cambria Greek;}{\fhimajor\f31532\fbidi \froman\fcharset162\fprq2 Cambria Tur;}
+{\fhimajor\f31535\fbidi \froman\fcharset186\fprq2 Cambria Baltic;}{\fbimajor\f31538\fbidi \froman\fcharset238\fprq2 Times New Roman CE;}{\fbimajor\f31539\fbidi \froman\fcharset204\fprq2 Times New Roman Cyr;}
+{\fbimajor\f31541\fbidi \froman\fcharset161\fprq2 Times New Roman Greek;}{\fbimajor\f31542\fbidi \froman\fcharset162\fprq2 Times New Roman Tur;}{\fbimajor\f31543\fbidi \froman\fcharset177\fprq2 Times New Roman (Hebrew);}
+{\fbimajor\f31544\fbidi \froman\fcharset178\fprq2 Times New Roman (Arabic);}{\fbimajor\f31545\fbidi \froman\fcharset186\fprq2 Times New Roman Baltic;}{\fbimajor\f31546\fbidi \froman\fcharset163\fprq2 Times New Roman (Vietnamese);}
+{\flominor\f31548\fbidi \froman\fcharset238\fprq2 Times New Roman CE;}{\flominor\f31549\fbidi \froman\fcharset204\fprq2 Times New Roman Cyr;}{\flominor\f31551\fbidi \froman\fcharset161\fprq2 Times New Roman Greek;}
+{\flominor\f31552\fbidi \froman\fcharset162\fprq2 Times New Roman Tur;}{\flominor\f31553\fbidi \froman\fcharset177\fprq2 Times New Roman (Hebrew);}{\flominor\f31554\fbidi \froman\fcharset178\fprq2 Times New Roman (Arabic);}
+{\flominor\f31555\fbidi \froman\fcharset186\fprq2 Times New Roman Baltic;}{\flominor\f31556\fbidi \froman\fcharset163\fprq2 Times New Roman (Vietnamese);}{\fdbminor\f31558\fbidi \froman\fcharset238\fprq2 Times New Roman CE;}
+{\fdbminor\f31559\fbidi \froman\fcharset204\fprq2 Times New Roman Cyr;}{\fdbminor\f31561\fbidi \froman\fcharset161\fprq2 Times New Roman Greek;}{\fdbminor\f31562\fbidi \froman\fcharset162\fprq2 Times New Roman Tur;}
+{\fdbminor\f31563\fbidi \froman\fcharset177\fprq2 Times New Roman (Hebrew);}{\fdbminor\f31564\fbidi \froman\fcharset178\fprq2 Times New Roman (Arabic);}{\fdbminor\f31565\fbidi \froman\fcharset186\fprq2 Times New Roman Baltic;}
+{\fdbminor\f31566\fbidi \froman\fcharset163\fprq2 Times New Roman (Vietnamese);}{\fhiminor\f31568\fbidi \fswiss\fcharset238\fprq2 Calibri CE;}{\fhiminor\f31569\fbidi \fswiss\fcharset204\fprq2 Calibri Cyr;}
+{\fhiminor\f31571\fbidi \fswiss\fcharset161\fprq2 Calibri Greek;}{\fhiminor\f31572\fbidi \fswiss\fcharset162\fprq2 Calibri Tur;}{\fhiminor\f31575\fbidi \fswiss\fcharset186\fprq2 Calibri Baltic;}
+{\fbiminor\f31578\fbidi \froman\fcharset238\fprq2 Times New Roman CE;}{\fbiminor\f31579\fbidi \froman\fcharset204\fprq2 Times New Roman Cyr;}{\fbiminor\f31581\fbidi \froman\fcharset161\fprq2 Times New Roman Greek;}
+{\fbiminor\f31582\fbidi \froman\fcharset162\fprq2 Times New Roman Tur;}{\fbiminor\f31583\fbidi \froman\fcharset177\fprq2 Times New Roman (Hebrew);}{\fbiminor\f31584\fbidi \froman\fcharset178\fprq2 Times New Roman (Arabic);}
+{\fbiminor\f31585\fbidi \froman\fcharset186\fprq2 Times New Roman Baltic;}{\fbiminor\f31586\fbidi \froman\fcharset163\fprq2 Times New Roman (Vietnamese);}}{\colortbl;\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;\red0\green255\blue0;
+\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue128;\red0\green128\blue128;\red0\green128\blue0;\red128\green0\blue128;\red128\green0\blue0;\red128\green128\blue0;\red128\green128\blue128;
+\red192\green192\blue192;}{\*\defchp \loch\af37\hich\af37\dbch\af37 }{\*\defpap \ql \li0\ri0\widctlpar\wrapdefault\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 }\noqfpromote {\stylesheet{\ql \li0\ri0\sa200\sl276\slmult1
+\widctlpar\wrapdefault\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \rtlch\fcs1 \af0\afs22\alang1025 \ltrch\fcs0 \fs22\lang1033\langfe1033\loch\f37\hich\af37\dbch\af37\cgrid\langnp1033\langfenp1033 \snext0 \sqformat \spriority0 \styrsid14237822 
+Normal;}{\*\cs10 \additive \ssemihidden \sunhideused \spriority1 Default Paragraph Font;}{\*
+\ts11\tsrowd\trftsWidthB3\trpaddl108\trpaddr108\trpaddfl3\trpaddft3\trpaddfb3\trpaddfr3\tblind0\tblindtype3\tscellwidthfts0\tsvertalt\tsbrdrt\tsbrdrl\tsbrdrb\tsbrdrr\tsbrdrdgl\tsbrdrdgr\tsbrdrh\tsbrdrv 
+\ql \li0\ri0\widctlpar\wrapdefault\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \rtlch\fcs1 \af0\afs20\alang1025 \ltrch\fcs0 \fs20\lang1033\langfe1033\loch\f37\hich\af37\dbch\af37\cgrid\langnp1033\langfenp1033 
+\snext11 \ssemihidden \sunhideused \sqformat Normal Table;}}{\*\rsidtbl \rsid281469\rsid1661966\rsid2255182\rsid3621088\rsid4260063\rsid14237822\rsid15422635\rsid15747590}{\mmathPr\mmathFont34\mbrkBin0\mbrkBinSub0\msmallFrac0\mdispDef1\mlMargin0\mrMargin0
+\mdefJc1\mwrapIndent1440\mintLim0\mnaryLim1}{\info{\author Michael McCandless}{\operator Michael McCandless}{\creatim\yr2011\mo9\dy9\hr11\min51}{\revtim\yr2011\mo9\dy9\hr11\min51}{\version2}{\edmins0}{\nofpages1}{\nofwords2}{\nofchars14}{\nofcharsws15}
+{\vern32771}}{\*\xmlnstbl {\xmlns1 http://schemas.microsoft.com/office/word/2003/wordml}}\paperw12240\paperh15840\margl1440\margr1440\margt1440\margb1440\gutter0\ltrsect 
+\widowctrl\ftnbj\aenddoc\trackmoves0\trackformatting1\donotembedsysfont1\relyonvml0\donotembedlingdata0\grfdocevents0\validatexml1\showplaceholdtext0\ignoremixedcontent0\saveinvalidxml0\showxmlerrors1\noxlattoyen
+\expshrtn\noultrlspc\dntblnsbdb\nospaceforul\formshade\horzdoc\dgmargin\dghspace180\dgvspace180\dghorigin1440\dgvorigin1440\dghshow1\dgvshow1
+\jexpand\viewkind1\viewscale150\pgbrdrhead\pgbrdrfoot\splytwnine\ftnlytwnine\htmautsp\nolnhtadjtbl\useltbaln\alntblind\lytcalctblwd\lyttblrtgr\lnbrkrule\nobrkwrptbl\snaptogridincell\allowfieldendsel\wrppunct
+\asianbrkrule\rsidroot1661966\newtblstyruls\nogrowautofit\utinl \fet0{\*\wgrffmtfilter 2450}\ilfomacatclnup0\ltrpar \sectd \ltrsect\linex0\endnhere\sectlinegrid360\sectdefaultcl\sectrsid14237822\sftnbj {\*\pnseclvl1\pnucrm\pnstart1\pnindent720\pnhang 
+{\pntxta .}}{\*\pnseclvl2\pnucltr\pnstart1\pnindent720\pnhang {\pntxta .}}{\*\pnseclvl3\pndec\pnstart1\pnindent720\pnhang {\pntxta .}}{\*\pnseclvl4\pnlcltr\pnstart1\pnindent720\pnhang {\pntxta )}}{\*\pnseclvl5\pndec\pnstart1\pnindent720\pnhang {\pntxtb (}
+{\pntxta )}}{\*\pnseclvl6\pnlcltr\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}{\*\pnseclvl7\pnlcrm\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}{\*\pnseclvl8\pnlcltr\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}{\*\pnseclvl9
+\pnlcrm\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}\pard\plain \ltrpar\ql \li0\ri0\sa200\sl276\slmult1\widctlpar\wrapdefault\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \rtlch\fcs1 \af0\afs22\alang1025 \ltrch\fcs0 
+\fs22\lang1033\langfe1033\loch\af37\hich\af37\dbch\af37\cgrid\langnp1033\langfenp1033 {\rtlch\fcs1 \af0 \ltrch\fcs0 \f208\expnd0\expndtw-3\insrsid3621088 \hich\af208\dbch\af37\loch\f208 optional\-\hich\af208\dbch\af37\loch\f208 hyphen}{\rtlch\fcs1 \af0 
+\ltrch\fcs0 \insrsid15481255 
+\par }{\*\themedata 504b030414000600080000002100828abc13fa0000001c020000130000005b436f6e74656e745f54797065735d2e786d6cac91cb6ac3301045f785fe83d0b6d8
+72ba28a5d8cea249777d2cd20f18e4b12d6a8f843409c9df77ecb850ba082d74231062ce997b55ae8fe3a00e1893f354e9555e6885647de3a8abf4fbee29bbd7
+2a3150038327acf409935ed7d757e5ee14302999a654e99e393c18936c8f23a4dc072479697d1c81e51a3b13c07e4087e6b628ee8cf5c4489cf1c4d075f92a0b
+44d7a07a83c82f308ac7b0a0f0fbf90c2480980b58abc733615aa2d210c2e02cb04430076a7ee833dfb6ce62e3ed7e14693e8317d8cd0433bf5c60f53fea2fe7
+065bd80facb647e9e25c7fc421fd2ddb526b2e9373fed4bb902e182e97b7b461e6bfad3f010000ffff0300504b030414000600080000002100a5d6a7e7c00000
+00360100000b0000005f72656c732f2e72656c73848fcf6ac3300c87ef85bd83d17d51d2c31825762fa590432fa37d00e1287f68221bdb1bebdb4fc7060abb08
+84a4eff7a93dfeae8bf9e194e720169aaa06c3e2433fcb68e1763dbf7f82c985a4a725085b787086a37bdbb55fbc50d1a33ccd311ba548b63095120f88d94fbc
+52ae4264d1c910d24a45db3462247fa791715fd71f989e19e0364cd3f51652d73760ae8fa8c9ffb3c330cc9e4fc17faf2ce545046e37944c69e462a1a82fe353
+bd90a865aad41ed0b5b8f9d6fd010000ffff0300504b0304140006000800000021006b799616830000008a0000001c0000007468656d652f7468656d652f7468
+656d654d616e616765722e786d6c0ccc4d0ac3201040e17da17790d93763bb284562b2cbaebbf600439c1a41c7a0d29fdbd7e5e38337cedf14d59b4b0d592c9c
+070d8a65cd2e88b7f07c2ca71ba8da481cc52c6ce1c715e6e97818c9b48d13df49c873517d23d59085adb5dd20d6b52bd521ef2cdd5eb9246a3d8b4757e8d3f7
+29e245eb2b260a0238fd010000ffff0300504b03041400060008000000210096b5ade296060000501b0000160000007468656d652f7468656d652f7468656d65
+312e786d6cec594f6fdb3614bf0fd87720746f6327761a07758ad8b19b2d4d1bc46e871e698996d850a240d2497d1bdae38001c3ba618715d86d87615b8116d8
+a5fb34d93a6c1dd0afb0475292c5585e9236d88aad3e2412f9e3fbff1e1fa9abd7eec70c1d1221294fda5efd72cd4324f1794093b0eddd1ef62fad79482a9c04
+98f184b4bd2991deb58df7dfbb8ad755446282607d22d771db8b944ad79796a40fc3585ee62949606ecc458c15bc8a702910f808e8c66c69b9565b5d8a314d3c
+94e018c8de1a8fa94fd05093f43672e23d06af89927ac06762a049136785c10607758d9053d965021d62d6f6804fc08f86e4bef210c352c144dbab999fb7b471
+7509af678b985ab0b6b4ae6f7ed9ba6c4170b06c788a705430adf71bad2b5b057d03606a1ed7ebf5babd7a41cf00b0ef83a6569632cd467faddec9699640f671
+9e76b7d6ac355c7c89feca9cccad4ea7d36c65b258a206641f1b73f8b5da6a6373d9c11b90c537e7f08dce66b7bbeae00dc8e257e7f0fd2badd5868b37a088d1
+e4600ead1ddaef67d40bc898b3ed4af81ac0d76a197c86826828a24bb318f3442d8ab518dfe3a20f000d6458d104a9694ac6d88728eee2782428d60cf03ac1a5
+193be4cbb921cd0b495fd054b5bd0f530c1931a3f7eaf9f7af9e3f45c70f9e1d3ff8e9f8e1c3e3073f5a42ceaa6d9c84e5552fbffdeccfc71fa33f9e7ef3f2d1
+17d57859c6fffac327bffcfc793510d26726ce8b2f9ffcf6ecc98baf3efdfdbb4715f04d814765f890c644a29be408edf3181433567125272371be15c308d3f2
+8acd249438c19a4b05fd9e8a1cf4cd296699771c393ac4b5e01d01e5a30a787d72cf1178108989a2159c77a2d801ee72ce3a5c545a6147f32a99793849c26ae6
+6252c6ed637c58c5bb8b13c7bfbd490a75330f4b47f16e441c31f7184e140e494214d273fc80900aedee52ead87597fa824b3e56e82e451d4c2b4d32a423279a
+668bb6690c7e9956e90cfe766cb37b077538abd27a8b1cba48c80acc2a841f12e698f13a9e281c57911ce298950d7e03aba84ac8c154f8655c4f2af074481847
+bd804859b5e696007d4b4edfc150b12addbecba6b18b148a1e54d1bc81392f23b7f84137c2715a851dd0242a633f900710a218ed715505dfe56e86e877f0034e
+16bafb0e258ebb4faf06b769e888340b103d3311da9750aa9d0a1cd3e4efca31a3508f6d0c5c5c398602f8e2ebc71591f5b616e24dd893aa3261fb44f95d843b
+5974bb5c04f4edafb95b7892ec1108f3f98de75dc97d5772bdff7cc95d94cf672db4b3da0a6557f70db629362d72bcb0431e53c6066acac80d699a6409fb44d0
+8741bdce9c0e4971624a2378cceaba830b05366b90e0ea23aaa241845368b0eb9e2612ca8c742851ca251ceccc70256d8d87265dd96361531f186c3d9058edf2
+c00eafe8e1fc5c509031bb4d680e9f39a3154de0accc56ae644441edd76156d7429d995bdd88664a9dc3ad50197c38af1a0c16d684060441db02565e85f3b966
+0d0713cc48a0ed6ef7dedc2dc60b17e92219e180643ed27acffba86e9c94c78ab90980d8a9f0913ee49d62b512b79626fb06dccee2a432bbc60276b9f7dec44b
+7904cfbca4f3f6443ab2a49c9c2c41476dafd55c6e7ac8c769db1bc399161ee314bc2e75cf8759081743be1236ec4f4d6693e5336fb672c5dc24a8c33585b5fb
+9cc24e1d4885545b58463634cc5416022cd19cacfccb4d30eb45296023fd35a458598360f8d7a4003bbaae25e331f155d9d9a5116d3bfb9a95523e51440ca2e0
+088dd844ec6370bf0e55d027a012ae264c45d02f708fa6ad6da6dce29c255df9f6cae0ec38666984b372ab5334cf640b37795cc860de4ae2816e95b21be5ceaf
+8a49f90b52a51cc6ff3355f47e0237052b81f6800fd7b802239daf6d8f0b1571a8426944fdbe80c6c1d40e8816b88b8569082ab84c36ff0539d4ff6dce591a26
+ade1c0a7f669880485fd484582903d284b26fa4e2156cff62e4b9265844c4495c495a9157b440e091bea1ab8aaf7760f4510eaa69a6465c0e04ec69ffb9e65d0
+28d44d4e39df9c1a52ecbd3607fee9cec7263328e5d661d3d0e4f62f44acd855ed7ab33cdf7bcb8ae889599bd5c8b3029895b6825696f6af29c239b75a5bb1e6
+345e6ee6c28117e73586c1a2214ae1be07e93fb0ff51e133fb65426fa843be0fb515c187064d0cc206a2fa926d3c902e907670048d931db4c1a44959d366ad93
+b65abe595f70a75bf03d616c2dd959fc7d4e6317cd99cbcec9c58b34766661c7d6766ca1a9c1b327531486c6f941c638c67cd22a7f75e2a37be0e82db8df9f30
+254d30c1372581a1f51c983c80e4b71ccdd28dbf000000ffff0300504b0304140006000800000021000dd1909fb60000001b010000270000007468656d652f74
+68656d652f5f72656c732f7468656d654d616e616765722e786d6c2e72656c73848f4d0ac2301484f78277086f6fd3ba109126dd88d0add40384e4350d363f24
+51eced0dae2c082e8761be9969bb979dc9136332de3168aa1a083ae995719ac16db8ec8e4052164e89d93b64b060828e6f37ed1567914b284d262452282e3198
+720e274a939cd08a54f980ae38a38f56e422a3a641c8bbd048f7757da0f19b017cc524bd62107bd5001996509affb3fd381a89672f1f165dfe514173d9850528
+a2c6cce0239baa4c04ca5bbabac4df000000ffff0300504b01022d0014000600080000002100828abc13fa0000001c0200001300000000000000000000000000
+000000005b436f6e74656e745f54797065735d2e786d6c504b01022d0014000600080000002100a5d6a7e7c0000000360100000b000000000000000000000000
+002b0100005f72656c732f2e72656c73504b01022d00140006000800000021006b799616830000008a0000001c00000000000000000000000000140200007468
+656d652f7468656d652f7468656d654d616e616765722e786d6c504b01022d001400060008000000210096b5ade296060000501b000016000000000000000000
+00000000d10200007468656d652f7468656d652f7468656d65312e786d6c504b01022d00140006000800000021000dd1909fb60000001b010000270000000000
+00000000000000009b0900007468656d652f7468656d652f5f72656c732f7468656d654d616e616765722e786d6c2e72656c73504b050600000000050005005d010000960a00000000}
+{\*\colorschememapping 3c3f786d6c2076657273696f6e3d22312e302220656e636f64696e673d225554462d3822207374616e64616c6f6e653d22796573223f3e0d0a3c613a636c724d
+617020786d6c6e733a613d22687474703a2f2f736368656d61732e6f70656e786d6c666f726d6174732e6f72672f64726177696e676d6c2f323030362f6d6169
+6e22206267313d226c743122207478313d22646b3122206267323d226c743222207478323d22646b322220616363656e74313d22616363656e74312220616363
+656e74323d22616363656e74322220616363656e74333d22616363656e74332220616363656e74343d22616363656e74342220616363656e74353d22616363656e74352220616363656e74363d22616363656e74362220686c696e6b3d22686c696e6b2220666f6c486c696e6b3d22666f6c486c696e6b222f3e}
+{\*\latentstyles\lsdstimax267\lsdlockeddef0\lsdsemihiddendef1\lsdunhideuseddef1\lsdqformatdef0\lsdprioritydef99{\lsdlockedexcept \lsdsemihidden0 \lsdunhideused0 \lsdqformat1 \lsdpriority0 \lsdlocked0 Normal;
+\lsdsemihidden0 \lsdunhideused0 \lsdqformat1 \lsdpriority9 \lsdlocked0 heading 1;\lsdqformat1 \lsdpriority9 \lsdlocked0 heading 2;\lsdqformat1 \lsdpriority9 \lsdlocked0 heading 3;\lsdqformat1 \lsdpriority9 \lsdlocked0 heading 4;
+\lsdqformat1 \lsdpriority9 \lsdlocked0 heading 5;\lsdqformat1 \lsdpriority9 \lsdlocked0 heading 6;\lsdqformat1 \lsdpriority9 \lsdlocked0 heading 7;\lsdqformat1 \lsdpriority9 \lsdlocked0 heading 8;\lsdqformat1 \lsdpriority9 \lsdlocked0 heading 9;
+\lsdpriority39 \lsdlocked0 toc 1;\lsdpriority39 \lsdlocked0 toc 2;\lsdpriority39 \lsdlocked0 toc 3;\lsdpriority39 \lsdlocked0 toc 4;\lsdpriority39 \lsdlocked0 toc 5;\lsdpriority39 \lsdlocked0 toc 6;\lsdpriority39 \lsdlocked0 toc 7;
+\lsdpriority39 \lsdlocked0 toc 8;\lsdpriority39 \lsdlocked0 toc 9;\lsdqformat1 \lsdpriority35 \lsdlocked0 caption;\lsdsemihidden0 \lsdunhideused0 \lsdqformat1 \lsdpriority10 \lsdlocked0 Title;\lsdpriority1 \lsdlocked0 Default Paragraph Font;
+\lsdsemihidden0 \lsdunhideused0 \lsdqformat1 \lsdpriority11 \lsdlocked0 Subtitle;\lsdsemihidden0 \lsdunhideused0 \lsdqformat1 \lsdpriority22 \lsdlocked0 Strong;\lsdsemihidden0 \lsdunhideused0 \lsdqformat1 \lsdpriority20 \lsdlocked0 Emphasis;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority59 \lsdlocked0 Table Grid;\lsdunhideused0 \lsdlocked0 Placeholder Text;\lsdsemihidden0 \lsdunhideused0 \lsdqformat1 \lsdpriority1 \lsdlocked0 No Spacing;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority60 \lsdlocked0 Light Shading;\lsdsemihidden0 \lsdunhideused0 \lsdpriority61 \lsdlocked0 Light List;\lsdsemihidden0 \lsdunhideused0 \lsdpriority62 \lsdlocked0 Light Grid;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority63 \lsdlocked0 Medium Shading 1;\lsdsemihidden0 \lsdunhideused0 \lsdpriority64 \lsdlocked0 Medium Shading 2;\lsdsemihidden0 \lsdunhideused0 \lsdpriority65 \lsdlocked0 Medium List 1;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority66 \lsdlocked0 Medium List 2;\lsdsemihidden0 \lsdunhideused0 \lsdpriority67 \lsdlocked0 Medium Grid 1;\lsdsemihidden0 \lsdunhideused0 \lsdpriority68 \lsdlocked0 Medium Grid 2;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority69 \lsdlocked0 Medium Grid 3;\lsdsemihidden0 \lsdunhideused0 \lsdpriority70 \lsdlocked0 Dark List;\lsdsemihidden0 \lsdunhideused0 \lsdpriority71 \lsdlocked0 Colorful Shading;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority72 \lsdlocked0 Colorful List;\lsdsemihidden0 \lsdunhideused0 \lsdpriority73 \lsdlocked0 Colorful Grid;\lsdsemihidden0 \lsdunhideused0 \lsdpriority60 \lsdlocked0 Light Shading Accent 1;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority61 \lsdlocked0 Light List Accent 1;\lsdsemihidden0 \lsdunhideused0 \lsdpriority62 \lsdlocked0 Light Grid Accent 1;\lsdsemihidden0 \lsdunhideused0 \lsdpriority63 \lsdlocked0 Medium Shading 1 Accent 1;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority64 \lsdlocked0 Medium Shading 2 Accent 1;\lsdsemihidden0 \lsdunhideused0 \lsdpriority65 \lsdlocked0 Medium List 1 Accent 1;\lsdunhideused0 \lsdlocked0 Revision;
+\lsdsemihidden0 \lsdunhideused0 \lsdqformat1 \lsdpriority34 \lsdlocked0 List Paragraph;\lsdsemihidden0 \lsdunhideused0 \lsdqformat1 \lsdpriority29 \lsdlocked0 Quote;\lsdsemihidden0 \lsdunhideused0 \lsdqformat1 \lsdpriority30 \lsdlocked0 Intense Quote;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority66 \lsdlocked0 Medium List 2 Accent 1;\lsdsemihidden0 \lsdunhideused0 \lsdpriority67 \lsdlocked0 Medium Grid 1 Accent 1;\lsdsemihidden0 \lsdunhideused0 \lsdpriority68 \lsdlocked0 Medium Grid 2 Accent 1;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority69 \lsdlocked0 Medium Grid 3 Accent 1;\lsdsemihidden0 \lsdunhideused0 \lsdpriority70 \lsdlocked0 Dark List Accent 1;\lsdsemihidden0 \lsdunhideused0 \lsdpriority71 \lsdlocked0 Colorful Shading Accent 1;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority72 \lsdlocked0 Colorful List Accent 1;\lsdsemihidden0 \lsdunhideused0 \lsdpriority73 \lsdlocked0 Colorful Grid Accent 1;\lsdsemihidden0 \lsdunhideused0 \lsdpriority60 \lsdlocked0 Light Shading Accent 2;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority61 \lsdlocked0 Light List Accent 2;\lsdsemihidden0 \lsdunhideused0 \lsdpriority62 \lsdlocked0 Light Grid Accent 2;\lsdsemihidden0 \lsdunhideused0 \lsdpriority63 \lsdlocked0 Medium Shading 1 Accent 2;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority64 \lsdlocked0 Medium Shading 2 Accent 2;\lsdsemihidden0 \lsdunhideused0 \lsdpriority65 \lsdlocked0 Medium List 1 Accent 2;\lsdsemihidden0 \lsdunhideused0 \lsdpriority66 \lsdlocked0 Medium List 2 Accent 2;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority67 \lsdlocked0 Medium Grid 1 Accent 2;\lsdsemihidden0 \lsdunhideused0 \lsdpriority68 \lsdlocked0 Medium Grid 2 Accent 2;\lsdsemihidden0 \lsdunhideused0 \lsdpriority69 \lsdlocked0 Medium Grid 3 Accent 2;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority70 \lsdlocked0 Dark List Accent 2;\lsdsemihidden0 \lsdunhideused0 \lsdpriority71 \lsdlocked0 Colorful Shading Accent 2;\lsdsemihidden0 \lsdunhideused0 \lsdpriority72 \lsdlocked0 Colorful List Accent 2;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority73 \lsdlocked0 Colorful Grid Accent 2;\lsdsemihidden0 \lsdunhideused0 \lsdpriority60 \lsdlocked0 Light Shading Accent 3;\lsdsemihidden0 \lsdunhideused0 \lsdpriority61 \lsdlocked0 Light List Accent 3;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority62 \lsdlocked0 Light Grid Accent 3;\lsdsemihidden0 \lsdunhideused0 \lsdpriority63 \lsdlocked0 Medium Shading 1 Accent 3;\lsdsemihidden0 \lsdunhideused0 \lsdpriority64 \lsdlocked0 Medium Shading 2 Accent 3;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority65 \lsdlocked0 Medium List 1 Accent 3;\lsdsemihidden0 \lsdunhideused0 \lsdpriority66 \lsdlocked0 Medium List 2 Accent 3;\lsdsemihidden0 \lsdunhideused0 \lsdpriority67 \lsdlocked0 Medium Grid 1 Accent 3;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority68 \lsdlocked0 Medium Grid 2 Accent 3;\lsdsemihidden0 \lsdunhideused0 \lsdpriority69 \lsdlocked0 Medium Grid 3 Accent 3;\lsdsemihidden0 \lsdunhideused0 \lsdpriority70 \lsdlocked0 Dark List Accent 3;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority71 \lsdlocked0 Colorful Shading Accent 3;\lsdsemihidden0 \lsdunhideused0 \lsdpriority72 \lsdlocked0 Colorful List Accent 3;\lsdsemihidden0 \lsdunhideused0 \lsdpriority73 \lsdlocked0 Colorful Grid Accent 3;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority60 \lsdlocked0 Light Shading Accent 4;\lsdsemihidden0 \lsdunhideused0 \lsdpriority61 \lsdlocked0 Light List Accent 4;\lsdsemihidden0 \lsdunhideused0 \lsdpriority62 \lsdlocked0 Light Grid Accent 4;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority63 \lsdlocked0 Medium Shading 1 Accent 4;\lsdsemihidden0 \lsdunhideused0 \lsdpriority64 \lsdlocked0 Medium Shading 2 Accent 4;\lsdsemihidden0 \lsdunhideused0 \lsdpriority65 \lsdlocked0 Medium List 1 Accent 4;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority66 \lsdlocked0 Medium List 2 Accent 4;\lsdsemihidden0 \lsdunhideused0 \lsdpriority67 \lsdlocked0 Medium Grid 1 Accent 4;\lsdsemihidden0 \lsdunhideused0 \lsdpriority68 \lsdlocked0 Medium Grid 2 Accent 4;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority69 \lsdlocked0 Medium Grid 3 Accent 4;\lsdsemihidden0 \lsdunhideused0 \lsdpriority70 \lsdlocked0 Dark List Accent 4;\lsdsemihidden0 \lsdunhideused0 \lsdpriority71 \lsdlocked0 Colorful Shading Accent 4;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority72 \lsdlocked0 Colorful List Accent 4;\lsdsemihidden0 \lsdunhideused0 \lsdpriority73 \lsdlocked0 Colorful Grid Accent 4;\lsdsemihidden0 \lsdunhideused0 \lsdpriority60 \lsdlocked0 Light Shading Accent 5;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority61 \lsdlocked0 Light List Accent 5;\lsdsemihidden0 \lsdunhideused0 \lsdpriority62 \lsdlocked0 Light Grid Accent 5;\lsdsemihidden0 \lsdunhideused0 \lsdpriority63 \lsdlocked0 Medium Shading 1 Accent 5;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority64 \lsdlocked0 Medium Shading 2 Accent 5;\lsdsemihidden0 \lsdunhideused0 \lsdpriority65 \lsdlocked0 Medium List 1 Accent 5;\lsdsemihidden0 \lsdunhideused0 \lsdpriority66 \lsdlocked0 Medium List 2 Accent 5;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority67 \lsdlocked0 Medium Grid 1 Accent 5;\lsdsemihidden0 \lsdunhideused0 \lsdpriority68 \lsdlocked0 Medium Grid 2 Accent 5;\lsdsemihidden0 \lsdunhideused0 \lsdpriority69 \lsdlocked0 Medium Grid 3 Accent 5;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority70 \lsdlocked0 Dark List Accent 5;\lsdsemihidden0 \lsdunhideused0 \lsdpriority71 \lsdlocked0 Colorful Shading Accent 5;\lsdsemihidden0 \lsdunhideused0 \lsdpriority72 \lsdlocked0 Colorful List Accent 5;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority73 \lsdlocked0 Colorful Grid Accent 5;\lsdsemihidden0 \lsdunhideused0 \lsdpriority60 \lsdlocked0 Light Shading Accent 6;\lsdsemihidden0 \lsdunhideused0 \lsdpriority61 \lsdlocked0 Light List Accent 6;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority62 \lsdlocked0 Light Grid Accent 6;\lsdsemihidden0 \lsdunhideused0 \lsdpriority63 \lsdlocked0 Medium Shading 1 Accent 6;\lsdsemihidden0 \lsdunhideused0 \lsdpriority64 \lsdlocked0 Medium Shading 2 Accent 6;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority65 \lsdlocked0 Medium List 1 Accent 6;\lsdsemihidden0 \lsdunhideused0 \lsdpriority66 \lsdlocked0 Medium List 2 Accent 6;\lsdsemihidden0 \lsdunhideused0 \lsdpriority67 \lsdlocked0 Medium Grid 1 Accent 6;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority68 \lsdlocked0 Medium Grid 2 Accent 6;\lsdsemihidden0 \lsdunhideused0 \lsdpriority69 \lsdlocked0 Medium Grid 3 Accent 6;\lsdsemihidden0 \lsdunhideused0 \lsdpriority70 \lsdlocked0 Dark List Accent 6;
+\lsdsemihidden0 \lsdunhideused0 \lsdpriority71 \lsdlocked0 Colorful Shading Accent 6;\lsdsemihidden0 \lsdunhideused0 \lsdpriority72 \lsdlocked0 Colorful List Accent 6;\lsdsemihidden0 \lsdunhideused0 \lsdpriority73 \lsdlocked0 Colorful Grid Accent 6;
+\lsdsemihidden0 \lsdunhideused0 \lsdqformat1 \lsdpriority19 \lsdlocked0 Subtle Emphasis;\lsdsemihidden0 \lsdunhideused0 \lsdqformat1 \lsdpriority21 \lsdlocked0 Intense Emphasis;
+\lsdsemihidden0 \lsdunhideused0 \lsdqformat1 \lsdpriority31 \lsdlocked0 Subtle Reference;\lsdsemihidden0 \lsdunhideused0 \lsdqformat1 \lsdpriority32 \lsdlocked0 Intense Reference;
+\lsdsemihidden0 \lsdunhideused0 \lsdqformat1 \lsdpriority33 \lsdlocked0 Book Title;\lsdpriority37 \lsdlocked0 Bibliography;\lsdqformat1 \lsdpriority39 \lsdlocked0 TOC Heading;}}{\*\datastore 010500000200000018000000
+4d73786d6c322e534158584d4c5265616465722e352e3000000000000000000000060000
+d0cf11e0a1b11ae1000000000000000000000000000000003e000300feff090006000000000000000000000001000000010000000000000000100000feffffff00000000feffffff0000000000000000ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
+ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
+ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
+ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
+fffffffffffffffffdfffffffeffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
+ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
+ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
+ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
+ffffffffffffffffffffffffffffffff52006f006f007400200045006e00740072007900000000000000000000000000000000000000000000000000000000000000000000000000000000000000000016000500ffffffffffffffffffffffffec69d9888b8b3d4c859eaf6cd158be0f000000000000000000000000f09e
+745c086fcc01feffffff00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000ffffffffffffffffffffffff00000000000000000000000000000000000000000000000000000000
+00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000ffffffffffffffffffffffff0000000000000000000000000000000000000000000000000000
+000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000ffffffffffffffffffffffff000000000000000000000000000000000000000000000000
+0000000000000000000000000000000000000000000000000105000000000000}}
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testPBM.pbm
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testPBM.pbm b/tika-parsers/src/test/resources/test-documents/testPBM.pbm
new file mode 100644
index 0000000..8eb8b82
--- /dev/null
+++ b/tika-parsers/src/test/resources/test-documents/testPBM.pbm
@@ -0,0 +1,3 @@
+P1
+1 1
+0
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testPGM.pgm
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testPGM.pgm b/tika-parsers/src/test/resources/test-documents/testPGM.pgm
new file mode 100644
index 0000000..1077ec0
--- /dev/null
+++ b/tika-parsers/src/test/resources/test-documents/testPGM.pgm
@@ -0,0 +1,4 @@
+P2
+1 1
+255
+0
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testPICT.pct
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testPICT.pct b/tika-parsers/src/test/resources/test-documents/testPICT.pct
new file mode 100644
index 0000000..f00dbd1
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testPICT.pct differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testPPT_2imgs.ppt
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testPPT_2imgs.ppt b/tika-parsers/src/test/resources/test-documents/testPPT_2imgs.ppt
new file mode 100644
index 0000000..ce68bcf
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testPPT_2imgs.ppt differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testPageNumber.pdf
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testPageNumber.pdf b/tika-parsers/src/test/resources/test-documents/testPageNumber.pdf
new file mode 100644
index 0000000..0ec2693
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testPageNumber.pdf differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testPhoneNumberExtractor.odt
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testPhoneNumberExtractor.odt b/tika-parsers/src/test/resources/test-documents/testPhoneNumberExtractor.odt
new file mode 100644
index 0000000..d32e834
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testPhoneNumberExtractor.odt differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testPopupAnnotation.pdf
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testPopupAnnotation.pdf b/tika-parsers/src/test/resources/test-documents/testPopupAnnotation.pdf
new file mode 100644
index 0000000..c82107d
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testPopupAnnotation.pdf differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testQUATTRO.qpw
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testQUATTRO.qpw b/tika-parsers/src/test/resources/test-documents/testQUATTRO.qpw
new file mode 100644
index 0000000..ec34f47
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testQUATTRO.qpw differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testQUATTRO.wb3
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testQUATTRO.wb3 b/tika-parsers/src/test/resources/test-documents/testQUATTRO.wb3
new file mode 100644
index 0000000..8fc7022
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testQUATTRO.wb3 differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testRDF.rdf
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testRDF.rdf b/tika-parsers/src/test/resources/test-documents/testRDF.rdf
new file mode 100644
index 0000000..1bed3b0
--- /dev/null
+++ b/tika-parsers/src/test/resources/test-documents/testRDF.rdf
@@ -0,0 +1,23 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+         xmlns:dc="http://purl.org/dc/elements/1.1/">
+  <rdf:Description
+      rdf:about="http://lucene.apache.org/org.apache.tika/"
+      dc:title="Apache Tika"/>
+</rdf:RDF>

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testRFC822-CC-BCC
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testRFC822-CC-BCC b/tika-parsers/src/test/resources/test-documents/testRFC822-CC-BCC
new file mode 100644
index 0000000..6fe7c2e
--- /dev/null
+++ b/tika-parsers/src/test/resources/test-documents/testRFC822-CC-BCC
@@ -0,0 +1,44 @@
+Message-ID: <48...@thyme>
+Date: Tue, 10 Apr 2001 11:52:00 -0700 (PDT)
+From: beth.apollo@enron.com
+To: shona.wilson@enron.com, jeffrey.gossett@enron.com, stacey.white@enron.com,
+	d.hall@enron.com, sheri.thomas@enron.com, brenda.herod@enron.com,
+	john.j.boudreaux@us.arthurandersen.com,
+	john.vickers@us.arthurandersen.com, kate.agnew@us.arthurandersen.com,
+	jennifer.stevenson@us.arthurandersen.com
+Subject: Confidential Folder to safely pass information to  Arthur Andersen
+Cc: sally.beck@enron.com, tom.bauer@us.arthurandersen.com,
+	georgeanne.hodges@enron.com, vanessa.schulte@enron.com,
+	bob.hall@enron.com, leslie.reeves@enron.com, brent.price@enron.com
+Mime-Version: 1.0
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Bcc: sally.beck@enron.com, tom.bauer@us.arthurandersen.com,
+	georgeanne.hodges@enron.com, vanessa.schulte@enron.com,
+	bob.hall@enron.com, leslie.reeves@enron.com, brent.price@enron.com
+X-From: Beth Apollo <Beth Apollo/ENRON@enronXgate@ENRON>
+X-To: Shona Wilson <Shona Wilson/NA/Enron@Enron>, Jeffrey C Gossett <Jeffrey C Gossett/HOU/ECT@ECT>, Stacey W White <Stacey W White/HOU/ECT@ECT>, D Todd Hall <D Todd Hall/ENRON@enronXgate>, Sheri Thomas <Sheri Thomas/HOU/ECT@ECT>, Brenda F Herod <Brenda F Herod/ENRON@enronXgate>, john.j.boudreaux@us.arthurandersen.com@SMTP <jo...@enronXgate>, john.vickers@us.arthurandersen.com@SMTP <jo...@enronXgate>, kate.agnew@us.arthurandersen.com@SMTP <ka...@enronXgate>, jennifer.stevenson@us.arthurandersen.com@SMTP <je...@enronXgate>
+X-cc: Sally Beck <Sally Beck/HOU/ECT@ECT>, tom.bauer@us.arthurandersen.com@SMTP <to...@enronXgate>, Georgeanne Hodges <Georgeanne Hodges/ENRON@enronXgate>, Vanessa Schulte <Vanessa Schulte/ENRON@enronXgate>, Bob M Hall <Bob M Hall/NA/Enron@Enron>, Leslie Reeves <Leslie Reeves/HOU/ECT@ECT>, Brent A Price <Brent A Price/ENRON@enronXgate>
+X-bcc:
+X-Folder: \Beck, Sally\Beck, Sally\Apollo, Beth
+X-Origin: BECK-S
+X-FileName: Beck, Sally.pst
+
+
+We have become increasingly concerned about confidential information (dpr/position info, curves, validations/stress tests, etc) being passed to Arthur Andersen for audit purposes over the Web to their Arthur Andersen email addresses. (necessary now they no longer have access to Enron's internal email system)
+
+Please use the folder described below when passing any info (that you would have concerns about if it was picked up by a third party) via the shared drive that has been set up for this specific purpose.
+
+Note:  AA should also use the shared drive to pass info back if there are questions, or the data needs updating.  We should also consider the sensitivity of audit findings and special presentations if they are being distributed electronically.
+
+
+Please pass this note to others in your groups who have the need to pass info back and forth.
+
+
+Details on how to access for those who will use this method to pass info:
+
+A secured folder has been set up on the "o" drive under Corporate called Arthur_Andersen (O:\Corporate\Arthur_Anderson).  Please post all confidential files in this folder rather than emailing the files to their company email address.  If you need access to this folder, submit an eRequest through the IT Central site: http://itcentral.enron.com/Data/Services/SecurityRequests/.  Arthur Andersen will be able to retrieve these files for review with their terminal server access at the Three Allen Center location.
+
+Please contact Vanessa Schulte if you have any problems or questions
+
+Beth Apollo
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testRFC822-big
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testRFC822-big b/tika-parsers/src/test/resources/test-documents/testRFC822-big
new file mode 100644
index 0000000..6875959
--- /dev/null
+++ b/tika-parsers/src/test/resources/test-documents/testRFC822-big
@@ -0,0 +1,199 @@
+Date: Thu, 7 Jun 2001 02:15:00 -0700 (PDT)
+Message-ID: <00...@PMZL01>
+MIME-Version: 1.0
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+From:  Janette Elbertson
+To:  Alan Aronowitz, Sandi M Braband, Robert Bruce, Teresa G Bushman, Michelle Cash,
+	 Dominic Carolan, Barton Clark, Harry M Collins, Mary Cook, Nancy Corbet, Ned
+	 E Crady, Eddy Daniels, Angela Davis, Peter del Vecchio, Stacy E Dickson, Andrew
+	 Edison, Roseann Engeldorf, Shawna Flynn, Robert H George, Barbara N Gray, Mark
+	 Greenberg, Wayne Gresham, Leslie Hansen, Jeffrey T Hodge, Brent Hendry, Dan
+	 J Hyvl, Anne C Koehler, Cheryl Lindeman, Dan Lyons, Kay Mann, Travis McCullough,
+	 Lisa Mellencamp, Janet H Moore, Harlan Murphy, Julia Murray, Cheryl Nelson,
+	 Gerald Nemec, Marcus Nettelton, Francisco Pinto Leite, David Portz, Coralina
+	 Rivera, Michael A Robison, Daniel R Rogers, Elizabeth Sager, Richard B Sanders,
+	 Frank Sayre, Lance Schuler-Legal, Sara Shackleton, Carlos Sole, Carol St Clair,
+	 Lou Stoler, Mark Taylor, Sheila Tweed, Steve Van Hooser, John Viverito, Ann
+	 Elizabeth White, Randy Young, Susan Bailey, Kimberlee A Bennick, Martha Braddy,
+	 Sarah Bruck, Genia FitzGerald, Nony Flores, Diane Goode, Linda R Guinn, Marie
+	 Heard, Ed B Hearn III, Mary J Heinitz, Tana Jones, Kathleen Carnahan, Deb Korkmas,
+	 Laurie Mayer, Matt Maxwell, Mary Ogden, Stephanie Panus, Debra Perlingiere,
+	 Robert Walker, Kay Young, Merrill W Haas, Samantha Ferguson, Majed Nachawati,
+	 Suzanne Adams, Connie Castillo, Margaret Doucette, Keegan Farrell, Nita Garcia,
+	 Carolyn George, Holly Keiser, MaryHelen Martinez, Taffy Milligan, Linda J Simmons,
+	 Becky Spencer, Twanda Sweet, Alice Wright, Theresa Zucha, Reginald Shanks,
+	 Elizabeth Lauterbach, Claudia Meraz
+Cc:  Gary Bode, Vanessa Griffin, Esmeralda Gonzalez, Martha Keesler, Rae Meadows,
+	 Stephanie Truss
+Subject:  Outlook Migration - EWS Legal
+X-Filename:  sbailey2.nsf
+X-Folder:  \All documents
+X-SDOC:  421977
+X-ZLID:  zl-edrm-enron-v2-bailey-s-1216.eml
+
+Our department will be migrated to Outlook in two groups.  The first group
+will be migrated on Monday, June 11,  and the second group will be migrated
+on Tuesday, June 12.   You will receive four e-mails from the Outlook
+migration team.  Please do not delete them.  You will need to open the four
+e-mails and follow the instructions to migrate to Outlook.
+
+Assistants, you will be responsible for scheduling training for yourself and
+your assignments.  It is recommended everyone attend a one hour training
+class.  Training can be scheduled by contacting Maggie Cruz at extension
+3-1816.  (Assistants, please coordinate training with your backup so both of
+you are not in training at the same time.)  Outlook migration specialists
+will be on the 38th floor to answer questions Tuesday and Wednesday, June 12
+and 13.
+
+Listed below is useful information provided to us by the Outlook Migration
+team.
+
+E-mail Policies
+
+Users will be restricted to a Mailbox size of 100 MB.
+
+Further mailbox size restrictions are detailed as follows:
+
+Issue Warning at 75 MB - users are automatically sent a warning from the
+System Administrator explaining they are near their Mailbox limit.
+
+Prohibit Send at 100 MB - users are prevented from sending e-mail, yet they
+can still receive internal and external messages.  Users must reduce the size
+of their mailbox by deleting old mail, saving attachments to a local drive,
+etc. before they can send e-mail again.
+
+Inbound/Outbound Mail Size Limits - inbound and outbound e-mail messages will
+be limited to a size of 10MB.
+
+Deleted Item Retention - users will be able to recover deleted items from
+their mailbox as old as 8 days.  Deleted items include e-mail messages,
+folders, contacts, calendar entries, tasks, notes, journal entries and
+meeting notices.
+
+Archiving - archiving will not be a supported feature of Outlook 2000.
+
+Migration Preparation
+
+Clean Your Mailbox - due to new space limitations on your mailbox, you are
+advised to clean your Notes mailbox of old, unneeded messages BEFORE
+migration.  If you are at the 100MB limit on the day of migration, you will
+not be able to send messages once you are in Outlook.
+
+Limits on Items Migrated  - from the day of your migration, only 30 days of
+old mail will be migrated from your mailbox.  This includes mail in your
+inbox and other folders.  Calendar items dating back one year from the day of
+migration will be migrated (with the exception of repeating appointments).
+
+The following people will be migrated Monday evening, June 11.
+
+Adams, Suzanne
+Bushman, Teresa
+Cash, Michelle
+Clark, Bart
+Corbet, Nancy
+Daniels, Eddy
+Davis, Angela
+Dickson, Stacy
+Edison, Andy
+Elbertson, Janette
+FitzGerald, Genia
+Flores, Nony
+George, Robert H.
+Goode, Diane
+Guinn, Linda
+Haedicke, Mark
+Hansen, Leslie
+Hearn, Ed
+Heinitz, Mary
+Hodge, Jeff
+Legal Temp 1
+Legal Temp 2
+Legal Temp 3
+Legal Temp 4
+Mann, Kay
+Maxwell, Matt
+McCullough, Travis
+Meraz, Claudia
+Mellencamp, Lisa
+Milligan, Taffy
+Moore, Janet H.
+Nemec, Gerald
+Nettelton, Marcus
+Ogden, Mary
+Perlingiere, Debra
+Portz, David
+Sager, Elizabeth
+Sanders, Richard
+Simmons, Linda
+Sol,, Carlos
+St. Clair, Carol
+Sweet, Twanda
+Tweed, Sheila
+Van Hooser, Steve
+White, Ann Elizabeth
+Zucha, Theresa
+
+The following people will be migrated Tuesday evening, June 12.
+
+Aronowitz, Alan
+Bailey, Susan
+Boyd, Samantha
+Braddy, Martha
+Bruce, Robert
+Bruck, Sarah
+Carolan, Dominic
+Castillo, Connie
+Collins, Harry
+Cook, Mary
+Crady, Ned
+del Vecchio, Peter
+Doucette, Margaret
+Farrell, Keegan
+Ferguson, Samantha
+Garcia, Nita
+George, Carolyn
+Gray, Barbara
+Greenberg, Mark
+Gresham, Wayne
+Haas, Merrill
+Heard, Marie
+Hendry, Brent
+Jones, Tana
+Keiser, Holly
+Koehler, Anne
+Korkmas, Deb
+Lauterbach, Elizabeth
+Legal Temp 5
+Legal Temp 6
+Legal Temp 7
+Lindeman, Cheryl
+Lovelady, Steven
+Lyons, Dan
+Martinez, Mary Helen
+Mayer, Laurie
+Murray, Julia Heintz
+Nachawati, Majed
+Nelson,  Cheryl
+Panus, Stephanie
+Pinto Leite, Francisco
+Rivera, Coralina
+Robison, Michael
+Rogers, Daniel
+Sayre, Frank
+Shackleton, Sara
+Shanks, Reginald
+Spencer, Becky
+Stoler, Lou
+Taylor, Mark
+Viverito, John
+Young, Randy
+
+
+Many thanks for your help in making this a smooth migration to Outlook.
+
+Nony Flores and Janette Elbertson
+
+***********
+EDRM Enron Email Data Set has been produced in EML, PST and NSF format by ZL Technologies, Inc. This Data Set is licensed under a Creative Commons Attribution 3.0 United States License <http://creativecommons.org/licenses/by/3.0/us/> . To provide attribution, please cite to "ZL Technologies, Inc. (http://www.zlti.com)."
+***********
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testSQLITE3.db
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testSQLITE3.db b/tika-parsers/src/test/resources/test-documents/testSQLITE3.db
new file mode 100644
index 0000000..7c1e0c3
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testSQLITE3.db differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testSVG.svg
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testSVG.svg b/tika-parsers/src/test/resources/test-documents/testSVG.svg
new file mode 100644
index 0000000..f78a87d
--- /dev/null
+++ b/tika-parsers/src/test/resources/test-documents/testSVG.svg
@@ -0,0 +1,7 @@
+<?xml version="1.0"?>
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" 
+          "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
+<svg width="1cm" height="1cm" version="1.1" xmlns="http://www.w3.org/2000/svg">
+  <desc>Test SVG image</desc>
+  <rect x="0.1cm" y="0.1cm" width="0.8cm" height="0.8cm"/>
+</svg>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testSolaris-x86-32
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testSolaris-x86-32 b/tika-parsers/src/test/resources/test-documents/testSolaris-x86-32
new file mode 100644
index 0000000..8644f92
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testSolaris-x86-32 differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testStarOffice-5.2-calc.sdc
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testStarOffice-5.2-calc.sdc b/tika-parsers/src/test/resources/test-documents/testStarOffice-5.2-calc.sdc
new file mode 100644
index 0000000..6390f50
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testStarOffice-5.2-calc.sdc differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testStarOffice-5.2-draw.sda
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testStarOffice-5.2-draw.sda b/tika-parsers/src/test/resources/test-documents/testStarOffice-5.2-draw.sda
new file mode 100644
index 0000000..dc69b4a
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testStarOffice-5.2-draw.sda differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testStarOffice-5.2-impress.sdd
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testStarOffice-5.2-impress.sdd b/tika-parsers/src/test/resources/test-documents/testStarOffice-5.2-impress.sdd
new file mode 100644
index 0000000..eed8584
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testStarOffice-5.2-impress.sdd differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testStarOffice-5.2-writer.sdw
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testStarOffice-5.2-writer.sdw b/tika-parsers/src/test/resources/test-documents/testStarOffice-5.2-writer.sdw
new file mode 100644
index 0000000..49b0c70
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testStarOffice-5.2-writer.sdw differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testTXT-tika.axx
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testTXT-tika.axx b/tika-parsers/src/test/resources/test-documents/testTXT-tika.axx
new file mode 100644
index 0000000..e65c933
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testTXT-tika.axx differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testTXT.txt
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testTXT.txt b/tika-parsers/src/test/resources/test-documents/testTXT.txt
new file mode 100644
index 0000000..0b5605a
--- /dev/null
+++ b/tika-parsers/src/test/resources/test-documents/testTXT.txt
@@ -0,0 +1,2 @@
+Test d'indexation de Txt
+http://www.apache.org

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testTXT.zlib0
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testTXT.zlib0 b/tika-parsers/src/test/resources/test-documents/testTXT.zlib0
new file mode 100644
index 0000000..9ae5da1
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testTXT.zlib0 differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testTXT.zlib5
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testTXT.zlib5 b/tika-parsers/src/test/resources/test-documents/testTXT.zlib5
new file mode 100644
index 0000000..23538e3
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testTXT.zlib5 differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testTXT.zlib9
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testTXT.zlib9 b/tika-parsers/src/test/resources/test-documents/testTXT.zlib9
new file mode 100644
index 0000000..3f56c14
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testTXT.zlib9 differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testTXTNonASCIIUTF8.txt
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testTXTNonASCIIUTF8.txt b/tika-parsers/src/test/resources/test-documents/testTXTNonASCIIUTF8.txt
new file mode 100644
index 0000000..f6aeb6e
--- /dev/null
+++ b/tika-parsers/src/test/resources/test-documents/testTXTNonASCIIUTF8.txt
@@ -0,0 +1,7 @@
+The quick brown fox jumps over the lazy dog
+
+Le renard brun rapide saute par-dessus le chien paresseux
+
+Der schnelle braune Fuchs springt über den faulen Hund
+
+براون وكس السريع يقفز فوق الكلب كسالي
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testThunderbirdEml.eml
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testThunderbirdEml.eml b/tika-parsers/src/test/resources/test-documents/testThunderbirdEml.eml
new file mode 100644
index 0000000..a35e7ec
--- /dev/null
+++ b/tika-parsers/src/test/resources/test-documents/testThunderbirdEml.eml
@@ -0,0 +1,32 @@
+x-store-info:J++/JTCzmObr++wNraA4 .....
+Authentication-Results: something.com; sender-id= ......
+X-SID-PRA: vladimir_l@example.com
+X-SID-Result: Pass
+X-DKIM-Result: None
+X-AUTH-Result: PASS
+X-Message-Status: n:n
+X-Message-Delivery: Vj0xLjE7dXM .....
+X-Message-Info: aKlYzGSc+Ll01bU5 ....
+Received: from mailout- ....
+Received: (qmail invoked by alias); 21 Nov 2012 20:11:35 -0000
+Received: from mp017. ....
+X-Authenticated: #2407 ....
+X-Provags-ID: V01U2FsdGVkX ....
+Received: (qmail 22194 invoked by uid 0); 21 Nov 2012 20:11:34 -0000
+Received: from ....
+Content-Type: text/plain; charset="utf-8"
+Date: Wed, 21 Nov 2012 21:11:32 +0100
+From: "Vladimir L." <vl...@example.com>
+Message-ID: <20...@example.com>
+MIME-Version: 1.0
+Subject: JUnit test message
+To: vladimir_l@something.com
+X-Flags: 0001
+X-Mailer: WWW-Mail 6100 (Global Message Exchange)
+X-Priority: 3
+Content-Transfer-Encoding: 8bit
+Return-Path: vladimir_l@example.com
+X-OriginalArrivalTime: 21 Nov 2012 20:11:36.0285 ....
+
+Dear Vladimir .....
+

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testTinyPE.exe
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testTinyPE.exe b/tika-parsers/src/test/resources/test-documents/testTinyPE.exe
new file mode 100644
index 0000000..a45435f
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testTinyPE.exe differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testVISIO.vsdm
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testVISIO.vsdm b/tika-parsers/src/test/resources/test-documents/testVISIO.vsdm
new file mode 100644
index 0000000..0ac9a8f
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testVISIO.vsdm differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testVISIO.vsdx
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testVISIO.vsdx b/tika-parsers/src/test/resources/test-documents/testVISIO.vsdx
new file mode 100644
index 0000000..1fa6903
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testVISIO.vsdx differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testVISIO.vssm
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testVISIO.vssm b/tika-parsers/src/test/resources/test-documents/testVISIO.vssm
new file mode 100644
index 0000000..8d1fe0f
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testVISIO.vssm differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testVISIO.vssx
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testVISIO.vssx b/tika-parsers/src/test/resources/test-documents/testVISIO.vssx
new file mode 100644
index 0000000..0163463
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testVISIO.vssx differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testVISIO.vstm
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testVISIO.vstm b/tika-parsers/src/test/resources/test-documents/testVISIO.vstm
new file mode 100644
index 0000000..1d13a77
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testVISIO.vstm differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testVISIO.vstx
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testVISIO.vstx b/tika-parsers/src/test/resources/test-documents/testVISIO.vstx
new file mode 100644
index 0000000..38f2164
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testVISIO.vstx differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testVORBIS.ogg
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testVORBIS.ogg b/tika-parsers/src/test/resources/test-documents/testVORBIS.ogg
new file mode 100644
index 0000000..1a02d22
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testVORBIS.ogg differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testVORCalcTemplate.vor
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testVORCalcTemplate.vor b/tika-parsers/src/test/resources/test-documents/testVORCalcTemplate.vor
new file mode 100644
index 0000000..6390f50
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testVORCalcTemplate.vor differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testVORDrawTemplate.vor
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testVORDrawTemplate.vor b/tika-parsers/src/test/resources/test-documents/testVORDrawTemplate.vor
new file mode 100644
index 0000000..50eb255
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testVORDrawTemplate.vor differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testVORImpressTemplate.vor
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testVORImpressTemplate.vor b/tika-parsers/src/test/resources/test-documents/testVORImpressTemplate.vor
new file mode 100644
index 0000000..5d137cb
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testVORImpressTemplate.vor differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testVORWriterTemplate.vor
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testVORWriterTemplate.vor b/tika-parsers/src/test/resources/test-documents/testVORWriterTemplate.vor
new file mode 100644
index 0000000..27828eb
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testVORWriterTemplate.vor differ

http://git-wip-us.apache.org/repos/asf/tika/blob/38916f89/tika-parsers/src/test/resources/test-documents/testWAR.war
----------------------------------------------------------------------
diff --git a/tika-parsers/src/test/resources/test-documents/testWAR.war b/tika-parsers/src/test/resources/test-documents/testWAR.war
new file mode 100644
index 0000000..3cdcf5b
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testWAR.war differ