Posted to commits@tika.apache.org by ni...@apache.org on 2011/03/30 14:05:04 UTC

svn commit: r1086919 [2/2] - in /tika/site/publish: 0.8/formats.html 0.8/parser.html 0.8/parser_guide.html 0.9/formats.html 0.9/parser.html 0.9/parser_guide.html

Modified: tika/site/publish/0.9/parser_guide.html
URL: http://svn.apache.org/viewvc/tika/site/publish/0.9/parser_guide.html?rev=1086919&r1=1086918&r2=1086919&view=diff
==============================================================================
--- tika/site/publish/0.9/parser_guide.html (original)
+++ tika/site/publish/0.9/parser_guide.html Wed Mar 30 12:05:03 2011
@@ -84,7 +84,7 @@
                 width="387" height="100"/></a>
       </div>
       <div id="content">
-        <!-- Licensed to the Apache Software Foundation (ASF) under one or more --><!-- contributor license agreements.  See the NOTICE file distributed with --><!-- this work for additional information regarding copyright ownership. --><!-- The ASF licenses this file to You under the Apache License, Version 2.0 --><!-- (the "License"); you may not use this file except in compliance with --><!-- the License.  You may obtain a copy of the License at --><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- Unless required by applicable law or agreed to in writing, software --><!-- distributed under the License is distributed on an "AS IS" BASIS, --><!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --><!-- See the License for the specific language governing permissions and --><!-- limitations under the License. --><div class="section"><h2>Get Tika parsing up and running in 5 minutes<a name="Get_Tika_parsing_up_and_running_in_5_min
 utes"></a></h2><p>This page is a quick start guide showing how to add a new parser to Apache Tika. Following the simple steps listed below your new parser can be running in only 5 minutes.</p><ul><li><a href="#Get_Tika_parsing_up_and_running_in_5_minutes">Get Tika parsing up and running in 5 minutes</a><ul><li><a href="#Getting_Started">Getting Started</a></li><li><a href="#Add_your_MIME-Type">Add your MIME-Type</a></li><li><a href="#Create_your_Parser_class">Create your Parser class</a></li><li><a href="#List_the_new_parser">List the new parser</a></li></ul></li></ul><div class="section"><h3><a name="Getting_Started">Getting Started</a><a name="Getting_Started"></a></h3><p>The <a href="#gettingstarted.html">Getting Started</a> document describes how to build Apache Tika from sources and how to start using Tika in an application. Pay close attention and follow the instructions in the &quot;Getting and building the sources&quot; section.</p></div><div class="section"><h3><a n
 ame="Add_your_MIME-Type">Add your MIME-Type</a><a name="Add_your_MIME-Type"></a></h3><p>You first need to modify <a class="externalLink" href="http://svn.apache.org/repos/asf/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml">tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml</a> in order to Tika can map the file extension with its MIME-Type. You should add something like this:</p><div><pre> &lt;mime-type type=&quot;application/hello&quot;&gt;
+        <!-- Licensed to the Apache Software Foundation (ASF) under one or more --><!-- contributor license agreements.  See the NOTICE file distributed with --><!-- this work for additional information regarding copyright ownership. --><!-- The ASF licenses this file to You under the Apache License, Version 2.0 --><!-- (the "License"); you may not use this file except in compliance with --><!-- the License.  You may obtain a copy of the License at --><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- Unless required by applicable law or agreed to in writing, software --><!-- distributed under the License is distributed on an "AS IS" BASIS, --><!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --><!-- See the License for the specific language governing permissions and --><!-- limitations under the License. --><div class="section"><h2>Get Tika parsing up and running in 5 minutes<a name="Get_Tika_parsing_up_and_running_in_5_min
 utes"></a></h2><p>This page is a quick start guide showing how to add a new parser to Apache Tika. Following the simple steps listed below your new parser can be running in only 5 minutes.</p><ul><li><a href="#Get_Tika_parsing_up_and_running_in_5_minutes">Get Tika parsing up and running in 5 minutes</a><ul><li><a href="#Getting_Started">Getting Started</a></li><li><a href="#Add_your_MIME-Type">Add your MIME-Type</a></li><li><a href="#Create_your_Parser_class">Create your Parser class</a></li><li><a href="#List_the_new_parser">List the new parser</a></li></ul></li></ul><div class="section"><h3><a name="Getting_Started">Getting Started</a><a name="Getting_Started"></a></h3><p>The <a href="./gettingstarted.html">Getting Started</a> document describes how to build Apache Tika from sources and how to start using Tika in an application. Pay close attention and follow the instructions in the &quot;Getting and building the sources&quot; section.</p></div><div class="section"><h3><a 
 name="Add_your_MIME-Type">Add your MIME-Type</a><a name="Add_your_MIME-Type"></a></h3><p>You first need to modify <a class="externalLink" href="http://svn.apache.org/repos/asf/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml">tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml</a> in order to Tika can map the file extension with its MIME-Type. You should add something like this:</p><div><pre> &lt;mime-type type=&quot;application/hello&quot;&gt;
         &lt;glob pattern=&quot;*.hi&quot;/&gt;
  &lt;/mime-type&gt;</pre></div></div><div class="section"><h3><a name="Create_your_Parser_class">Create your Parser class</a><a name="Create_your_Parser_class"></a></h3><p>Now, you need to create your new parser. This is a class that must implement the Parser interface offered by Tika. A very simple Tika Parser looks like this:</p><div><pre>/*
  * Licensed to the Apache Software Foundation (ASF) under one or more