Posted to dev@lucene.apache.org by ot...@apache.org on 2002/09/17 06:21:06 UTC

cvs commit: jakarta-lucene/xdocs/lucene-sandbox/indyo tutorial.xml

otis        2002/09/16 21:21:06

  Modified:    docs/lucene-sandbox index.html
               xdocs/lucene-sandbox index.xml
  Added:       docs/lucene-sandbox/indyo tutorial.html
               xdocs/lucene-sandbox/indyo tutorial.xml
  Log:
  - Indyo docs.
  Submitted by:	Kelvin Tan
  
  Revision  Changes    Path
  1.2       +18 -0     jakarta-lucene/docs/lucene-sandbox/index.html
  
  Index: index.html
  ===================================================================
  RCS file: /home/cvs/jakarta-lucene/docs/lucene-sandbox/index.html,v
  retrieving revision 1.1
  retrieving revision 1.2
  diff -u -r1.1 -r1.2
  --- index.html	14 Jul 2002 19:05:12 -0000	1.1
  +++ index.html	17 Sep 2002 04:21:06 -0000	1.2
  @@ -112,6 +112,24 @@
   You can access Lucene Sandbox CVS repository at
   <A HREF="http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/">http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/</A>.
   </P>
  +                                                    <table border="0" cellspacing="0" cellpadding="2" width="100%">
  +      <tr><td bgcolor="#828DA6">
  +        <font color="#ffffff" face="arial,helvetica,sanserif">
  +          <a name="Indyo"><strong>Indyo</strong></a>
  +        </font>
  +      </td></tr>
  +      <tr><td>
  +        <blockquote>
  +                                    <p>
  +Indyo is a datasource-independent Lucene indexing framework.
  +</p>
  +                                                <p>
  +A tutorial for using Indyo can be found <a href="indyo/tutorial.html">here</a>.
  +</p>
  +                            </blockquote>
  +      </td></tr>
  +      <tr><td><br/></td></tr>
  +    </table>
                               </blockquote>
           </p>
         </td></tr>
  
  
  
  1.1                  jakarta-lucene/docs/lucene-sandbox/indyo/tutorial.html
  
  Index: tutorial.html
  ===================================================================
  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
  
  <!-- Content Stylesheet for Site -->
  
          
  <!-- start the processing -->
      <!-- ====================================================================== -->
      <!-- Main Page Section -->
      <!-- ====================================================================== -->
      <html>
          <head>
              <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
  
                                                      <meta name="author" value="Kelvin Tan">
              <meta name="email" value="kelvint@apache.org">
              
             
                                      
              <title>Jakarta Lucene - Indyo Tutorial</title>
          </head>
  
          <body bgcolor="#ffffff" text="#000000" link="#525D76">        
              <table border="0" width="100%" cellspacing="0">
                  <!-- TOP IMAGE -->
                  <tr>
                      <td align="left">
  <a href="http://jakarta.apache.org"><img src="http://jakarta.apache.org/images/jakarta-logo.gif" border="0"/></a>
  </td>
  <td align="right">
  <a href="http://jakarta.apache.org/lucene/"><img src="../../images/lucene_green_300.gif" alt="Jakarta Lucene" border="0"/></a>
  </td>
                  </tr>
              </table>
              <table border="0" width="100%" cellspacing="4">
                  <tr><td colspan="2">
                      <hr noshade="" size="1"/>
                  </td></tr>
                  
                  <tr>
                      <!-- LEFT SIDE NAVIGATION -->
                      <td width="20%" valign="top" nowrap="true">
                                  <p><strong>About</strong></p>
          <ul>
                      <li>    <a href="../../index.html">Overview</a>
  </li>
                      <li>    <a href="../../powered.html">Powered by Lucene</a>
  </li>
                      <li>    <a href="../../whoweare.html">Who We Are</a>
  </li>
                      <li>    <a href="http://jakarta.apache.org/site/mail.html">Mailing Lists</a>
  </li>
                  </ul>
              <p><strong>Resources</strong></p>
          <ul>
                      <li>    <a href="http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi">FAQ (Official)</a>
  </li>
                      <li>    <a href="http://www.jguru.com/faq/Lucene">JGuru FAQ</a>
  </li>
                      <li>    <a href="../../gettingstarted.html">Getting Started</a>
  </li>
                      <li>    <a href="http://jakarta.apache.org/site/bugs.html">Bugs</a>
  </li>
                      <li>    <a href="http://nagoya.apache.org/bugzilla/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&email1=&emailtype1=substring&emailassigned_to1=1&email2=&emailtype2=substring&emailreporter2=1&bugidtype=include&bug_id=&changedin=&votes=&chfieldfrom=&chfieldto=Now&chfieldvalue=&product=Lucene&short_desc=&short_desc_type=allwordssubstr&long_desc=&long_desc_type=allwordssubstr&bug_file_loc=&bug_file_loc_type=allwordssubstr&keywords=&keywords_type=anywords&field0-0-0=noop&type0-0-0=noop&value0-0-0=&cmdtype=doit&order=%27Importance%27">Lucene Bugs</a>
  </li>
                      <li>    <a href="../../queryparsersyntax.html">Query Syntax</a>
  </li>
                      <li>    <a href="../../api/index.html">Javadoc</a>
  </li>
                      <li>    <a href="../../contributions.html">Contributions</a>
  </li>
                      <li>    <a href="../../lucenesandbox.html">Lucene Sandbox</a>
  </li>
                      <li>    <a href="../../resources.html">Articles, etc.</a>
  </li>
                  </ul>
              <p><strong>Plans</strong></p>
          <ul>
                      <li>    <a href="../../luceneplan.html">Application Extensions</a>
  </li>
                  </ul>
              <p><strong>Download</strong></p>
          <ul>
                      <li>    <a href="http://jakarta.apache.org/site/binindex.html">Binaries</a>
  </li>
                      <li>    <a href="http://jakarta.apache.org/site/sourceindex.html">Source Code</a>
  </li>
                      <li>    <a href="http://jakarta.apache.org/site/cvsindex.html">CVS Repositories</a>
  </li>
                  </ul>
              <p><strong>Jakarta</strong></p>
          <ul>
                      <li>    <a href="http://jakarta.apache.org/site/getinvolved.html">Get Involved</a>
  </li>
                      <li>    <a href="http://jakarta.apache.org/site/acknowledgements.html">Acknowledgements</a>
  </li>
                      <li>    <a href="http://jakarta.apache.org/site/contact.html">Contact</a>
  </li>
                      <li>    <a href="http://jakarta.apache.org/site/legal.html">Legal</a>
  </li>
                  </ul>
                          </td>
                      <td width="80%" align="left" valign="top">
                                                                      <table border="0" cellspacing="0" cellpadding="2" width="100%">
        <tr><td bgcolor="#525D76">
          <font color="#ffffff" face="arial,helvetica,sanserif">
            <a name="About this Tutorial"><strong>About this Tutorial</strong></a>
          </font>
        </td></tr>
        <tr><td>
          <blockquote>
                                      <p>
    This tutorial is intended to give first-time users an
    introduction to using Indyo, a datasource-independent 
    Lucene indexing framework.
  </p>
                                                  <p>
    It covers how to obtain, build, and configure Indyo,
    and how to index a directory on a filesystem.
  </p>
                              </blockquote>
          </p>
        </td></tr>
        <tr><td><br/></td></tr>
      </table>
                                                  <table border="0" cellspacing="0" cellpadding="2" width="100%">
        <tr><td bgcolor="#525D76">
          <font color="#ffffff" face="arial,helvetica,sanserif">
            <a name="Step 1: Obtaining Indyo"><strong>Step 1: Obtaining Indyo</strong></a>
          </font>
        </td></tr>
        <tr><td>
          <blockquote>
                                      <p>
    First, you need to obtain Indyo.  As
    of this writing, Indyo is only available via CVS, from the 
    "jakarta-lucene-sandbox" repository. See 
    <a href="http://jakarta.apache.org/cvsindex.html">Jakarta CVS</a> 
    for instructions on accessing files via CVS.</p>
                              </blockquote>
          </p>
        </td></tr>
        <tr><td><br/></td></tr>
      </table>
                                                  <table border="0" cellspacing="0" cellpadding="2" width="100%">
        <tr><td bgcolor="#525D76">
          <font color="#ffffff" face="arial,helvetica,sanserif">
            <a name="Step 2: Building Indyo"><strong>Step 2: Building Indyo</strong></a>
          </font>
        </td></tr>
        <tr><td>
          <blockquote>
                                      <p>
    Get a copy of <a href="http://jakarta.apache.org/ant">Ant</a> if 
    you don't already have it installed. Then simply type "ant" in the 
    directory where the local copy of the Indyo sources resides.
  </p>
                                                  <p>
    Voila! You should now have a jar file "indyo-&lt;version number&gt;.jar".
  </p>
                              </blockquote>
          </p>
        </td></tr>
        <tr><td><br/></td></tr>
      </table>
                                                  <table border="0" cellspacing="0" cellpadding="2" width="100%">
        <tr><td bgcolor="#525D76">
          <font color="#ffffff" face="arial,helvetica,sanserif">
            <a name="Step 3: Configuring Indyo"><strong>Step 3: Configuring Indyo</strong></a>
          </font>
        </td></tr>
        <tr><td>
          <blockquote>
                                      <p>
    The "src/conf" folder contains a default configuration file which is 
    sufficient for normal use. 
  </p>
                              </blockquote>
          </p>
        </td></tr>
        <tr><td><br/></td></tr>
      </table>
                                                  <table border="0" cellspacing="0" cellpadding="2" width="100%">
        <tr><td bgcolor="#525D76">
          <font color="#ffffff" face="arial,helvetica,sanserif">
            <a name="Step 4: Using Indyo"><strong>Step 4: Using Indyo</strong></a>
          </font>
        </td></tr>
        <tr><td>
          <blockquote>
                                      <p>
    Congratulations, you have finally reached the fun
    part of this tutorial.  This is where you'll discover
    the power of Indyo.  
  </p>
                                                  <p>
    To index a datasource, first instantiate the respective 
    datasource, then hand it to IndyoIndexer for indexing. 
    For example:
  </p>
                                                      <div align="left">
      <table cellspacing="4" cellpadding="0" border="0">
      <tr>
        <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
        <td bgcolor="#023264" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
        <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
      </tr>
      <tr>
        <td bgcolor="#023264" width="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
        <td bgcolor="#ffffff"><pre>
  IndexDataSource ds = new FSDataSource(&quot;/usr/local/lucene/docs&quot;);
  IndyoIndexer indexer = new IndyoIndexer(&quot;/usr/local/index&quot;, 
                      &quot;/usr/local/indyo/default.config.xml&quot;);
  indexer.index(ds);                    
  </pre></td>
        <td bgcolor="#023264" width="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
      </tr>
      <tr>
        <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
        <td bgcolor="#023264" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
        <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
      </tr>
      </table>
      </div>
                                                  <p>
    FSDataSource is a simple datasource which indexes both files 
    and directories. The metadata FSDataSource adds to each document is: 
    filePath, fileName, fileSize, fileFormat, fileContents, 
    fileLastModifiedDate. Based on the file extension of the files indexed, 
    Indyo will use file content-handlers according to the mappings found in the 
    configuration file. If you're not happy with this list of file 
    metadata, feel free to subclass FSDataSource, or, as we're about 
    to cover next, write your own custom IndexDataSource.
  </p>
                                                  <p>
    Get familiar with FSDataSource. You'll find it very handy, both for indexing 
    files directly and for nesting within another datasource. For example, 
    you might need to index a database table in which one of the columns holds 
    the location of a file, and you may want to use FSDataSource to index this 
    file as well.
  </p>
                                                      <table border="0" cellspacing="0" cellpadding="2" width="100%">
        <tr><td bgcolor="#828DA6">
          <font color="#ffffff" face="arial,helvetica,sanserif">
            <a name="Writing your custom IndexDataSource"><strong>Writing your custom IndexDataSource</strong></a>
          </font>
        </td></tr>
        <tr><td>
          <blockquote>
                                      <p>
    To write a custom IndexDataSource, you need to write a class 
    which implements IndexDataSource, and provides an implementation 
    for the getData() method which returns a Map[]. The javadoc of the 
    getData() method reads:
  </p>
                                                      <div align="left">
      <table cellspacing="4" cellpadding="0" border="0">
      <tr>
        <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
        <td bgcolor="#023264" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
        <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
      </tr>
      <tr>
        <td bgcolor="#023264" width="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
        <td bgcolor="#ffffff"><pre>
  /**
   * Retrieve an array of Maps. Each map represents
   * a document to be indexed. The key:value pairs of the map
   * are the metadata of the document.
   */
  </pre></td>
        <td bgcolor="#023264" width="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
      </tr>
      <tr>
        <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
        <td bgcolor="#023264" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
        <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
      </tr>
      </table>
      </div>
                                                  <p>
    So, the getData() method provides a way for Indyo to retrieve document 
    metadata from each IndexDataSource. A simple example of a custom 
    IndexDataSource, HashMapDataSource, is provided below.
  </p>
                                                      <div align="left">
      <table cellspacing="4" cellpadding="0" border="0">
      <tr>
        <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
        <td bgcolor="#023264" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
        <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
      </tr>
      <tr>
        <td bgcolor="#023264" width="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
        <td bgcolor="#ffffff"><pre>
  public class HashMapDataSource implements IndexDataSource
  {
      private Map data;
  
      public HashMapDataSource(Map data)
      {
          this.data = data;
      }
  
      public Map[] getData() throws Exception
      {
          return new Map[]{data};
      }
  }
  </pre></td>
        <td bgcolor="#023264" width="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
      </tr>
      <tr>
        <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
        <td bgcolor="#023264" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
        <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
      </tr>
      </table>
      </div>
                                                  <p>
    As you can see, HashMapDataSource doesn't do anything very useful. It 
    always results in one Document being indexed, and the document's fields 
    depend on the contents of the map that HashMapDataSource was initialized 
    with.
  </p>
                                                  <p>
    A slightly more useful IndexDataSource, SingleDocumentFSDataSource,
    provides an example of how to nest datasources. Given a directory, 
    SingleDocumentFSDataSource recursively indexes all directories 
    and files within that directory <i>as the same Document</i>. In other 
    words, only one Document is created in the index. This is accomplished 
    by the use of a nested datasource. The code for 
    SingleDocumentFSDataSource is listed below:
  </p>
                                                      <div align="left">
      <table cellspacing="4" cellpadding="0" border="0">
      <tr>
        <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
        <td bgcolor="#023264" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
        <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
      </tr>
      <tr>
        <td bgcolor="#023264" width="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
        <td bgcolor="#ffffff"><pre> 
  public class SingleDocumentFSDataSource
          implements IndexDataSource
  {
      private File file;
  
      public SingleDocumentFSDataSource(File file)
      {
          this.file = file;
      }
  
      public Map[] getData() throws Exception
      {
          Map data = new HashMap(1);
          data.put(NESTED_DATASOURCE, new FSDataSource(file));
          return new Map[]{data};
      }
  }
  </pre></td>
        <td bgcolor="#023264" width="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
      </tr>
      <tr>
        <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
        <td bgcolor="#023264" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
        <td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
      </tr>
      </table>
      </div>
                                                  <p>
    Nested datasources don't result in a separate Document being created. 
    Use them when working with complex datasources, i.e., datasources 
    which are an aggregation of multiple datasources. The current way to 
    add a nested datasource is using the key "NESTED_DATASOURCE". Indyo 
    accepts an IndexDataSource object, a List of IndexDataSources, 
    or an IndexDataSource[] for this key.
  </p>
                              </blockquote>
        </td></tr>
        <tr><td><br/></td></tr>
      </table>
                              </blockquote>
          </p>
        </td></tr>
        <tr><td><br/></td></tr>
      </table>
                                                  <table border="0" cellspacing="0" cellpadding="2" width="100%">
        <tr><td bgcolor="#525D76">
          <font color="#ffffff" face="arial,helvetica,sanserif">
            <a name="Where to Go From Here"><strong>Where to Go From Here</strong></a>
          </font>
        </td></tr>
        <tr><td>
          <blockquote>
                                      <p>
    Congratulations!  You have completed the Indyo
    tutorial.  Although this has only been an introduction
    to Indyo, it should be sufficient to get you started
    with Indyo in your applications.  For those of you
    seeking additional information, there are several other
    documents on this site that can provide details on
    various subjects.  Lastly, the source code is an
    invaluable resource when all else fails to provide
    answers!
  </p>
                              </blockquote>
          </p>
        </td></tr>
        <tr><td><br/></td></tr>
      </table>
                                                  <table border="0" cellspacing="0" cellpadding="2" width="100%">
        <tr><td bgcolor="#525D76">
          <font color="#ffffff" face="arial,helvetica,sanserif">
            <a name="Acknowledgements"><strong>Acknowledgements</strong></a>
          </font>
        </td></tr>
        <tr><td>
          <blockquote>
                                      <p>
    This document was shamelessly ripped from the extremely well-written 
    and well-organized 
    <a href="http://jakarta.apache.org/turbine/torque/tutorial.html">Torque
    </a> tutorial. Thanks Pete!
  </p>
                              </blockquote>
          </p>
        </td></tr>
        <tr><td><br/></td></tr>
      </table>
                                          </td>
                  </tr>
  
                  <!-- FOOTER -->
                  <tr><td colspan="2">
                      <hr noshade="" size="1"/>
                  </td></tr>
                  <tr><td colspan="2">
                      <div align="center"><font color="#525D76" size="-1"><em>
                      Copyright &#169; 1999-2002, Apache Software Foundation
                      </em></font></div>
                  </td></tr>
              </table>
          </body>
      </html>
  <!-- end the processing -->
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  1.2       +33 -23    jakarta-lucene/xdocs/lucene-sandbox/index.xml
  
  Index: index.xml
  ===================================================================
  RCS file: /home/cvs/jakarta-lucene/xdocs/lucene-sandbox/index.xml,v
  retrieving revision 1.1
  retrieving revision 1.2
  diff -u -r1.1 -r1.2
  --- index.xml	14 Jul 2002 19:00:11 -0000	1.1
  +++ index.xml	17 Sep 2002 04:21:06 -0000	1.2
  @@ -1,23 +1,33 @@
  -<?xml version="1.0"?>
  -<document>
  -<properties>
  -<author>Otis Gospodentic</author>
  -<title>Lucene Sandbox</title>
  -</properties>
  -<body>
  -
  -<section name="Lucene Sandbox">
  -Lucene project also contains a workspace, Lucene Sandbox, that is open to all Lucene committers, as well
  -as a few other developers.  The purpose of the Sandbox is to host various third party contributions,
  -and to serve as a place to try out new ideas and prepare them for inclusion into the core Lucene
  -distribution.
  -Users are free to experiment with the components developed in the Sandbox, but Sandbox components will
  -not necessarily be maintained, particularly in their current state.
  -<P>
  -You can access Lucene Sandbox CVS repository at
  -<A HREF="http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/">http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/</A>.
  -</P>
  -</section>
  -
  -</body>
  -</document>
  +<?xml version="1.0"?>
  +<document>
  +<properties>
  +<author>Otis Gospodentic</author>
  +<title>Lucene Sandbox</title>
  +</properties>
  +<body>
  +
  +<section name="Lucene Sandbox">
  +Lucene project also contains a workspace, Lucene Sandbox, that is open to all Lucene committers, as well
  +as a few other developers.  The purpose of the Sandbox is to host various third party contributions,
  +and to serve as a place to try out new ideas and prepare them for inclusion into the core Lucene
  +distribution.
  +Users are free to experiment with the components developed in the Sandbox, but Sandbox components will
  +not necessarily be maintained, particularly in their current state.
  +<P>
  +You can access Lucene Sandbox CVS repository at
  +<A HREF="http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/">http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/</A>.
  +</P>
  +
  +<subsection name="Indyo">
  +<p>
  +Indyo is a datasource-independent Lucene indexing framework.
  +</p>
  +<p>
  +A tutorial for using Indyo can be found <a href="indyo/tutorial.html">here</a>.
  +</p>
  +</subsection>
  +
  +</section>
  +
  +</body>
  +</document>
  
  
  
  1.1                  jakarta-lucene/xdocs/lucene-sandbox/indyo/tutorial.xml
  
  Index: tutorial.xml
  ===================================================================
  <?xml version="1.0"?>
  
  <document>
  
    <properties>
      <title>Indyo Tutorial</title>
      <author email="kelvint@apache.org">Kelvin Tan</author>
    </properties>
  
    <body>
  
  <section name="About this Tutorial">
  
  <p>
    This tutorial is intended to give first-time users an
    introduction to using Indyo, a datasource-independent 
    Lucene indexing framework.
  </p>
  
  <p>
    It covers how to obtain, build, and configure Indyo,
    and how to index a directory on a filesystem.
  </p>
  
  </section>
  
  <section name="Step 1: Obtaining Indyo">
  
  <p>
    First, you need to obtain Indyo.  As
    of this writing, Indyo is only available via CVS, from the 
    "jakarta-lucene-sandbox" repository. See 
    <a href="http://jakarta.apache.org/cvsindex.html">Jakarta CVS</a> 
    for instructions on accessing files via CVS.</p>
  
  
  </section>
  
  <section name="Step 2: Building Indyo">
  
  <p>
    Get a copy of <a href="http://jakarta.apache.org/ant">Ant</a> if 
    you don't already have it installed. Then simply type "ant" in the 
    directory where the local copy of the Indyo sources resides.
  </p>
  
  <p>
    Voila! You should now have a jar file "indyo-&lt;version number&gt;.jar".
  </p>
  
  </section>
  
  <section name="Step 3: Configuring Indyo">
  
  <p>
    The "src/conf" folder contains a default configuration file which is 
    sufficient for normal use. 
  </p>
  
  </section>
  
  <section name="Step 4: Using Indyo">
  
  <p>
    Congratulations, you have finally reached the fun
    part of this tutorial.  This is where you'll discover
    the power of Indyo.  
  </p>
  
  <p>
    To index a datasource, first instantiate the respective 
    datasource, then hand it to IndyoIndexer for indexing. 
    For example:
  </p>
  
  <source><![CDATA[
  IndexDataSource ds = new FSDataSource("/usr/local/lucene/docs");
  IndyoIndexer indexer = new IndyoIndexer("/usr/local/index", 
                      "/usr/local/indyo/default.config.xml");
  indexer.index(ds);                    
  ]]></source>
  
  <p>
    FSDataSource is a simple datasource which indexes both files 
    and directories. The metadata FSDataSource adds to each document is: 
    filePath, fileName, fileSize, fileFormat, fileContents, 
    fileLastModifiedDate. Based on the file extension of the files indexed, 
    Indyo will use file content-handlers according to the mappings found in the 
    configuration file. If you're not happy with this list of file 
    metadata, feel free to subclass FSDataSource, or, as we're about 
    to cover next, write your own custom IndexDataSource.
  </p>
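
  <p>
    To make the field list above more concrete, here is a rough sketch of 
    the kind of map a file-based datasource could produce for a single file. 
    It is not Indyo's own code: the class FileMetadataSketch and the value 
    types (Long, Date) are assumptions, and fileFormat and fileContents are 
    left out because they depend on the configured content-handlers.
  </p>

```java
import java.io.File;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Indyo: builds a metadata map for one
// file using four of the field names listed in the tutorial.
class FileMetadataSketch {

    static Map describe(File file) {
        Map data = new HashMap();
        data.put("filePath", file.getPath());
        data.put("fileName", file.getName());
        // Value types are assumptions; Indyo may store these differently.
        data.put("fileSize", Long.valueOf(file.length()));
        data.put("fileLastModifiedDate", new Date(file.lastModified()));
        return data;
    }

    public static void main(String[] args) {
        File file = new File("/usr/local/lucene/docs/index.html");
        System.out.println(describe(file));
    }
}
```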
  
  <p>
    Get familiar with FSDataSource. You'll find it very handy, both for indexing 
    files directly and for nesting within another datasource. For example, 
    you might need to index a database table in which one of the columns holds 
    the location of a file, and you may want to use FSDataSource to index this 
    file as well.
  </p>
  
  <subsection name="Writing your custom IndexDataSource">
  
  <p>
    To write a custom IndexDataSource, you need to write a class 
    which implements IndexDataSource, and provides an implementation 
    for the getData() method which returns a Map[]. The javadoc of the 
    getData() method reads:
  </p>
  
  <source><![CDATA[
  /**
   * Retrieve an array of Maps. Each map represents
   * a document to be indexed. The key:value pairs of the map
   * are the metadata of the document.
   */
  ]]></source>
  
  <p>
    So, the getData() method provides a way for Indyo to retrieve document 
    metadata from each IndexDataSource. A simple example of a custom 
    IndexDataSource, HashMapDataSource, is provided below.
  </p>
  
  <source><![CDATA[
  public class HashMapDataSource implements IndexDataSource
  {
      private Map data;
  
      public HashMapDataSource(Map data)
      {
          this.data = data;
      }
  
      public Map[] getData() throws Exception
      {
          return new Map[]{data};
      }
  }
  ]]></source>
  
  <p>
    As you can see, HashMapDataSource doesn't do anything very useful. It 
    always results in one Document being indexed, and the document's fields 
    depend on the contents of the map that HashMapDataSource was initialized 
    with.
  </p>
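
  <p>
    A minimal sketch of how HashMapDataSource might be populated and read 
    back follows. So that it compiles without the Indyo jar, IndexDataSource 
    is re-declared here as a local stand-in; the field names "title" and 
    "body" and the class HashMapDataSourceDemo are purely illustrative.
  </p>

```java
import java.util.HashMap;
import java.util.Map;

// Local stand-in for Indyo's IndexDataSource so the sketch is
// self-contained; the real interface lives in the Indyo sources.
interface IndexDataSource {
    Map[] getData() throws Exception;
}

class HashMapDataSource implements IndexDataSource {
    private final Map data;

    HashMapDataSource(Map data) {
        this.data = data;
    }

    public Map[] getData() throws Exception {
        // One map in the array means one Document in the index.
        return new Map[] { data };
    }
}

class HashMapDataSourceDemo {

    // Builds a datasource from two illustrative fields and reads it back.
    static Map[] buildAndRead() {
        Map fields = new HashMap();
        fields.put("title", "Indyo tutorial");
        fields.put("body", "Indexing with a custom datasource");
        try {
            return new HashMapDataSource(fields).getData();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Map[] docs = buildAndRead();
        System.out.println(docs.length + " document(s), fields: " + docs[0].keySet());
    }
}
```

  <p>
    With the real Indyo classes, the datasource would presumably be handed 
    to IndyoIndexer.index() in the same way as the FSDataSource example above.
  </p>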
  
  <p>
    A slightly more useful IndexDataSource, SingleDocumentFSDataSource,
    provides an example of how to nest datasources. Given a directory, 
    SingleDocumentFSDataSource recursively indexes all directories 
    and files within that directory <i>as the same Document</i>. In other 
    words, only one Document is created in the index. This is accomplished 
    by the use of a nested datasource. The code for 
    SingleDocumentFSDataSource is listed below:
  </p>
  
  <source><![CDATA[ 
  public class SingleDocumentFSDataSource
          implements IndexDataSource
  {
      private File file;
  
      public SingleDocumentFSDataSource(File file)
      {
          this.file = file;
      }
  
      public Map[] getData() throws Exception
      {
          Map data = new HashMap(1);
          data.put(NESTED_DATASOURCE, new FSDataSource(file));
          return new Map[]{data};
      }
  }
  ]]></source>
  
  <p>
    Nested datasources don't result in a separate Document being created. 
    Use them when working with complex datasources, i.e., datasources 
    which are an aggregation of multiple datasources. The current way to 
    add a nested datasource is using the key "NESTED_DATASOURCE". Indyo 
    accepts an IndexDataSource object, a List of IndexDataSources, 
    or an IndexDataSource[] for this key.
  </p>
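
  <p>
    The database scenario mentioned earlier could be sketched roughly as 
    follows: one document per row, with the files the row points to attached 
    as a List of nested datasources. Everything here is an assumption made 
    for illustration: RowDataSource and RowDataSourceDemo are hypothetical, 
    IndexDataSource and FSDataSource are simplified local stand-ins, and the 
    string value given to NESTED_DATASOURCE is a guess.
  </p>

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins so the sketch compiles without the Indyo jar.
// The constant's value is a guess; use the one defined by Indyo.
interface IndexDataSource {
    String NESTED_DATASOURCE = "NESTED_DATASOURCE";
    Map[] getData() throws Exception;
}

class FSDataSource implements IndexDataSource {
    private final File file;

    FSDataSource(File file) {
        this.file = file;
    }

    public Map[] getData() throws Exception {
        Map data = new HashMap();
        data.put("filePath", file.getPath());
        return new Map[] { data };
    }
}

// Hypothetical datasource for a single database row. The row's own
// columns become document fields; its files become nested datasources.
class RowDataSource implements IndexDataSource {
    private final Map row;
    private final List filePaths;

    RowDataSource(Map row, List filePaths) {
        this.row = row;
        this.filePaths = filePaths;
    }

    public Map[] getData() throws Exception {
        Map data = new HashMap(row);
        List nested = new ArrayList();
        for (Object path : filePaths) {
            nested.add(new FSDataSource(new File((String) path)));
        }
        // A List of IndexDataSources is one of the accepted value types.
        data.put(NESTED_DATASOURCE, nested);
        return new Map[] { data };
    }
}

class RowDataSourceDemo {

    static Map[] sample() {
        Map row = new HashMap();
        row.put("id", "42");
        row.put("title", "Quarterly report");
        List paths = Arrays.asList("/usr/local/lucene/docs/a.txt",
                                   "/usr/local/lucene/docs/b.txt");
        try {
            return new RowDataSource(row, paths).getData();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Map doc = sample()[0];
        List nested = (List) doc.get(IndexDataSource.NESTED_DATASOURCE);
        System.out.println(nested.size() + " nested datasource(s)");
    }
}
```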
  
  </subsection>
  
  </section>
  
  <section name="Where to Go From Here">
  
  <p>
    Congratulations!  You have completed the Indyo
    tutorial.  Although this has only been an introduction
    to Indyo, it should be sufficient to get you started
    with Indyo in your applications.  For those of you
    seeking additional information, there are several other
    documents on this site that can provide details on
    various subjects.  Lastly, the source code is an
    invaluable resource when all else fails to provide
    answers!
  </p>
  
  </section>
  
  <section name="Acknowledgements">
  
  <p>
    This document was shamelessly ripped from the extremely well-written 
    and well-organized 
    <a href="http://jakarta.apache.org/turbine/torque/tutorial.html">Torque
    </a> tutorial. Thanks Pete!
  </p>
  
  </section>
  
    </body>
  </document>
  
  
  
  
