You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by ot...@apache.org on 2002/06/04 17:29:32 UTC
cvs commit: jakarta-lucene/xdocs luceneplan.xml

otis        2002/06/04 08:29:32

  Modified:    xdocs    luceneplan.xml
  Log:
  - Fixed spelling a bit.
  - Nukes trailing blank spaces.
  
  Revision  Changes    Path
  1.4       +68 -68    jakarta-lucene/xdocs/luceneplan.xml
  
  Index: luceneplan.xml
  ===================================================================
  RCS file: /home/cvs/jakarta-lucene/xdocs/luceneplan.xml,v
  retrieving revision 1.3
  retrieving revision 1.4
  diff -u -r1.3 -r1.4
  --- luceneplan.xml	24 Feb 2002 16:33:21 -0000	1.3
  +++ luceneplan.xml	4 Jun 2002 15:29:32 -0000	1.4
  @@ -1,5 +1,5 @@
   <?xml version="1.0" encoding="UTF-8"?>
  -     
  +
   <document>
     <properties>
      <title>Plan for enhancements to Lucene</title>
  @@ -8,7 +8,7 @@
      </authors>
     </properties>
     <body>
  -  
  +
           <section name="Purpose">
                   <p>
                           The purpose of this document is to outline plans for
  @@ -21,8 +21,8 @@
                           The best reference is <a href="http://www.htdig.org">
                           htDig</a>, though it is not quite as sophisticated as
                           Lucene, it has a number of features that make it
  -                        desireable.  It however is a traditional c-compiled app
  -                        which makes it somewhat unpleasent to install on some
  +                        desirable.  It however is a traditional c-compiled app
  +                        which makes it somewhat unpleasant to install on some
                           platforms (like Solaris!).
                   </p>
                   <p>
  @@ -30,42 +30,42 @@
                           community for an initial reaction, advice, feedback and
                           consent.  Following this it will be submitted to the
                           Lucene user community for support.  Although, I'm (Andy
  -                        Oliver) capable of providing these enhancements by 
  -                        myself, I'd of course prefer to work on them in concert 
  +                        Oliver) capable of providing these enhancements by
  +                        myself, I'd of course prefer to work on them in concert
                           with others.
                   </p>
                   <p>
  -                        While I'm outlaying a fairly large featureset, these can
  +                        While I'm outlaying a fairly large feature set, these can
                           be implemented incrementally of course (and are probably
                           best if done that way).
                   </p>
           </section>
  -  
  +
           <section name="Goal and Objectives">
                   <p>
                           The goal is to provide features to Lucene that allow it
  -                        to be used as a dropin search engine.  It should provide
  +                        to be used as a drop-in search engine.  It should provide
                           many of the features of projects like <a
                           href="http://www.htdig.org">htDig</a> while surpassing
  -                        them with unique Lucene features and capabillities such as
  +                        them with unique Lucene features and capabilities such as
                           easy installation on and java-supporting platform,
  -                        and support for document fields and field searches.  And 
  +                        and support for document fields and field searches.  And
                           of course, <a href="http://apache.org/LICENSE">
                           a pragmatic software license</a>.
                   </p>
                   <p>
                           To reach this goal we'll implement code to support the
                           following objectives that augment but do not replace
  -                        the current Lucene featureset.  
  +                        the current Lucene feature set.
                   </p>
                   <ul>
                           <li>
  -                                Document Location Independance - meaning mapping
  +                                Document Location Independence - meaning mapping
                                   real contexts to runtime contexts.
                                   Essentially, if the document is at
                                   /var/www/htdocs/mydoc.html, I probably want it
                                   indexed as
  -                                http://www.bigevilmegacorp.com/mydoc.html.                                
  +                                http://www.bigevilmegacorp.com/mydoc.html.
                           </li>
                           <li>
                                   Standard methods of creating central indicies -
  @@ -73,21 +73,21 @@
                                   many environments than is *remote* indexing (for
                                   instance http).  I would suggest that most folks
                                   would prefer that general functionality be
  -                                suppored by Lucene instead of having to write
  +                                supported by Lucene instead of having to write
                                   code for every indexing project.  Obviously, if
                                   what they are doing is *special* they'll have to
  -                                code, but general document indexing accross
  -                                webservers would not qualify.
  +                                code, but general document indexing across
  +                                web servers would not qualify.
                           </li>
                           <li>
  -                                Document interperatation abstraction - currently
  +                                Document interpretation abstraction - currently
                                   one must handle document object construction via
                                   custom code.  A standard interface for plugging
  -                                in format handlers should be supported.  
  +                                in format handlers should be supported.
                           </li>
                           <li>
                                   Mime and file-extension to document
  -                                interperatation mapping.                                  
  +                                interpretation mapping.
                           </li>
                   </ul>
           </section>
  @@ -128,7 +128,7 @@
                                   </li>
                                   <li>
                                           replacement type - the type of
  -                                        replacewith path:  relative, url or
  +                                        replace with path:  relative, URL or
                                           path.
                                   </li>
                                   <li>
  @@ -153,8 +153,8 @@
                                           0 - Long.MAX_VALUE.
                                   </li>
                                   <li>
  -                                        SleeptimeBetweenCalls - can be used to 
  -                                        avoid flooding a machine with too many 
  +                                        SleeptimeBetweenCalls - can be used to
  +                                        avoid flooding a machine with too many
                                           requests
                                   </li>
                                   <li>
  @@ -163,12 +163,12 @@
                                           inactivity.
                                   </li>
                                   <li>
  -                                        IncludeFilter - include only items 
  -                                        matching filter.  (can occur mulitple
  +                                        IncludeFilter - include only items
  +                                        matching filter.  (can occur multiple
                                           times)
                                   </li>
                                   <li>
  -                                        ExcludeFilter - exclude only items 
  +                                        ExcludeFilter - exclude only items
                                           matching filter.  (can occur multiple
                                           times)
                                   </li>
  @@ -196,9 +196,9 @@
                                           (probably from the command line) read
                                           this properties file and get them from
                                           it.  Command line options override
  -                                        the properties file in the case of 
  +                                        the properties file in the case of
                                           duplicates.  There should also be an
  -                                        enivironment variable or VM parameter to
  +                                        environment variable or VM parameter to
                                           set this.
                                   </li>
                           </ul>
  @@ -209,8 +209,8 @@
                           </p>
                           <p>
                                   This should extend the AbstractCrawler and
  -                                support any addtional options required for a
  -                                filesystem index.
  +                                support any additional options required for a
  +                                file system index.
                           </p>
                   <!--</s2>-->
                   <!--<s2 title="HTTPIndexer">-->
  @@ -218,12 +218,12 @@
   			      <b>HTTP Crawler </b>
                           </p>
                           <p>
  -                                Supports the AbstractCrawler options as well as:                                
  +                                Supports the AbstractCrawler options as well as:
                           </p>
                           <ul>
                                   <li>
  -                                        span hosts - Wheter to span hosts or not,
  -                                        by default this should be no.                                        
  +                                        span hosts - Whether to span hosts or not,
  +                                        by default this should be no.
                                   </li>
                                   <li>
                                           restrict domains - (ignored if span
  @@ -237,11 +237,11 @@
                                           recurse and go to
                                           /nextcontext/index.html this option says
                                           to also try /nextcontext to get the dir
  -                                        lsiting)
  +                                        listing)
                                   </li>
                                   <li>
                                           map extensions -
  -                                        (always/default/never/fallback).  Wether
  +                                        (always/default/never/fallback).  Whether
                                           to always use extension mapping, by
                                           default (fallback to mime type), NEVER
                                           or fallback if mime is not available
  @@ -254,12 +254,12 @@
                           </ul>
           <!--        </s2> -->
           </section>
  -        
  +
           <section name="MIMEMap">
                   <p>
                           A configurable registry of document types, their
  -                        description, an identifyer, mime-type and file
  -                        extension.  This should map both MIME -> factory 
  +                        description, an identifier, mime-type and file
  +                        extension.  This should map both MIME -> factory
                           and extension -> factory.
                   </p>
                   <p>
  @@ -287,7 +287,7 @@
                                           <td>"html,htm"</td>
                                           <td></td>
                                           <td>HTMLDocumentFactory</td>
  -                                </tr>                                
  +                                </tr>
                           </table>
           </section>
           <section name="DocumentFactory">
  @@ -300,17 +300,17 @@
           </section>
           <section name="FieldMapping classes">
                   <p>
  -                        A class taht maps standard fields from the
  +                        A class that maps standard fields from the
                           DocumentFactories into *fields* in the Document objects
                           they create.  I suggest that a regular expression system
                           or xpath might be the most universal way to do this.
                           For instance if perhaps I had an XML factory that
                           represented XML elements as fields, I could map content
  -                        from particular fields to ther fields or supress them
  +                        from particular fields to their fields or suppress them
                           entirely.  We could even make this configurable.
                   </p>
                   <p>
  -                
  +
                           for example:
                   </p>
                   <ul>
  @@ -333,48 +333,48 @@
                           title.suppress=false
                           </li>
                   </ul>
  -                <p>                
  -                        In this example we map html documents such that all 
  -                        fields are suppressed but author and title.  We map 
  -                        author and title to anything in the content matching 
  -                        author: (and x characters).  Okay my regular expresions 
  +                <p>
  +                        In this example we map html documents such that all
  +                        fields are suppressed but author and title.  We map
  +                        author and title to anything in the content matching
  +                        author: (and x characters).  Okay my regular expresions
                           suck but hopefully you get the idea.
                   </p>
           </section>
           <section name="Final Thoughts">
                   <p>
  -                        We might also consider eliminating the DocumentFactory 
  -                        entirely by making an AbstractDocument from which the 
  -                        current document object would inherit from.  I 
  -                        experimented with this locally, and it was a relatively 
  -                        minor code change and there was of course no difference 
  -                        in performance.  The Document Factory classes would 
  -                        instead be instances of various subclasses of 
  +                        We might also consider eliminating the DocumentFactory
  +                        entirely by making an AbstractDocument from which the
  +                        current document object would inherit from.  I
  +                        experimented with this locally, and it was a relatively
  +                        minor code change and there was of course no difference
  +                        in performance.  The Document Factory classes would
  +                        instead be instances of various subclasses of
                           AbstractDocument.
                   </p>
                   <p>
  -                        My inspiration for this is HTDig (http://www.htdig.org/).  
  -                        While this goes slightly beyond what HTDig provides by 
  -                        providing field mapping (where HTDIG is just interested 
  -                        in Strings/numbers wherever they are found), it provides 
  -                        at least what I would need to use this as a dropin for 
  -                        most places I contract at (with the obvious exception of 
  -                        a default set of content handlers which would of course 
  +                        My inspiration for this is HTDig (http://www.htdig.org/).
  +                        While this goes slightly beyond what HTDig provides by
  +                        providing field mapping (where HTDIG is just interested
  +                        in Strings/numbers wherever they are found), it provides
  +                        at least what I would need to use this as a drop-in for
  +                        most places I contract at (with the obvious exception of
  +                        a default set of content handlers which would of course
                           develop naturally over time).
                   </p>
                   <p>
  -                        I am able to certainly contribute to this effort if the 
  -                        development community is open to it.  I'd suggest we do 
  -                        it iteratively in stages and not aim for all of this at 
  +                        I am able to certainly contribute to this effort if the
  +                        development community is open to it.  I'd suggest we do
  +                        it iteratively in stages and not aim for all of this at
                           once (for instance leave out the field mapping at first).
                   </p>
                   <p>
  -                
  -                        Anyhow, please give me some feedback, counter 
  -                        suggestions, let me know if I'm way off base or out of 
  +
  +                        Anyhow, please give me some feedback, counter
  +                        suggestions, let me know if I'm way off base or out of
                           line, etc. -Andy
                   </p>
           </section>
  -                
  +
     </body>
   </document>
  
  
  

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>