Posted to commits@lucy.apache.org by bu...@apache.org on 2016/09/28 12:07:52 UTC

svn commit: r998475 [6/26] - in /websites/staging/lucy/trunk/content: ./ docs/ docs/0.5.0/ docs/0.5.0/c/ docs/0.5.0/c/Clownfish/ docs/0.5.0/c/Clownfish/Docs/ docs/0.5.0/c/Lucy/ docs/0.5.0/c/Lucy/Analysis/ docs/0.5.0/c/Lucy/Docs/ docs/0.5.0/c/Lucy/Docs/...

Added: websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/FileFormat.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/FileFormat.html (added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/FileFormat.html Wed Sep 28 12:07:48 2016
@@ -0,0 +1,260 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    <title>Lucy::Docs::FileFormat</title>
+    <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+  </head>
+
+  <body>
+
+    <div id="lucy-rigid_wrapper">
+
+      <div id="lucy-top" class="container_16 lucy-white_box_3d">
+
+        <div id="lucy-logo_box" class="grid_8">
+          <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a>
+        </div> <!-- lucy-logo_box -->
+
+        <div #id="lucy-top_nav_box" class="grid_8">
+          <div id="lucy-top_nav_bar" class="container_8">
+            <ul>
+              <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li>
+              <li><a href="http://www.apache.org/licenses/" title="License">License</a></li>
+              <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li>
+              <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li>
+              <li><a href="http://www.apache.org/security/" title="Security">Security</a></li>
+            </ul>
+          </div> <!-- lucy-top_nav_bar -->
+          <p><a href="http://www.apache.org/">Apache</a>&nbsp;&raquo&nbsp;<a href="/">Lucy</a>&nbsp;&raquo&nbsp;<a href="/docs/">Docs</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/">0.5.0</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/">C</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/Lucy/">Lucy</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/Lucy/Docs/">Docs</a></p>
+          <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get">
+            <input value="*.apache.org" name="sitesearch" type="hidden"/>
+            <input type="text" name="q" id="query" style="width:85%">
+            <input type="submit" id="submit" value="Search">
+          </form>
+        </div> <!-- lucy-top_nav_box -->
+
+        <div class="clear"></div>
+
+      </div> <!-- lucy-top -->
+
+      <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+        <div class="grid_4" id="lucy-left_nav_box">
+          <h6>About</h6>
+            <ul>
+              <li><a href="/">Welcome</a></li>
+              <li><a href="/clownfish.html">Clownfish</a></li>
+              <li><a href="/faq.html">FAQ</a></li>
+              <li><a href="/people.html">People</a></li>
+            </ul>
+          <h6>Resources</h6>
+            <ul>
+              <li><a href="/download.html">Download</a></li>
+              <li><a href="/mailing_lists.html">Mailing Lists</a></li>
+              <li><a href="/docs/">Documentation</a></li>
+              <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li>
+              <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li>
+              <li><a href="/version_control.html">Version Control</a></li>
+            </ul>
+          <h6>Related Projects</h6>
+            <ul>
+              <li><a href="http://lucene.apache.org/core/">Lucene</a></li>
+              <li><a href="http://dezi.org/">Dezi</a></li>
+              <li><a href="http://lucene.apache.org/solr/">Solr</a></li>
+              <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li>
+              <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li>
+            </ul>
+        </div> <!-- lucy-left_nav_box -->
+
+        <div id="lucy-main_content_box" class="grid_9">
+          <div class="c-api">
+<h2>Overview of index file format</h2>
+<p>It is not necessary to understand the current implementation details of the
+index file format in order to use Apache Lucy effectively, but it may be
+helpful if you are interested in tweaking for high performance, exotic usage,
+or debugging and development.</p>
+<p>On a file system, an index is a directory.  The files inside have a
+hierarchical relationship: an index is made up of “segments”, each of which is
+an independent inverted index with its own subdirectory; each segment is made
+up of several component parts.</p>
+<pre><code>[index]--|
+         |--snapshot_XXX.json
+         |--schema_XXX.json
+         |--locks--|
+         |         |--write.lock
+         |
+         |--seg_1--|
+         |         |--segmeta.json
+         |         |--cfmeta.json
+         |         |--cf.dat-------|
+         |                         |--[lexicon]
+         |                         |--[postings]
+         |                         |--[documents]
+         |                         |--[highlight]
+         |                         |--[deletions]
+         |
+         |--seg_2--|
+         |         |--segmeta.json
+         |         |--cfmeta.json
+         |         |--cf.dat-------|
+         |                         |--[lexicon]
+         |                         |--[postings]
+         |                         |--[documents]
+         |                         |--[highlight]
+         |                         |--[deletions]
+         |
+         |--[...]--| 
+</code></pre>
+<h3>Write-once philosophy</h3>
+<p>All segment directory names consist of the string “seg_” followed by a number
+in base 36: seg_1, seg_5m, seg_p9s2 and so on, with higher numbers indicating
+more recent segments.  Once a segment is finished and committed, its name is
+never re-used and its files are never modified.</p>
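+<p>As a rough illustration of the naming scheme, the C sketch below converts a
+segment directory name to its number using the standard <code>strtol</code>
+function in base 36; for example, <code>seg_5m</code> is segment 202.  The helper
+<code>seg_name_to_num</code> is hypothetical and not part of the Lucy API, and
+overflow for very long names is not handled.</p>
+<pre><code>#include &lt;ctype.h&gt;
+#include &lt;stdlib.h&gt;
+#include &lt;string.h&gt;
+
+/* Hypothetical helper (not a Lucy function): return the number encoded in
+ * a name such as "seg_5m", or -1 if the name is not of that form. */
+long
+seg_name_to_num(const char *name) {
+    static const char prefix[] = "seg_";
+    const size_t prefix_len = sizeof(prefix) - 1;
+    const char *digits;
+    char *end = NULL;
+    long num;
+
+    if (strncmp(name, prefix, prefix_len) != 0) { return -1; }
+    digits = name + prefix_len;
+
+    /* Require an alphanumeric first character so that strtol cannot
+     * silently accept leading whitespace or a sign. */
+    if (!isalnum((unsigned char)digits[0])) { return -1; }
+
+    num = strtol(digits, &amp;end, 36);
+    if (*end != '\0') { return -1; }   /* trailing characters */
+    return num;
+}
+</code></pre>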
+<p>Old segments become obsolete and can be removed when their data has been
+consolidated into new segments during the process of segment merging and
+optimization.  A fully-optimized index has only one segment.</p>
+<h3>Top-level entries</h3>
+<p>There are a handful of “top-level” files and directories which belong to the
+entire index rather than to a particular segment.</p>
+<h4>snapshot_XXX.json</h4>
+<p>A “snapshot” file, e.g. <code>snapshot_m7p.json</code>, is a list of index files and
+directories.  Because index files, once written, are never modified, the list
+of entries in a snapshot defines a point-in-time view of the data in an index.</p>
+<p>Like segment directories, snapshot files also utilize the
+unique-base-36-number naming convention; the higher the number, the more
+recent the file.  The appearance of a new snapshot file within the index
+directory constitutes an index update.  While a new segment is being written,
+new files may be added to the index directory, but until a new snapshot file
+gets written, a Searcher opening the index for reading won’t know about them.</p>
+<h4>schema_XXX.json</h4>
+<p>The schema file is a Schema object describing the index’s format, serialized
+as JSON.  It, too, is versioned, and a given snapshot file will reference one
+and only one schema file.</p>
+<h4>locks</h4>
+<p>By default, only one indexing process may safely modify the index at any given
+time.  Processes reserve an index by laying claim to the <code>write.lock</code> file
+within the <code>locks/</code> directory.  A smattering of other lock files may be used
+from time to time, as well.</p>
+<h3>A segment’s component parts</h3>
+<p>By default, each segment has up to five logical components: lexicon, postings,
+document storage, highlight data, and deletions.  Binary data from these
+components gets stored in virtual files within the “cf.dat” compound file;
+metadata is stored in a shared “segmeta.json” file.</p>
+<h4>segmeta.json</h4>
+<p>The segmeta.json file is a central repository for segment metadata.  In
+addition to information such as document counts and field numbers, it also
+warehouses arbitrary metadata on behalf of individual index components.</p>
+<h4>Lexicon</h4>
+<p>Each indexed field gets its own lexicon in each segment.  The exact files
+involved depend on the field’s type, but generally speaking there will be two
+parts.  First, there’s a primary <code>lexicon-XXX.dat</code> file which houses a
+complete term list associating terms with corpus frequency statistics,
+postings file locations, etc.  Second, one or more “lexicon index” files may
+be present which contain periodic samples from the primary lexicon file to
+facilitate fast lookups.</p>
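+<p>The idea of periodic samples can be sketched in C as follows.  This is an
+in-memory analogy under stated assumptions, not the on-disk format: the term
+list is a sorted array of strings, every <code>interval</code>-th term is treated
+as a sample (read straight from the main array for brevity), and both function
+names are hypothetical.  A binary search over the samples finds where to start,
+and a short forward scan finishes the lookup.</p>
+<pre><code>#include &lt;stddef.h&gt;
+#include &lt;string.h&gt;
+
+/* Return the position of the last sampled term that sorts at or before
+ * `target`, or 0 if every sample sorts after it.  `interval` must be
+ * greater than zero. */
+size_t
+lex_index_seek(const char **terms, size_t num_terms, size_t interval,
+               const char *target) {
+    size_t num_samples = (num_terms + interval - 1) / interval;
+    size_t lo = 0;
+    size_t hi = num_samples;   /* candidate samples are [lo, hi) */
+
+    while (lo &lt; hi) {
+        size_t mid = lo + (hi - lo) / 2;
+        if (strcmp(terms[mid * interval], target) &lt;= 0) { lo = mid + 1; }
+        else                                            { hi = mid; }
+    }
+    return lo == 0 ? 0 : (lo - 1) * interval;
+}
+
+/* Seek via the samples, then scan forward through the full term list.
+ * Returns the position of `target`, or -1 if it is absent. */
+long
+lex_lookup(const char **terms, size_t num_terms, size_t interval,
+           const char *target) {
+    size_t i = lex_index_seek(terms, num_terms, interval, target);
+
+    for (; i &lt; num_terms; i++) {
+        int cmp = strcmp(terms[i], target);
+        if (cmp == 0) { return (long)i; }
+        if (cmp &gt; 0)  { break; }   /* passed the spot where it would be */
+    }
+    return -1;
+}
+</code></pre>
+<p>With a sample interval of 3, a lookup never scans more than three terms after
+the seek, however long the full list is.</p>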
+<h4>Postings</h4>
+<p>“Posting” is a technical term from the field of
+<a href="../../Lucy/Docs/IRTheory.html">information retrieval</a>, defined as a single
+instance of one term indexing one document.  If you are looking at the index
+in the back of a book, and you see that “freedom” is referenced on pages 8,
+86, and 240, that would be three postings, which taken together form a
+“posting list”.  The same terminology applies to an index in electronic form.</p>
+<p>Each segment has one postings file per indexed field.  When a search is
+performed for a single term, first that term is looked up in the lexicon.  If
+the term exists in the segment, the record in the lexicon will contain
+information about which postings file to look at and where to look.</p>
+<p>The first thing any posting record tells you is a document id.  By iterating
+over all the postings associated with a term, you can find all the documents
+that match that term, a process which is analogous to looking up page numbers
+in a book’s index.  However, each posting record typically contains other
+information in addition to document id, e.g. the positions at which the term
+occurs within the field.</p>
+<h4>Documents</h4>
+<p>The document storage section is a simple database, organized into two files:</p>
+<ul>
+<li>
+<p><strong>documents.dat</strong> - Serialized documents.</p>
+</li>
+<li>
+<p><strong>documents.ix</strong> - Document storage index, a solid array of 64-bit integers
+where each integer location corresponds to a document id, and the value at
+that location points at a file position in the documents.dat file.</p>
+</li>
+</ul>
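+<p>The lookup described above amounts to a single array access.  The C sketch
+below works on the raw bytes of <code>documents.ix</code> held in memory; it
+assumes that the integers are stored big-endian and that slot N belongs to
+document id N, neither of which is stated on this page.  The function names are
+hypothetical, and the caller is responsible for bounds checking.</p>
+<pre><code>#include &lt;stddef.h&gt;
+#include &lt;stdint.h&gt;
+
+/* Decode one 64-bit integer, assuming big-endian byte order. */
+static uint64_t
+read_u64_be(const unsigned char *buf) {
+    uint64_t value = 0;
+    for (size_t i = 0; i &lt; 8; i++) {
+        value = (value &lt;&lt; 8) | buf[i];
+    }
+    return value;
+}
+
+/* Return the position in documents.dat at which the record for `doc_id`
+ * begins, given the raw contents of documents.ix.  The caller must make
+ * sure that `ix_data` holds at least (doc_id + 1) * 8 bytes. */
+uint64_t
+doc_file_pos(const unsigned char *ix_data, uint32_t doc_id) {
+    return read_u64_be(ix_data + (size_t)doc_id * 8);
+}
+</code></pre>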
+<h4>Highlight data</h4>
+<p>The files which store data used for excerpting and highlighting are organized
+similarly to the files used to store documents.</p>
+<ul>
+<li>
+<p><strong>highlight.dat</strong> - Chunks of serialized highlight data, one per doc id.</p>
+</li>
+<li>
+<p><strong>highlight.ix</strong> - Highlight data index – as with the <code>documents.ix</code> file, a
+solid array of 64-bit file pointers.</p>
+</li>
+</ul>
+<h4>Deletions</h4>
+<p>When a document is “deleted” from a segment, it is not actually purged right
+away; it is merely marked as “deleted” via a deletions file.  Deletions files
+contain bit vectors with one bit for each document in the segment; if bit
+#254 is set then document 254 is deleted, and if that document turns up in a
+search it will be masked out.</p>
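+<p>A bit vector of this kind takes only a few lines of C.  The sketch below
+assumes that bit N lives in byte N / 8 at bit position N % 8, counting from the
+least significant bit, so bit #254 lands in byte 31 at position 6; the actual
+bit order used in the deletions file is not stated on this page, and the
+function names are hypothetical.</p>
+<pre><code>#include &lt;stdbool.h&gt;
+#include &lt;stdint.h&gt;
+
+/* Mark `doc_id` as deleted by setting its bit. */
+void
+del_mark(unsigned char *bits, uint32_t doc_id) {
+    bits[doc_id &gt;&gt; 3] |= (unsigned char)(1u &lt;&lt; (doc_id &amp; 7));
+}
+
+/* Report whether the bit for `doc_id` is set.  A searcher would skip any
+ * matching document for which this returns true. */
+bool
+del_is_deleted(const unsigned char *bits, uint32_t doc_id) {
+    return ((bits[doc_id &gt;&gt; 3] &gt;&gt; (doc_id &amp; 7)) &amp; 1u) != 0;
+}
+</code></pre>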
+<p>It is only when a segment’s contents are rewritten to a new segment during the
+segment-merging process that deleted documents truly go away.</p>
+<h3>Compound Files</h3>
+<p>If you peer inside an index directory, you won’t actually find any files named
+“documents.dat”, “highlight.ix”, etc. unless there is an indexing process
+underway.  What you will find instead is one “cf.dat” and one “cfmeta.json”
+file per segment.</p>
+<p>To minimize the need for file descriptors at search-time, all per-segment
+binary data files are concatenated together in “cf.dat” at the close of each
+indexing session.  Information about where each file begins and ends is stored
+in <code>cfmeta.json</code>.  When the segment is opened for reading, a single file
+descriptor per “cf.dat” file can be shared among several readers.</p>
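+<p>Conceptually, a reader resolves a virtual file name to a byte range within
+“cf.dat”.  The C sketch below models that lookup with an in-memory table; the
+<code>CFEntry</code> struct, its field names, and <code>cf_find</code> are
+hypothetical, and the JSON layout of <code>cfmeta.json</code> is not reproduced
+here.</p>
+<pre><code>#include &lt;stddef.h&gt;
+#include &lt;stdint.h&gt;
+#include &lt;string.h&gt;
+
+/* One virtual file inside cf.dat (illustrative field names). */
+typedef struct {
+    const char *name;     /* e.g. "documents.ix" */
+    uint64_t    offset;   /* where the virtual file begins in cf.dat */
+    uint64_t    length;   /* how many bytes it occupies */
+} CFEntry;
+
+/* Find a virtual file by name.  Returns NULL if it is not listed. */
+const CFEntry*
+cf_find(const CFEntry *entries, size_t num_entries, const char *name) {
+    for (size_t i = 0; i &lt; num_entries; i++) {
+        if (strcmp(entries[i].name, name) == 0) { return &amp;entries[i]; }
+    }
+    return NULL;
+}
+</code></pre>
+<p>Every reader can then seek to <code>offset</code> on the one shared descriptor
+and treat the next <code>length</code> bytes as its own file.</p>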
+<h3>A Typical Search</h3>
+<p>Here’s a simplified narrative, dramatizing how a search for “freedom” against
+a given segment plays out:</p>
+<ol>
+<li>
+<p>The searcher asks the relevant Lexicon Index, “Do you know anything about
+‘freedom’?”  Lexicon Index replies, “Can’t say for sure, but if the main
+Lexicon file does, ‘freedom’ is probably somewhere around byte 21008”.</p>
+</li>
+<li>
+<p>The main Lexicon tells the searcher “One moment, let me scan our records…
+Yes, we have 2 documents which contain ‘freedom’.  You’ll find them in
+seg_6/postings-4.dat starting at byte 66991.”</p>
+</li>
+<li>
+<p>The Postings file says “Yep, we have ‘freedom’, all right!  Document id 40
+has 1 ‘freedom’, and document 44 has 8.  If you need to know more, like if any
+‘freedom’ is part of the phrase ‘freedom of speech’, ask me about positions!”</p>
+</li>
+<li>
+<p>If the searcher is only looking for ‘freedom’ in isolation, that’s where it
+stops.  It now knows enough to assign the documents scores against “freedom”,
+with the 8-freedom document likely ranking higher than the single-freedom
+document.</p>
+</li>
+</ol>
+</div>
+
+        </div> <!-- lucy-main_content_box --> 
+        <div class="clear"></div>
+
+      </div> <!-- lucy-main_content -->
+
+      <div id="lucy-copyright" class="container_16">
+        <p>Copyright &#169; 2010-2015 The Apache Software Foundation, Licensed under the 
+           <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+           <br/>
+           Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The
+           Apache Software Foundation.  All other marks mentioned may be trademarks or registered trademarks of their
+           respective owners.
+        </p>
+      </div> <!-- lucy-copyright -->
+
+    </div> <!-- lucy-rigid_wrapper -->
+
+  </body>
+</html>

Added: websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/FileLocking.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/FileLocking.html (added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/FileLocking.html Wed Sep 28 12:07:48 2016
@@ -0,0 +1,144 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    <title>Lucy::Docs::FileLocking</title>
+    <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+  </head>
+
+  <body>
+
+    <div id="lucy-rigid_wrapper">
+
+      <div id="lucy-top" class="container_16 lucy-white_box_3d">
+
+        <div id="lucy-logo_box" class="grid_8">
+          <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a>
+        </div> <!-- lucy-logo_box -->
+
+        <div #id="lucy-top_nav_box" class="grid_8">
+          <div id="lucy-top_nav_bar" class="container_8">
+            <ul>
+              <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li>
+              <li><a href="http://www.apache.org/licenses/" title="License">License</a></li>
+              <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li>
+              <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li>
+              <li><a href="http://www.apache.org/security/" title="Security">Security</a></li>
+            </ul>
+          </div> <!-- lucy-top_nav_bar -->
+          <p><a href="http://www.apache.org/">Apache</a>&nbsp;&raquo&nbsp;<a href="/">Lucy</a>&nbsp;&raquo&nbsp;<a href="/docs/">Docs</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/">0.5.0</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/">C</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/Lucy/">Lucy</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/Lucy/Docs/">Docs</a></p>
+          <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get">
+            <input value="*.apache.org" name="sitesearch" type="hidden"/>
+            <input type="text" name="q" id="query" style="width:85%">
+            <input type="submit" id="submit" value="Search">
+          </form>
+        </div> <!-- lucy-top_nav_box -->
+
+        <div class="clear"></div>
+
+      </div> <!-- lucy-top -->
+
+      <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+        <div class="grid_4" id="lucy-left_nav_box">
+          <h6>About</h6>
+            <ul>
+              <li><a href="/">Welcome</a></li>
+              <li><a href="/clownfish.html">Clownfish</a></li>
+              <li><a href="/faq.html">FAQ</a></li>
+              <li><a href="/people.html">People</a></li>
+            </ul>
+          <h6>Resources</h6>
+            <ul>
+              <li><a href="/download.html">Download</a></li>
+              <li><a href="/mailing_lists.html">Mailing Lists</a></li>
+              <li><a href="/docs/">Documentation</a></li>
+              <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li>
+              <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li>
+              <li><a href="/version_control.html">Version Control</a></li>
+            </ul>
+          <h6>Related Projects</h6>
+            <ul>
+              <li><a href="http://lucene.apache.org/core/">Lucene</a></li>
+              <li><a href="http://dezi.org/">Dezi</a></li>
+              <li><a href="http://lucene.apache.org/solr/">Solr</a></li>
+              <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li>
+              <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li>
+            </ul>
+        </div> <!-- lucy-left_nav_box -->
+
+        <div id="lucy-main_content_box" class="grid_9">
+          <div class="c-api">
+<h2>Manage indexes on shared volumes.</h2>
+<p>Normally, index locking is an invisible process.  Exclusive write access is
+controlled via lockfiles within the index directory and problems only arise
+if multiple processes attempt to acquire the write lock simultaneously;
+search-time processes do not ordinarily require locking at all.</p>
+<p>On shared volumes, however, the default locking mechanism fails, and manual
+intervention becomes necessary.</p>
+<p>Both read and write applications accessing an index on a shared volume need
+to identify themselves with a unique <code>host</code> id, e.g. hostname or
+IP address.  Knowing the host id makes it possible to tell which lockfiles
+belong to other machines and therefore must not be removed when the
+lockfile’s pid number appears not to correspond to an active process.</p>
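+<p>The rule in the preceding paragraph can be written as a small predicate.
+This C sketch is a simplified model rather than Lucy’s implementation:
+<code>lock_is_clearable</code> is a hypothetical name, and the caller is assumed
+to have already determined whether the pid recorded in the lockfile belongs to
+a running process.</p>
+<pre><code>#include &lt;stdbool.h&gt;
+#include &lt;string.h&gt;
+
+/* Decide whether a lockfile may be cleared as stale.  A lock qualifies only
+ * if it was created under this host id AND its pid no longer corresponds
+ * to a running process.  Locks from other hosts are always left alone,
+ * because their pids cannot be checked from this machine. */
+bool
+lock_is_clearable(const char *lock_host, const char *my_host,
+                  bool pid_is_alive) {
+    if (strcmp(lock_host, my_host) != 0) { return false; }
+    return !pid_is_alive;
+}
+</code></pre>
+<p>If two machines share one host id, each sees the other’s locks as its own and
+may clear them, which is exactly the failure described next.</p>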
+<p>At index-time, the danger is that multiple indexing processes from
+different machines which fail to specify a unique <code>host</code> id can
+delete each other’s lockfiles and then attempt to modify the index at the
+same time, causing index corruption.  The search-time problem is more
+complex.</p>
+<p>Once an index file is no longer listed in the most recent snapshot, Indexer
+attempts to delete it as part of a post-<a href="lucy:Indexer.Commit">Commit()</a> cleanup routine.  It is
+possible that at the moment an Indexer is deleting files which it believes
+no longer needed, a Searcher referencing an earlier snapshot is in fact
+using them.  The more often that an index is either updated or searched,
+the more likely it is that this conflict will arise from time to time.</p>
+<p>Ordinarily, the deletion attempts are not a problem.  On a typical Unix
+volume, the files will be deleted in name only: any process which holds an
+open filehandle against a given file will continue to have access, and the
+file won’t actually get vaporized until the last filehandle is closed.
+Thanks to “delete on last close” semantics, an Indexer can’t truly delete
+the file out from underneath an active Searcher.  On Windows, where file
+deletion fails whenever any process holds an open handle, the situation is
+different but still workable: Indexer just keeps retrying after each commit
+until deletion finally succeeds.</p>
+<p>On NFS, however, the system breaks, because NFS allows files to be deleted
+out from underneath active processes.  Should this happen, the unlucky read
+process will crash with a “Stale NFS filehandle” exception.</p>
+<p>Under normal circumstances, it is neither necessary nor desirable for
+IndexReaders to secure read locks against an index, but for NFS we have to
+make an exception.  LockFactory’s <a href="lucy:LockFactory.Make_Shared_Lock">Make_Shared_Lock()</a> method exists for this
+reason; supplying an IndexManager instance to IndexReader’s constructor
+activates an internal locking mechanism using <a href="lucy:LockFactory.Make_Shared_Lock">Make_Shared_Lock()</a> which
+prevents concurrent indexing processes from deleting files that are needed
+by active readers.</p>
+<pre><code>Code example for C is missing</code></pre>
+<p>Since shared locks are implemented using lockfiles located in the index
+directory (as are exclusive locks), reader applications must have write
+access for read locking to work.  Stale lock files from crashed processes
+are ordinarily cleared away the next time the same machine – as identified
+by the <code>host</code> parameter – opens another IndexReader. (The
+classic technique of timing out lock files is not feasible because search
+processes may lie dormant indefinitely.) However, please be aware that if
+the last thing a given machine does is crash, lock files belonging to it
+may persist, preventing deletion of obsolete index data.</p>
+</div>
+
+        </div> <!-- lucy-main_content_box --> 
+        <div class="clear"></div>
+
+      </div> <!-- lucy-main_content -->
+
+      <div id="lucy-copyright" class="container_16">
+        <p>Copyright &#169; 2010-2015 The Apache Software Foundation, Licensed under the 
+           <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+           <br/>
+           Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The
+           Apache Software Foundation.  All other marks mentioned may be trademarks or registered trademarks of their
+           respective owners.
+        </p>
+      </div> <!-- lucy-copyright -->
+
+    </div> <!-- lucy-rigid_wrapper -->
+
+  </body>
+</html>

Added: websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/IRTheory.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/IRTheory.html (added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/IRTheory.html Wed Sep 28 12:07:48 2016
@@ -0,0 +1,133 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    <title>Lucy::Docs::IRTheory</title>
+    <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+  </head>
+
+  <body>
+
+    <div id="lucy-rigid_wrapper">
+
+      <div id="lucy-top" class="container_16 lucy-white_box_3d">
+
+        <div id="lucy-logo_box" class="grid_8">
+          <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a>
+        </div> <!-- lucy-logo_box -->
+
+        <div #id="lucy-top_nav_box" class="grid_8">
+          <div id="lucy-top_nav_bar" class="container_8">
+            <ul>
+              <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li>
+              <li><a href="http://www.apache.org/licenses/" title="License">License</a></li>
+              <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li>
+              <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li>
+              <li><a href="http://www.apache.org/security/" title="Security">Security</a></li>
+            </ul>
+          </div> <!-- lucy-top_nav_bar -->
+          <p><a href="http://www.apache.org/">Apache</a>&nbsp;&raquo&nbsp;<a href="/">Lucy</a>&nbsp;&raquo&nbsp;<a href="/docs/">Docs</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/">0.5.0</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/">C</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/Lucy/">Lucy</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/Lucy/Docs/">Docs</a></p>
+          <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get">
+            <input value="*.apache.org" name="sitesearch" type="hidden"/>
+            <input type="text" name="q" id="query" style="width:85%">
+            <input type="submit" id="submit" value="Search">
+          </form>
+        </div> <!-- lucy-top_nav_box -->
+
+        <div class="clear"></div>
+
+      </div> <!-- lucy-top -->
+
+      <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+        <div class="grid_4" id="lucy-left_nav_box">
+          <h6>About</h6>
+            <ul>
+              <li><a href="/">Welcome</a></li>
+              <li><a href="/clownfish.html">Clownfish</a></li>
+              <li><a href="/faq.html">FAQ</a></li>
+              <li><a href="/people.html">People</a></li>
+            </ul>
+          <h6>Resources</h6>
+            <ul>
+              <li><a href="/download.html">Download</a></li>
+              <li><a href="/mailing_lists.html">Mailing Lists</a></li>
+              <li><a href="/docs/">Documentation</a></li>
+              <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li>
+              <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li>
+              <li><a href="/version_control.html">Version Control</a></li>
+            </ul>
+          <h6>Related Projects</h6>
+            <ul>
+              <li><a href="http://lucene.apache.org/core/">Lucene</a></li>
+              <li><a href="http://dezi.org/">Dezi</a></li>
+              <li><a href="http://lucene.apache.org/solr/">Solr</a></li>
+              <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li>
+              <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li>
+            </ul>
+        </div> <!-- lucy-left_nav_box -->
+
+        <div id="lucy-main_content_box" class="grid_9">
+          <div class="c-api">
+<h2>Crash course in information retrieval</h2>
+<p>Just enough Information Retrieval theory to find your way around Apache Lucy.</p>
+<h3>Terminology</h3>
+<p>Lucy uses some terminology from the field of information retrieval which
+may be unfamiliar to many users.  “Document” and “term” mean pretty much what
+you’d expect them to, but others such as “posting” and “inverted index” need a
+formal introduction:</p>
+<ul>
+<li><em>document</em> - An atomic unit of retrieval.</li>
+<li><em>term</em> - An attribute which describes a document.</li>
+<li><em>posting</em> - One term indexing one document.</li>
+<li><em>term list</em> - The complete list of terms which describe a document.</li>
+<li><em>posting list</em> - The complete list of documents which a term indexes.</li>
+<li><em>inverted index</em> - A data structure which maps from terms to documents.</li>
+</ul>
+<p>Since Lucy is a practical implementation of IR theory, it augments these
+abstract, distilled definitions with useful traits.  For instance, a
+“posting” in its most rarefied form is simply a term-document pairing; in
+Lucy, the class MatchPosting fills this
+role.  However, by associating additional information with a posting like the
+number of times the term occurs in the document, we can turn it into a
+ScorePosting, making it possible
+to rank documents by relevance rather than just list documents which happen to
+match in no particular order.</p>
+<h3>TF/IDF ranking algorithm</h3>
+<p>Lucy uses a variant of the well-established “Term Frequency / Inverse
+Document Frequency” weighting scheme.  A thorough treatment of TF/IDF is too
+ambitious for our present purposes, but in a nutshell, it means that…</p>
+<ul>
+<li>
+<p>in a search for <code>skate park</code>, documents which score well for the
+comparatively rare term <code>skate</code> will rank higher than documents which score
+well for the more common term <code>park</code>.</p>
+</li>
+<li>
+<p>a 10-word text which has one occurrence each of both <code>skate</code> and <code>park</code> will
+rank higher than a 1000-word text which also contains one occurrence of each.</p>
+</li>
+</ul>
+<p>A web search for “tf idf” will turn up many excellent explanations of the
+algorithm.</p>
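+<p>To make the two bullet points above concrete, here is one common textbook
+formulation in C: term frequency normalized by document length, multiplied by
+the logarithm of the inverse document frequency.  It is an illustration only;
+the variant Lucy uses may differ in detail, and the function name
+<code>tf_idf</code> is hypothetical.</p>
+<pre><code>#include &lt;math.h&gt;
+
+/* Textbook TF/IDF weight for one term in one document (not necessarily
+ * the formula Lucy uses).
+ *   term_freq  - occurrences of the term in the document
+ *   doc_len    - number of words in the document (must be positive)
+ *   total_docs - number of documents in the collection
+ *   doc_freq   - number of documents containing the term (must be positive)
+ */
+double
+tf_idf(double term_freq, double doc_len, double total_docs,
+       double doc_freq) {
+    double tf  = term_freq / doc_len;          /* length-normalized count */
+    double idf = log(total_docs / doc_freq);   /* rarer term, larger idf */
+    return tf * idf;
+}
+</code></pre>
+<p>Taking a made-up collection of 1000 documents in which <code>skate</code>
+appears in 10 and <code>park</code> in 500, <code>tf_idf(1, 10, 1000, 10)</code>
+is larger than <code>tf_idf(1, 10, 1000, 500)</code>, and each is larger than
+the same call with a document length of 1000.</p>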
+</div>
+
+        </div> <!-- lucy-main_content_box --> 
+        <div class="clear"></div>
+
+      </div> <!-- lucy-main_content -->
+
+      <div id="lucy-copyright" class="container_16">
+        <p>Copyright &#169; 2010-2015 The Apache Software Foundation, Licensed under the 
+           <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+           <br/>
+           Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The
+           Apache Software Foundation.  All other marks mentioned may be trademarks or registered trademarks of their
+           respective owners.
+        </p>
+      </div> <!-- lucy-copyright -->
+
+    </div> <!-- lucy-rigid_wrapper -->
+
+  </body>
+</html>

Added: websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial.html (added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial.html Wed Sep 28 12:07:48 2016
@@ -0,0 +1,142 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    <title>Lucy::Docs::Tutorial</title>
+    <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+  </head>
+
+  <body>
+
+    <div id="lucy-rigid_wrapper">
+
+      <div id="lucy-top" class="container_16 lucy-white_box_3d">
+
+        <div id="lucy-logo_box" class="grid_8">
+          <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a>
+        </div> <!-- lucy-logo_box -->
+
+        <div #id="lucy-top_nav_box" class="grid_8">
+          <div id="lucy-top_nav_bar" class="container_8">
+            <ul>
+              <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li>
+              <li><a href="http://www.apache.org/licenses/" title="License">License</a></li>
+              <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li>
+              <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li>
+              <li><a href="http://www.apache.org/security/" title="Security">Security</a></li>
+            </ul>
+          </div> <!-- lucy-top_nav_bar -->
+          <p><a href="http://www.apache.org/">Apache</a>&nbsp;&raquo&nbsp;<a href="/">Lucy</a>&nbsp;&raquo&nbsp;<a href="/docs/">Docs</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/">0.5.0</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/">C</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/Lucy/">Lucy</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/Lucy/Docs/">Docs</a></p>
+          <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get">
+            <input value="*.apache.org" name="sitesearch" type="hidden"/>
+            <input type="text" name="q" id="query" style="width:85%">
+            <input type="submit" id="submit" value="Search">
+          </form>
+        </div> <!-- lucy-top_nav_box -->
+
+        <div class="clear"></div>
+
+      </div> <!-- lucy-top -->
+
+      <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+        <div class="grid_4" id="lucy-left_nav_box">
+          <h6>About</h6>
+            <ul>
+              <li><a href="/">Welcome</a></li>
+              <li><a href="/clownfish.html">Clownfish</a></li>
+              <li><a href="/faq.html">FAQ</a></li>
+              <li><a href="/people.html">People</a></li>
+            </ul>
+          <h6>Resources</h6>
+            <ul>
+              <li><a href="/download.html">Download</a></li>
+              <li><a href="/mailing_lists.html">Mailing Lists</a></li>
+              <li><a href="/docs/">Documentation</a></li>
+              <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li>
+              <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li>
+              <li><a href="/version_control.html">Version Control</a></li>
+            </ul>
+          <h6>Related Projects</h6>
+            <ul>
+              <li><a href="http://lucene.apache.org/core/">Lucene</a></li>
+              <li><a href="http://dezi.org/">Dezi</a></li>
+              <li><a href="http://lucene.apache.org/solr/">Solr</a></li>
+              <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li>
+              <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li>
+            </ul>
+        </div> <!-- lucy-left_nav_box -->
+
+        <div id="lucy-main_content_box" class="grid_9">
+          <div class="c-api">
+<h2>Step-by-step introduction to Apache Lucy.</h2>
+<p>Explore Apache Lucy’s basic functionality by starting with a minimalist CGI
+search app based on Lucy::Simple and transforming it, step by step,
+into an “advanced search” interface utilizing more flexible core modules like
+<a href="../../Lucy/Index/Indexer.html">Indexer</a> and <a href="../../Lucy/Search/IndexSearcher.html">IndexSearcher</a>.</p>
+<h3>Chapters</h3>
+<ul>
+<li>
+<p><a href="../../Lucy/Docs/Tutorial/SimpleTutorial.html">SimpleTutorial</a> - Build a bare-bones search app using
+Lucy::Simple.</p>
+</li>
+<li>
+<p><a href="../../Lucy/Docs/Tutorial/BeyondSimpleTutorial.html">BeyondSimpleTutorial</a> - Rebuild the app using core
+classes like <a href="../../Lucy/Index/Indexer.html">Indexer</a> and
+<a href="../../Lucy/Search/IndexSearcher.html">IndexSearcher</a> in place of Lucy::Simple.</p>
+</li>
+<li>
+<p><a href="../../Lucy/Docs/Tutorial/FieldTypeTutorial.html">FieldTypeTutorial</a> - Experiment with different field
+characteristics using subclasses of <a href="../../Lucy/Plan/FieldType.html">FieldType</a>.</p>
+</li>
+<li>
+<p><a href="../../Lucy/Docs/Tutorial/AnalysisTutorial.html">AnalysisTutorial</a> - Examine how the choice of
+<a href="../../Lucy/Analysis/Analyzer.html">Analyzer</a> subclass affects search results.</p>
+</li>
+<li>
+<p><a href="../../Lucy/Docs/Tutorial/HighlighterTutorial.html">HighlighterTutorial</a> - Augment search results with
+highlighted excerpts.</p>
+</li>
+<li>
+<p><a href="../../Lucy/Docs/Tutorial/QueryObjectsTutorial.html">QueryObjectsTutorial</a> - Unlock advanced search features
+by using Query objects instead of query strings.</p>
+</li>
+</ul>
+<h3>Source materials</h3>
+<p>The source material used by the tutorial app – the text of the United
+States Constitution, split across multiple text files – can be found in the
+<code>sample</code> directory at the root of the Lucy distribution, along with finished
+indexing and search apps.</p>
+<pre><code class="language-c">sample/indexer_simple.c  # simple indexing executable
+sample/search_simple.c   # simple search executable
+sample/indexer.c         # indexing executable
+sample/search.c          # search executable
+sample/us_constitution   # corpus
+</code></pre>
+<h3>Conventions</h3>
+<p>The user is expected to be familiar with OO Perl and basic CGI programming.</p>
+<p>The code in this tutorial assumes a Unix-flavored operating system and the
+Apache webserver, but will work with minor modifications on other setups.</p>
+<h3>See also</h3>
+<p>More advanced and esoteric subjects are covered in <a href="../../Lucy/Docs/Cookbook.html">Cookbook</a>.</p>
+</div>
+
+        </div> <!-- lucy-main_content_box --> 
+        <div class="clear"></div>
+
+      </div> <!-- lucy-main_content -->
+
+      <div id="lucy-copyright" class="container_16">
+        <p>Copyright &#169; 2010-2015 The Apache Software Foundation, Licensed under the 
+           <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+           <br/>
+           Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The
+           Apache Software Foundation.  All other marks mentioned may be trademarks or registered trademarks of their
+           respective owners.
+        </p>
+      </div> <!-- lucy-copyright -->
+
+    </div> <!-- lucy-rigid_wrapper -->
+
+  </body>
+</html>

Added: websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/AnalysisTutorial.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/AnalysisTutorial.html (added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/AnalysisTutorial.html Wed Sep 28 12:07:48 2016
@@ -0,0 +1,152 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    <title>Lucy::Docs::Tutorial::AnalysisTutorial</title>
+    <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+  </head>
+
+  <body>
+
+    <div id="lucy-rigid_wrapper">
+
+      <div id="lucy-top" class="container_16 lucy-white_box_3d">
+
+        <div id="lucy-logo_box" class="grid_8">
+          <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a>
+        </div> <!-- lucy-logo_box -->
+
+        <div #id="lucy-top_nav_box" class="grid_8">
+          <div id="lucy-top_nav_bar" class="container_8">
+            <ul>
+              <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li>
+              <li><a href="http://www.apache.org/licenses/" title="License">License</a></li>
+              <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li>
+              <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li>
+              <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li>
+            </ul>
+          </div> <!-- lucy-top_nav_bar -->
+          <p><a href="http://www.apache.org/">Apache</a>&nbsp;&raquo&nbsp;<a href="/">Lucy</a>&nbsp;&raquo&nbsp;<a href="/docs/">Docs</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/">0.5.0</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/">C</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/Lucy/">Lucy</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/Lucy/Docs/">Docs</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/Lucy/Docs/Tutorial/">Tutorial</a></p>
+          <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get">
+            <input value="*.apache.org" name="sitesearch" type="hidden"/>
+            <input type="text" name="q" id="query" style="width:85%">
+            <input type="submit" id="submit" value="Search">
+          </form>
+        </div> <!-- lucy-top_nav_box -->
+
+        <div class="clear"></div>
+
+      </div> <!-- lucy-top -->
+
+      <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+        <div class="grid_4" id="lucy-left_nav_box">
+          <h6>About</h6>
+            <ul>
+              <li><a href="/">Welcome</a></li>
+              <li><a href="/clownfish.html">Clownfish</a></li>
+              <li><a href="/faq.html">FAQ</a></li>
+              <li><a href="/people.html">People</a></li>
+            </ul>
+          <h6>Resources</h6>
+            <ul>
+              <li><a href="/download.html">Download</a></li>
+              <li><a href="/mailing_lists.html">Mailing Lists</a></li>
+              <li><a href="/docs/">Documentation</a></li>
+              <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li>
+              <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li>
+              <li><a href="/version_control.html">Version Control</a></li>
+            </ul>
+          <h6>Related Projects</h6>
+            <ul>
+              <li><a href="http://lucene.apache.org/core/">Lucene</a></li>
+              <li><a href="http://dezi.org/">Dezi</a></li>
+              <li><a href="http://lucene.apache.org/solr/">Solr</a></li>
+              <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li>
+              <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li>
+            </ul>
+        </div> <!-- lucy-left_nav_box -->
+
+        <div id="lucy-main_content_box" class="grid_9">
+          <div class="c-api">
+<h2>How to choose and use Analyzers.</h2>
+<p>Try swapping out the EasyAnalyzer in our Schema for a
+<a href="../../../Lucy/Analysis/StandardTokenizer.html">StandardTokenizer</a>:</p>
+<pre><code class="language-c">    StandardTokenizer *tokenizer = StandardTokenizer_new();
+    FullTextType *type = FullTextType_new((Analyzer*)tokenizer);
+</code></pre>
+<p>Search for <code>senate</code>, <code>Senate</code>, and <code>Senator</code> before and after making the
+change and re-indexing.</p>
+<p>Under EasyAnalyzer, the results are identical for all three searches, but
+under StandardTokenizer, searches are case-sensitive, and the result sets for
+<code>Senate</code> and <code>Senator</code> are distinct.</p>
+<h3>EasyAnalyzer</h3>
+<p>What’s happening is that <a href="../../../Lucy/Analysis/EasyAnalyzer.html">EasyAnalyzer</a> is performing more aggressive
+processing than StandardTokenizer.  In addition to tokenizing, it’s also
+converting all text to lower case so that searches are case-insensitive, and
+using a “stemming” algorithm to reduce related words to a common stem (<code>senat</code>,
+in this case).</p>
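+<p>As a rough illustration of the case-folding step only, the sketch below
+lower-cases an ASCII token in place.  <code>S_fold_case</code> is a hypothetical
+helper written for this example and is not part of the Lucy API; it makes no
+attempt at stemming or at handling non-ASCII text.</p>
+<pre><code class="language-c">#include &lt;ctype.h&gt;
+#include &lt;stddef.h&gt;
+
+// Hypothetical helper: lower-case an ASCII token in place.
+static void
+S_fold_case(char *token) {
+    for (size_t i = 0; token[i] != '\0'; i++) {
+        // Cast to unsigned char so tolower() never sees a negative value.
+        token[i] = (char)tolower((unsigned char)token[i]);
+    }
+}
+</code></pre>
+<p>After folding, <code>Senate</code> and <code>senate</code> become the same token, which is
+why both queries return the same hits under EasyAnalyzer.</p>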
+<p>EasyAnalyzer is actually multiple Analyzers wrapped up in a single package.
+In this case, it’s three-in-one, since creating an EasyAnalyzer for the
+language <code>en</code> is equivalent to this snippet creating a
+<a href="../../../Lucy/Analysis/PolyAnalyzer.html">PolyAnalyzer</a>:</p>
+<pre><code class="language-c">    Vector *analyzers = Vec_new(3);
+    Vec_Push(analyzers, (Analyzer*)StandardTokenizer_new());
+    Vec_Push(analyzers, (Analyzer*)Normalizer_new(NULL, true, false));
+    Vec_Push(analyzers, (Analyzer*)SnowStemmer_new(language));
+
+    PolyAnalyzer *analyzer = PolyAnalyzer_new(NULL, analyzers);
+    DECREF(analyzers);
+</code></pre>
+<p>You can add or subtract Analyzers from there if you like.  Try adding a fourth
+Analyzer, a SnowballStopFilter for suppressing “stopwords” like “the”, “if”,
+and “maybe”.</p>
+<pre><code class="language-c">    Vec_Push(analyzers, (Analyzer*)StandardTokenizer_new());
+    Vec_Push(analyzers, (Analyzer*)Normalizer_new(NULL, true, false));
+    Vec_Push(analyzers, (Analyzer*)SnowStemmer_new(language));
+    Vec_Push(analyzers, (Analyzer*)SnowStop_new(language, NULL));
+</code></pre>
+<p>Also, try removing the SnowballStemmer.</p>
+<pre><code class="language-c">    Vec_Push(analyzers, (Analyzer*)StandardTokenizer_new());
+    Vec_Push(analyzers, (Analyzer*)Normalizer_new(NULL, true, false));
+</code></pre>
+<p>The original choice of a stock English EasyAnalyzer probably still yields the
+best results for this document collection, but you get the idea: sometimes you
+want a different Analyzer.</p>
+<h3>When the best Analyzer is no Analyzer</h3>
+<p>Sometimes you don’t want an Analyzer at all.  That was true for our “url”
+field because we didn’t need it to be searchable, but it’s also true for
+certain types of searchable fields.  For instance, “category” fields are often
+set up to match exactly or not at all, as are fields like “last_name” (because
+you may not want to conflate results for “Humphrey” and “Humphries”).</p>
+<p>To specify that there should be no analysis performed at all, use StringType:</p>
+<pre><code class="language-c">    String     *name = Str_newf(&quot;category&quot;);
+    StringType *type = StringType_new();
+    Schema_Spec_Field(schema, name, (FieldType*)type);
+    DECREF(type);
+    DECREF(name);
+</code></pre>
+<h3>Highlighting up next</h3>
+<p>In our next tutorial chapter, <a href="../../../Lucy/Docs/Tutorial/HighlighterTutorial.html">HighlighterTutorial</a>,
+we’ll add highlighted excerpts from the “content” field to our search results.</p>
+</div>
+
+        </div> <!-- lucy-main_content_box --> 
+        <div class="clear"></div>
+
+      </div> <!-- lucy-main_content -->
+
+      <div id="lucy-copyright" class="container_16">
+        <p>Copyright &#169; 2010-2015 The Apache Software Foundation, Licensed under the 
+           <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+           <br/>
+           Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The
+           Apache Software Foundation.  All other marks mentioned may be trademarks or registered trademarks of their
+           respective owners.
+        </p>
+      </div> <!-- lucy-copyright -->
+
+    </div> <!-- lucy-rigid_wrapper -->
+
+  </body>
+</html>

Added: websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/BeyondSimpleTutorial.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/BeyondSimpleTutorial.html (added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/BeyondSimpleTutorial.html Wed Sep 28 12:07:48 2016
@@ -0,0 +1,296 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    <title>Lucy::Docs::Tutorial::BeyondSimpleTutorial</title>
+    <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+  </head>
+
+  <body>
+
+    <div id="lucy-rigid_wrapper">
+
+      <div id="lucy-top" class="container_16 lucy-white_box_3d">
+
+        <div id="lucy-logo_box" class="grid_8">
+          <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a>
+        </div> <!-- lucy-logo_box -->
+
+        <div #id="lucy-top_nav_box" class="grid_8">
+          <div id="lucy-top_nav_bar" class="container_8">
+            <ul>
+              <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li>
+              <li><a href="http://www.apache.org/licenses/" title="License">License</a></li>
+              <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li>
+              <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li>
+              <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li>
+            </ul>
+          </div> <!-- lucy-top_nav_bar -->
+          <p><a href="http://www.apache.org/">Apache</a>&nbsp;&raquo&nbsp;<a href="/">Lucy</a>&nbsp;&raquo&nbsp;<a href="/docs/">Docs</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/">0.5.0</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/">C</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/Lucy/">Lucy</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/Lucy/Docs/">Docs</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/Lucy/Docs/Tutorial/">Tutorial</a></p>
+          <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get">
+            <input value="*.apache.org" name="sitesearch" type="hidden"/>
+            <input type="text" name="q" id="query" style="width:85%">
+            <input type="submit" id="submit" value="Search">
+          </form>
+        </div> <!-- lucy-top_nav_box -->
+
+        <div class="clear"></div>
+
+      </div> <!-- lucy-top -->
+
+      <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+        <div class="grid_4" id="lucy-left_nav_box">
+          <h6>About</h6>
+            <ul>
+              <li><a href="/">Welcome</a></li>
+              <li><a href="/clownfish.html">Clownfish</a></li>
+              <li><a href="/faq.html">FAQ</a></li>
+              <li><a href="/people.html">People</a></li>
+            </ul>
+          <h6>Resources</h6>
+            <ul>
+              <li><a href="/download.html">Download</a></li>
+              <li><a href="/mailing_lists.html">Mailing Lists</a></li>
+              <li><a href="/docs/">Documentation</a></li>
+              <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li>
+              <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li>
+              <li><a href="/version_control.html">Version Control</a></li>
+            </ul>
+          <h6>Related Projects</h6>
+            <ul>
+              <li><a href="http://lucene.apache.org/core/">Lucene</a></li>
+              <li><a href="http://dezi.org/">Dezi</a></li>
+              <li><a href="http://lucene.apache.org/solr/">Solr</a></li>
+              <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li>
+              <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li>
+            </ul>
+        </div> <!-- lucy-left_nav_box -->
+
+        <div id="lucy-main_content_box" class="grid_9">
+          <div class="c-api">
+<h2>A more flexible app structure.</h2>
+<h3>Goal</h3>
+<p>In this tutorial chapter, we’ll refactor the apps we built in
+<a href="../../../Lucy/Docs/Tutorial/SimpleTutorial.html">SimpleTutorial</a> so that they look exactly the same from
+the end user’s point of view, but offer the developer greater possibilities for
+expansion.</p>
+<p>To achieve this, we’ll ditch Lucy::Simple and replace it with the
+classes that it uses internally:</p>
+<ul>
+<li><a href="../../../Lucy/Plan/Schema.html">Schema</a> - Plan out your index.</li>
+<li><a href="../../../Lucy/Plan/FullTextType.html">FullTextType</a> - Field type for full text search.</li>
+<li><a href="../../../Lucy/Analysis/EasyAnalyzer.html">EasyAnalyzer</a> - A one-size-fits-all parser/tokenizer.</li>
+<li><a href="../../../Lucy/Index/Indexer.html">Indexer</a> - Manipulate index content.</li>
+<li><a href="../../../Lucy/Search/IndexSearcher.html">IndexSearcher</a> - Search an index.</li>
+<li><a href="../../../Lucy/Search/Hits.html">Hits</a> - Iterate over hits returned by a Searcher.</li>
+</ul>
+<h3>Adaptations to indexer.c</h3>
+<p>After we include our headers…</p>
+<pre><code class="language-c">#include &lt;dirent.h&gt;
+#include &lt;stdio.h&gt;
+#include &lt;stdlib.h&gt;
+#include &lt;string.h&gt;
+
+#define CFISH_USE_SHORT_NAMES
+#define LUCY_USE_SHORT_NAMES
+#include &quot;Clownfish/String.h&quot;
+#include &quot;Lucy/Analysis/EasyAnalyzer.h&quot;
+#include &quot;Lucy/Document/Doc.h&quot;
+#include &quot;Lucy/Index/Indexer.h&quot;
+#include &quot;Lucy/Plan/FullTextType.h&quot;
+#include &quot;Lucy/Plan/StringType.h&quot;
+#include &quot;Lucy/Plan/Schema.h&quot;
+
+const char path_to_index[] = &quot;/path/to/index&quot;;
+const char uscon_source[]  = &quot;/usr/local/apache2/htdocs/us_constitution&quot;;
+</code></pre>
+<p>… the first item we’re going to need is a <a href="../../../Lucy/Plan/Schema.html">Schema</a>.</p>
+<p>The primary job of a Schema is to specify what fields are available and how
+they’re defined.  We’ll start off with three fields: title, content and url.</p>
+<pre><code class="language-c">static Schema*
+S_create_schema() {
+    // Create a new schema.
+    Schema *schema = Schema_new();
+
+    // Create an analyzer.
+    String       *language = Str_newf(&quot;en&quot;);
+    EasyAnalyzer *analyzer = EasyAnalyzer_new(language);
+
+    // Specify fields.
+
+    FullTextType *type = FullTextType_new((Analyzer*)analyzer);
+
+    {
+        String *field_str = Str_newf(&quot;title&quot;);
+        Schema_Spec_Field(schema, field_str, (FieldType*)type);
+        DECREF(field_str);
+    }
+
+    {
+        String *field_str = Str_newf(&quot;content&quot;);
+        Schema_Spec_Field(schema, field_str, (FieldType*)type);
+        DECREF(field_str);
+    }
+
+    {
+        String *field_str = Str_newf(&quot;url&quot;);
+        Schema_Spec_Field(schema, field_str, (FieldType*)type);
+        DECREF(field_str);
+    }
+
+    DECREF(type);
+    DECREF(analyzer);
+    DECREF(language);
+    return schema;
+}
+</code></pre>
+<p>All of the fields are spec’d out using the <a href="../../../Lucy/Plan/FullTextType.html">FullTextType</a> FieldType,
+indicating that they will be searchable as “full text” – which means that
+they can be searched for individual words.  The “analyzer”, which is unique to
+FullTextType fields, is what breaks up the text into searchable tokens.</p>
+<p>Next, we’ll swap our Lucy::Simple object out for an <a href="../../../Lucy/Index/Indexer.html">Indexer</a>.
+The substitution will be straightforward because Simple has merely been
+serving as a thin wrapper around an inner Indexer, and we’ll just be peeling
+away the wrapper.</p>
+<p>First, replace the constructor:</p>
+<pre><code class="language-c">int
+main() {
+    // Initialize the library.
+    lucy_bootstrap_parcel();
+
+    Schema *schema = S_create_schema();
+    String *folder = Str_newf(&quot;%s&quot;, path_to_index);
+
+    Indexer *indexer = Indexer_new(schema, (Obj*)folder, NULL,
+                                   Indexer_CREATE | Indexer_TRUNCATE);
+
+</code></pre>
+<p>Next, have the <code>indexer</code> object call <a href="../../../Lucy/Index/Indexer.html#func_Add_Doc">Add_Doc()</a> where we
+previously had the <code>lucy</code> object add the document:</p>
+<pre><code class="language-c">    DIR *dir = opendir(uscon_source);
+    if (dir == NULL) {
+        perror(uscon_source);
+        return 1;
+    }
+
+    for (struct dirent *entry = readdir(dir);
+         entry;
+         entry = readdir(dir)) {
+
+        if (S_ends_with(entry-&gt;d_name, &quot;.txt&quot;)) {
+            Doc *doc = S_parse_file(entry-&gt;d_name);
+            Indexer_Add_Doc(indexer, doc, 1.0);
+            DECREF(doc);
+        }
+    }
+
+    closedir(dir);
+</code></pre>
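+<p>The loop above calls <code>S_ends_with</code> without showing it.  A minimal
+sketch of such a helper, assuming it takes two NUL-terminated C strings and
+reports whether the first ends with the second, might look like this (the
+version shipped in <code>sample/indexer.c</code> may differ):</p>
+<pre><code class="language-c">#include &lt;stdbool.h&gt;
+#include &lt;string.h&gt;
+
+// Assumed helper: return true if `str` ends with `suffix`.
+static bool
+S_ends_with(const char *str, const char *suffix) {
+    size_t len        = strlen(str);
+    size_t suffix_len = strlen(suffix);
+    if (suffix_len &gt; len) {
+        return false;
+    }
+    // Compare only the trailing `suffix_len` bytes of `str`.
+    return memcmp(str + len - suffix_len, suffix, suffix_len) == 0;
+}
+</code></pre>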
+<p>There’s only one extra step required: at the end of the app, you must call
+<code>Indexer_Commit()</code> explicitly to close the indexing session and commit your
+changes.  (Lucy::Simple hides this detail, committing implicitly when it needs
+to.)</p>
+<pre><code class="language-c">    Indexer_Commit(indexer);
+
+    DECREF(indexer);
+    DECREF(folder);
+    DECREF(schema);
+    return 0;
+}
+</code></pre>
+<h3>Adaptations to search.c</h3>
+<p>In our search app as in our indexing app, Lucy::Simple has served as a
+thin wrapper – this time around <a href="../../../Lucy/Search/IndexSearcher.html">IndexSearcher</a> and
+<a href="../../../Lucy/Search/Hits.html">Hits</a>.  Swapping out Simple for these two classes is
+also straightforward:</p>
+<pre><code class="language-c">#include &lt;stdio.h&gt;
+#include &lt;stdlib.h&gt;
+#include &lt;string.h&gt;
+
+#define CFISH_USE_SHORT_NAMES
+#define LUCY_USE_SHORT_NAMES
+#include &quot;Clownfish/String.h&quot;
+#include &quot;Lucy/Document/HitDoc.h&quot;
+#include &quot;Lucy/Search/Hits.h&quot;
+#include &quot;Lucy/Search/IndexSearcher.h&quot;
+
+const char path_to_index[] = &quot;/path/to/index&quot;;
+
+int
+main(int argc, char *argv[]) {
+    // Initialize the library.
+    lucy_bootstrap_parcel();
+
+    if (argc &lt; 2) {
+        printf(&quot;Usage: %s &lt;querystring&gt;\n&quot;, argv[0]);
+        return 0;
+    }
+
+    const char *query_c = argv[1];
+
+    printf(&quot;Searching for: %s\n\n&quot;, query_c);
+
+    String        *folder   = Str_newf(&quot;%s&quot;, path_to_index);
+    IndexSearcher *searcher = IxSearcher_new((Obj*)folder);
+
+    String *query_str = Str_newf(&quot;%s&quot;, query_c);
+    Hits *hits = IxSearcher_Hits(searcher, (Obj*)query_str, 0, 10, NULL);
+
+    String *title_str = Str_newf(&quot;title&quot;);
+    String *url_str   = Str_newf(&quot;url&quot;);
+    HitDoc *hit;
+    int i = 1;
+
+    // Loop over search results.
+    while (NULL != (hit = Hits_Next(hits))) {
+        String *title = (String*)HitDoc_Extract(hit, title_str);
+        char *title_c = Str_To_Utf8(title);
+
+        String *url = (String*)HitDoc_Extract(hit, url_str);
+        char *url_c = Str_To_Utf8(url);
+
+        printf(&quot;Result %d: %s (%s)\n&quot;, i, title_c, url_c);
+
+        free(url_c);
+        free(title_c);
+        DECREF(url);
+        DECREF(title);
+        DECREF(hit);
+        i++;
+    }
+
+    DECREF(url_str);
+    DECREF(title_str);
+    DECREF(hits);
+    DECREF(query_str);
+    DECREF(searcher);
+    DECREF(folder);
+    return 0;
+}
+</code></pre>
+<h3>Hooray!</h3>
+<p>Congratulations!  Your apps do the same thing as before… but now they’ll be
+easier to customize.</p>
+<p>In our next chapter, <a href="../../../Lucy/Docs/Tutorial/FieldTypeTutorial.html">FieldTypeTutorial</a>, we’ll explore
+how to assign different behaviors to different fields.</p>
+</div>
+
+        </div> <!-- lucy-main_content_box --> 
+        <div class="clear"></div>
+
+      </div> <!-- lucy-main_content -->
+
+      <div id="lucy-copyright" class="container_16">
+        <p>Copyright &#169; 2010-2015 The Apache Software Foundation, Licensed under the 
+           <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+           <br/>
+           Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The
+           Apache Software Foundation.  All other marks mentioned may be trademarks or registered trademarks of their
+           respective owners.
+        </p>
+      </div> <!-- lucy-copyright -->
+
+    </div> <!-- lucy-rigid_wrapper -->
+
+  </body>
+</html>

Added: websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/FieldTypeTutorial.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/FieldTypeTutorial.html (added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/FieldTypeTutorial.html Wed Sep 28 12:07:48 2016
@@ -0,0 +1,151 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    <title>Lucy::Docs::Tutorial::FieldTypeTutorial</title>
+    <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+  </head>
+
+  <body>
+
+    <div id="lucy-rigid_wrapper">
+
+      <div id="lucy-top" class="container_16 lucy-white_box_3d">
+
+        <div id="lucy-logo_box" class="grid_8">
+          <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a>
+        </div> <!-- lucy-logo_box -->
+
+        <div #id="lucy-top_nav_box" class="grid_8">
+          <div id="lucy-top_nav_bar" class="container_8">
+            <ul>
+              <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li>
+              <li><a href="http://www.apache.org/licenses/" title="License">License</a></li>
+              <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li>
+              <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li>
+              <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li>
+            </ul>
+          </div> <!-- lucy-top_nav_bar -->
+          <p><a href="http://www.apache.org/">Apache</a>&nbsp;&raquo&nbsp;<a href="/">Lucy</a>&nbsp;&raquo&nbsp;<a href="/docs/">Docs</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/">0.5.0</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/">C</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/Lucy/">Lucy</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/Lucy/Docs/">Docs</a>&nbsp;&raquo&nbsp;<a href="/docs/0.5.0/c/Lucy/Docs/Tutorial/">Tutorial</a></p>
+          <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get">
+            <input value="*.apache.org" name="sitesearch" type="hidden"/>
+            <input type="text" name="q" id="query" style="width:85%">
+            <input type="submit" id="submit" value="Search">
+          </form>
+        </div> <!-- lucy-top_nav_box -->
+
+        <div class="clear"></div>
+
+      </div> <!-- lucy-top -->
+
+      <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+        <div class="grid_4" id="lucy-left_nav_box">
+          <h6>About</h6>
+            <ul>
+              <li><a href="/">Welcome</a></li>
+              <li><a href="/clownfish.html">Clownfish</a></li>
+              <li><a href="/faq.html">FAQ</a></li>
+              <li><a href="/people.html">People</a></li>
+            </ul>
+          <h6>Resources</h6>
+            <ul>
+              <li><a href="/download.html">Download</a></li>
+              <li><a href="/mailing_lists.html">Mailing Lists</a></li>
+              <li><a href="/docs/">Documentation</a></li>
+              <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li>
+              <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li>
+              <li><a href="/version_control.html">Version Control</a></li>
+            </ul>
+          <h6>Related Projects</h6>
+            <ul>
+              <li><a href="http://lucene.apache.org/core/">Lucene</a></li>
+              <li><a href="http://dezi.org/">Dezi</a></li>
+              <li><a href="http://lucene.apache.org/solr/">Solr</a></li>
+              <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li>
+              <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li>
+            </ul>
+        </div> <!-- lucy-left_nav_box -->
+
+        <div id="lucy-main_content_box" class="grid_9">
+          <div class="c-api">
+<h2>Specify per-field properties and behaviors.</h2>
+<p>The Schema we used in the last chapter specifies three fields:</p>
+<pre><code class="language-c">    FullTextType *type = FullTextType_new((Analyzer*)analyzer);
+
+    {
+        String *field_str = Str_newf(&quot;title&quot;);
+        Schema_Spec_Field(schema, field_str, (FieldType*)type);
+        DECREF(field_str);
+    }
+
+    {
+        String *field_str = Str_newf(&quot;content&quot;);
+        Schema_Spec_Field(schema, field_str, (FieldType*)type);
+        DECREF(field_str);
+    }
+
+    {
+        String *field_str = Str_newf(&quot;url&quot;);
+        Schema_Spec_Field(schema, field_str, (FieldType*)type);
+        DECREF(field_str);
+    }
+
+</code></pre>
+<p>Since they are all defined as “full text” fields, they are all searchable –
+including the <code>url</code> field, a dubious choice.  Some URLs contain meaningful
+information, but these don’t, really:</p>
+<pre><code>http://example.com/us_constitution/amend1.txt
+</code></pre>
+<p>We may as well not bother indexing the URL content.  To achieve that we need
+to assign the <code>url</code> field to a different FieldType.</p>
+<h3>StringType</h3>
+<p>Instead of FullTextType, we’ll use a
+<a href="../../../Lucy/Plan/StringType.html">StringType</a>, which doesn’t use an
+Analyzer to break up text into individual tokens.  Furthermore, we’ll mark
+this StringType as unindexed, so that its content won’t be searchable at all.</p>
+<pre><code class="language-c">    {
+        String *field_str = Str_newf(&quot;url&quot;);
+        StringType *type = StringType_new();
+        StringType_Set_Indexed(type, false);
+        Schema_Spec_Field(schema, field_str, (FieldType*)type);
+        DECREF(type);
+        DECREF(field_str);
+    }
+</code></pre>
+<p>To observe the change in behavior, try searching for <code>us_constitution</code> both
+before and after changing the Schema and re-indexing.</p>
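+<p>The difference between tokenized and whole-value matching can be sketched
+without Lucy at all.  The helpers below are hypothetical and are not part of
+the Lucy API; the tokenizer rule (a token is a run of letters, digits, or
+underscores) is an assumption made for illustration, not a description of any
+particular Analyzer.</p>

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: a "word" character is a letter, digit, or '_'. */
static int
is_word_char(unsigned char c) {
    return isalnum(c) || c == '_';
}

/* Toy stand-in for full-text matching: returns 1 if `term` equals one of
 * the word tokens found in `text`, 0 otherwise. */
int
toy_has_token(const char *text, const char *term) {
    size_t term_len = strlen(term);
    const char *p = text;

    while (*p != '\0') {
        /* Skip separator characters. */
        while (*p != '\0' && !is_word_char((unsigned char)*p)) { p++; }

        /* Scan one token. */
        const char *start = p;
        while (*p != '\0' && is_word_char((unsigned char)*p)) { p++; }

        size_t len = (size_t)(p - start);
        if (len > 0 && len == term_len && strncmp(start, term, len) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Toy stand-in for whole-value matching: the entire value must equal
 * the search term. */
int
toy_exact_match(const char *value, const char *term) {
    return strcmp(value, term) == 0;
}
```

+<p>Under these assumptions, <code>us_constitution</code> is found as a token
+inside the sample URL, while a whole-value comparison only succeeds when the
+search term is the complete URL.  An unindexed field goes one step further and
+cannot be matched at all.</p>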
+<h3>Toggling ‘stored’</h3>
+<p>For a taste of other FieldType possibilities, try turning off <code>stored</code> for
+one or more fields.</p>
+<pre><code class="language-c">    FullTextType *content_type = FullTextType_new((Analyzer*)analyzer);
+    FullTextType_Set_Stored(content_type, false);
+</code></pre>
+<p>Turning off <code>stored</code> for either <code>title</code> or <code>url</code> mangles our results page,
+but since we’re not displaying <code>content</code>, turning it off for <code>content</code> has
+no effect – except on index size.</p>
+<h3>Analyzers up next</h3>
+<p>Analyzers play a crucial role in the behavior of FullTextType fields.  In our
+next tutorial chapter, <a href="../../../Lucy/Docs/Tutorial/AnalysisTutorial.html">AnalysisTutorial</a>, we’ll see how
+changing up the Analyzer changes search results.</p>
+</div>
+
+        </div> <!-- lucy-main_content_box --> 
+        <div class="clear"></div>
+
+      </div> <!-- lucy-main_content -->
+
+      <div id="lucy-copyright" class="container_16">
+        <p>Copyright &#169; 2010-2015 The Apache Software Foundation, Licensed under the 
+           <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+           <br/>
+           Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The
+           Apache Software Foundation.  All other marks mentioned may be trademarks or registered trademarks of their
+           respective owners.
+        </p>
+      </div> <!-- lucy-copyright -->
+
+    </div> <!-- lucy-rigid_wrapper -->
+
+  </body>
+</html>

Added: websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/HighlighterTutorial.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/HighlighterTutorial.html (added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/HighlighterTutorial.html Wed Sep 28 12:07:48 2016
@@ -0,0 +1,160 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    <title>Lucy::Docs::Tutorial::HighlighterTutorial</title>
+    <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+  </head>
+
+  <body>
+
+    <div id="lucy-rigid_wrapper">
+
+      <div id="lucy-top" class="container_16 lucy-white_box_3d">
+
+        <div id="lucy-logo_box" class="grid_8">
+          <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a>
+        </div> <!-- lucy-logo_box -->
+
+        <div #id="lucy-top_nav_box" class="grid_8">
+          <div id="lucy-top_nav_bar" class="container_8">
+            <ul>
+              <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li>
+              <li><a href="http://www.apache.org/licenses/" title="License">License</a></li>
+              <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li>
+              <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li>
+              <li><a href="http://www.apache.org/security/" title="Security">Security</a></li>
+            </ul>
+          </div> <!-- lucy-top_nav_bar -->
+          <p><a href="http://www.apache.org/">Apache</a>&nbsp;&raquo;&nbsp;<a href="/">Lucy</a>&nbsp;&raquo;&nbsp;<a href="/docs/">Docs</a>&nbsp;&raquo;&nbsp;<a href="/docs/0.5.0/">0.5.0</a>&nbsp;&raquo;&nbsp;<a href="/docs/0.5.0/c/">C</a>&nbsp;&raquo;&nbsp;<a href="/docs/0.5.0/c/Lucy/">Lucy</a>&nbsp;&raquo;&nbsp;<a href="/docs/0.5.0/c/Lucy/Docs/">Docs</a>&nbsp;&raquo;&nbsp;<a href="/docs/0.5.0/c/Lucy/Docs/Tutorial/">Tutorial</a></p>
+          <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get">
+            <input value="*.apache.org" name="sitesearch" type="hidden"/>
+            <input type="text" name="q" id="query" style="width:85%">
+            <input type="submit" id="submit" value="Search">
+          </form>
+        </div> <!-- lucy-top_nav_box -->
+
+        <div class="clear"></div>
+
+      </div> <!-- lucy-top -->
+
+      <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+        <div class="grid_4" id="lucy-left_nav_box">
+          <h6>About</h6>
+            <ul>
+              <li><a href="/">Welcome</a></li>
+              <li><a href="/clownfish.html">Clownfish</a></li>
+              <li><a href="/faq.html">FAQ</a></li>
+              <li><a href="/people.html">People</a></li>
+            </ul>
+          <h6>Resources</h6>
+            <ul>
+              <li><a href="/download.html">Download</a></li>
+              <li><a href="/mailing_lists.html">Mailing Lists</a></li>
+              <li><a href="/docs/">Documentation</a></li>
+              <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li>
+              <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li>
+              <li><a href="/version_control.html">Version Control</a></li>
+            </ul>
+          <h6>Related Projects</h6>
+            <ul>
+              <li><a href="http://lucene.apache.org/core/">Lucene</a></li>
+              <li><a href="http://dezi.org/">Dezi</a></li>
+              <li><a href="http://lucene.apache.org/solr/">Solr</a></li>
+              <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li>
+              <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li>
+            </ul>
+        </div> <!-- lucy-left_nav_box -->
+
+        <div id="lucy-main_content_box" class="grid_9">
+          <div class="c-api">
+<h2>Augment search results with highlighted excerpts.</h2>
+<p>Adding relevant excerpts with highlighted search terms to your search results
+display makes it much easier for end users to scan the page and assess which
+hits look promising, dramatically improving their search experience.</p>
+<h3>Adaptations to the indexer</h3>
+<p><a href="../../../Lucy/Highlight/Highlighter.html">Highlighter</a> uses information generated at index
+time.  To save resources, highlighting is disabled by default and must be
+turned on for individual fields.</p>
+<pre><code class="language-c">    {
+        String *field_str = Str_newf(&quot;content&quot;);
+        FullTextType *type = FullTextType_new((Analyzer*)analyzer);
+        FullTextType_Set_Highlightable(type, true);
+        Schema_Spec_Field(schema, field_str, (FieldType*)type);
+        DECREF(type);
+        DECREF(field_str);
+    }
+</code></pre>
+<h3>Adaptations to the search app</h3>
+<p>To add highlighting and excerpting to the search sample app, create a
+<code>highlighter</code> object outside the hit-iterating loop…</p>
+<pre><code class="language-c">    String *content_str = Str_newf(&quot;content&quot;);
+    Highlighter *highlighter
+        = Highlighter_new((Searcher*)searcher, (Obj*)query,
+                          content_str, 200);
+</code></pre>
+<p>… then modify the loop and the per-hit display to generate and include the
+excerpt.</p>
+<pre><code class="language-c">    String *title_str = Str_newf(&quot;title&quot;);
+    String *url_str   = Str_newf(&quot;url&quot;);
+    HitDoc *hit;
+    i = 1;
+
+    // Loop over search results.
+    while (NULL != (hit = Hits_Next(hits))) {
+        String *title = (String*)HitDoc_Extract(hit, title_str);
+        char *title_c = Str_To_Utf8(title);
+
+        String *url = (String*)HitDoc_Extract(hit, url_str);
+        char *url_c = Str_To_Utf8(url);
+
+        String *excerpt = Highlighter_Create_Excerpt(highlighter, hit);
+        char *excerpt_c = Str_To_Utf8(excerpt);
+
+        printf(&quot;Result %d: %s (%s)\n%s\n\n&quot;, i, title_c, url_c, excerpt_c);
+
+        free(excerpt_c);
+        free(url_c);
+        free(title_c);
+        DECREF(excerpt);
+        DECREF(url);
+        DECREF(title);
+        DECREF(hit);
+        i++;
+    }
+
+    DECREF(url_str);
+    DECREF(title_str);
+    DECREF(hits);
+    DECREF(query_str);
+    DECREF(highlighter);
+    DECREF(content_str);
+    DECREF(searcher);
+    DECREF(folder);
+</code></pre>
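+<p>To make the idea of a highlighted excerpt concrete, here is a minimal
+sketch that does not use Lucy.  The function <code>toy_highlight</code> is
+hypothetical, and the <code>&lt;strong&gt;</code> tag pair is an assumption
+chosen for illustration.  It marks only the first literal occurrence of a term
+and performs no HTML escaping, so it is far simpler than what a real
+Highlighter has to do.</p>

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of the Lucy API.
 * Copies `text` into `out`, wrapping the first occurrence of `term` in
 * <strong> tags.  Returns 1 on success, 0 if the term is empty, is not
 * found, or the result does not fit into `out`. */
int
toy_highlight(const char *text, const char *term,
              char *out, size_t out_size) {
    if (*term == '\0') {
        return 0;
    }

    const char *hit = strstr(text, term);
    if (hit == NULL) {
        return 0;
    }

    /* Text before the hit, the tagged term, then the remaining text. */
    int prefix_len = (int)(hit - text);
    int written = snprintf(out, out_size, "%.*s<strong>%s</strong>%s",
                           prefix_len, text, term, hit + strlen(term));

    return written >= 0 && (size_t)written < out_size;
}
```

+<p>The excerpt string returned by the Highlighter in the loop above plays the
+same role as <code>out</code> here: it is ready-made markup that the display
+code prints next to the title and URL.</p>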
+<h3>Next chapter: Query objects</h3>
+<p>Our next tutorial chapter, <a href="../../../Lucy/Docs/Tutorial/QueryObjectsTutorial.html">QueryObjectsTutorial</a>,
+illustrates how to build an “advanced search” interface using
+<a href="../../../Lucy/Search/Query.html">Query</a> objects instead of query strings.</p>
+</div>
+
+        </div> <!-- lucy-main_content_box --> 
+        <div class="clear"></div>
+
+      </div> <!-- lucy-main_content -->
+
+      <div id="lucy-copyright" class="container_16">
+        <p>Copyright &#169; 2010-2015 The Apache Software Foundation, Licensed under the 
+           <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+           <br/>
+           Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The
+           Apache Software Foundation.  All other marks mentioned may be trademarks or registered trademarks of their
+           respective owners.
+        </p>
+      </div> <!-- lucy-copyright -->
+
+    </div> <!-- lucy-rigid_wrapper -->
+
+  </body>
+</html>

Added: websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/QueryObjectsTutorial.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/QueryObjectsTutorial.html (added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/QueryObjectsTutorial.html Wed Sep 28 12:07:48 2016
@@ -0,0 +1,269 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    <title>Lucy::Docs::Tutorial::QueryObjectsTutorial</title>
+    <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+  </head>
+
+  <body>
+
+    <div id="lucy-rigid_wrapper">
+
+      <div id="lucy-top" class="container_16 lucy-white_box_3d">
+
+        <div id="lucy-logo_box" class="grid_8">
+          <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a>
+        </div> <!-- lucy-logo_box -->
+
+        <div #id="lucy-top_nav_box" class="grid_8">
+          <div id="lucy-top_nav_bar" class="container_8">
+            <ul>
+              <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li>
+              <li><a href="http://www.apache.org/licenses/" title="License">License</a></li>
+              <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li>
+              <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li>
+              <li><a href="http://www.apache.org/security/" title="Security">Security</a></li>
+            </ul>
+          </div> <!-- lucy-top_nav_bar -->
+          <p><a href="http://www.apache.org/">Apache</a>&nbsp;&raquo;&nbsp;<a href="/">Lucy</a>&nbsp;&raquo;&nbsp;<a href="/docs/">Docs</a>&nbsp;&raquo;&nbsp;<a href="/docs/0.5.0/">0.5.0</a>&nbsp;&raquo;&nbsp;<a href="/docs/0.5.0/c/">C</a>&nbsp;&raquo;&nbsp;<a href="/docs/0.5.0/c/Lucy/">Lucy</a>&nbsp;&raquo;&nbsp;<a href="/docs/0.5.0/c/Lucy/Docs/">Docs</a>&nbsp;&raquo;&nbsp;<a href="/docs/0.5.0/c/Lucy/Docs/Tutorial/">Tutorial</a></p>
+          <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get">
+            <input value="*.apache.org" name="sitesearch" type="hidden"/>
+            <input type="text" name="q" id="query" style="width:85%">
+            <input type="submit" id="submit" value="Search">
+          </form>
+        </div> <!-- lucy-top_nav_box -->
+
+        <div class="clear"></div>
+
+      </div> <!-- lucy-top -->
+
+      <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+        <div class="grid_4" id="lucy-left_nav_box">
+          <h6>About</h6>
+            <ul>
+              <li><a href="/">Welcome</a></li>
+              <li><a href="/clownfish.html">Clownfish</a></li>
+              <li><a href="/faq.html">FAQ</a></li>
+              <li><a href="/people.html">People</a></li>
+            </ul>
+          <h6>Resources</h6>
+            <ul>
+              <li><a href="/download.html">Download</a></li>
+              <li><a href="/mailing_lists.html">Mailing Lists</a></li>
+              <li><a href="/docs/">Documentation</a></li>
+              <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li>
+              <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li>
+              <li><a href="/version_control.html">Version Control</a></li>
+            </ul>
+          <h6>Related Projects</h6>
+            <ul>
+              <li><a href="http://lucene.apache.org/core/">Lucene</a></li>
+              <li><a href="http://dezi.org/">Dezi</a></li>
+              <li><a href="http://lucene.apache.org/solr/">Solr</a></li>
+              <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li>
+              <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li>
+            </ul>
+        </div> <!-- lucy-left_nav_box -->
+
+        <div id="lucy-main_content_box" class="grid_9">
+          <div class="c-api">
+<h2>Use Query objects instead of query strings.</h2>
+<p>Until now, our search app has had only a single search box.  In this tutorial
+chapter, we’ll move towards an “advanced search” interface, by adding a
+“category” drop-down menu.  Three new classes will be required:</p>
+<ul>
+<li>
+<p><a href="../../../Lucy/Search/QueryParser.html">QueryParser</a> - Turn a query string into a
+<a href="../../../Lucy/Search/Query.html">Query</a> object.</p>
+</li>
+<li>
+<p><a href="../../../Lucy/Search/TermQuery.html">TermQuery</a> - Query for a specific term within
+a specific field.</p>
+</li>
+<li>
+<p><a href="../../../Lucy/Search/ANDQuery.html">ANDQuery</a> - “AND” together multiple Query
+objects to produce an intersected result set.</p>
+</li>
+</ul>
+<h3>Adaptations to the indexer</h3>
+<p>Our new “category” field will be a StringType field rather than a FullTextType
+field, because we will only be looking for exact matches.  It needs to be
+indexed, but since we won’t display its value, it doesn’t need to be stored.</p>
+<pre><code class="language-c">    {
+        String *field_str = Str_newf(&quot;category&quot;);
+        StringType *type = StringType_new();
+        StringType_Set_Stored(type, false);
+        Schema_Spec_Field(schema, field_str, (FieldType*)type);
+        DECREF(type);
+        DECREF(field_str);
+    }
+</code></pre>
+<p>There will be three possible values: “article”, “amendment”, and “preamble”,
+which we’ll derive from the source file’s name in our <code>parse_file</code>
+function:</p>
+<pre><code class="language-c">    const char *category = NULL;
+    if (S_starts_with(filename, &quot;art&quot;)) {
+        category = &quot;article&quot;;
+    }
+    else if (S_starts_with(filename, &quot;amend&quot;)) {
+        category = &quot;amendment&quot;;
+    }
+    else if (S_starts_with(filename, &quot;preamble&quot;)) {
+        category = &quot;preamble&quot;;
+    }
+    else {
+        fprintf(stderr, &quot;Can't derive category for %s\n&quot;, filename);
+        exit(1);
+    }
+
+    ...
+
+    {
+        // Store 'category' field
+        String *field = Str_newf(&quot;category&quot;);
+        String *value = Str_new_from_utf8(category, strlen(category));
+        Doc_Store(doc, field, (Obj*)value);
+        DECREF(field);
+        DECREF(value);
+    }
+</code></pre>
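+<p>The snippet above calls <code>S_starts_with</code> without showing it.  A
+minimal sketch of such a helper follows; its body is an assumption, since the
+definition is not part of this chapter.  The wrapper
+<code>toy_category_for</code> is a hypothetical name that repeats the
+prefix checks in a form that returns <code>NULL</code> instead of exiting.</p>

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed definition: returns true if `str` begins with `prefix`. */
static bool
S_starts_with(const char *str, const char *prefix) {
    size_t len        = strlen(str);
    size_t prefix_len = strlen(prefix);
    return len >= prefix_len && strncmp(str, prefix, prefix_len) == 0;
}

/* Hypothetical wrapper: maps a file name to its category, or returns
 * NULL when no category can be derived. */
const char*
toy_category_for(const char *filename) {
    if (S_starts_with(filename, "art")) {
        return "article";
    }
    else if (S_starts_with(filename, "amend")) {
        return "amendment";
    }
    else if (S_starts_with(filename, "preamble")) {
        return "preamble";
    }
    return NULL;
}
```

+<p>Returning <code>NULL</code> leaves the decision about how to report an
+unrecognized file name to the caller, which makes the mapping easy to test in
+isolation.</p>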
+<h3>Adaptations to the search app</h3>
+<p>In this C version of the sample app, the “category” constraint is passed on
+the command line with a <code>-c</code> option, so the usage message needs to
+mention it:</p>
+<pre><code class="language-c">static void
+S_usage_and_exit(const char *arg0) {
+    printf(&quot;Usage: %s [-c &lt;category&gt;] &lt;querystring&gt;\n&quot;, arg0);
+    exit(1);
+}
+</code></pre>
+<p>We’ll start off by extracting the new <code>-c</code> option and the query
+string from the command line.</p>
+<pre><code class="language-c">    const char *category = NULL;
+    int i = 1;
+
+    while (i &lt; argc - 1) {
+        if (strcmp(argv[i], &quot;-c&quot;) == 0) {
+            if (i + 1 &gt;= argc) {
+                S_usage_and_exit(argv[0]);
+            }
+            i += 1;
+            category = argv[i];
+        }
+        else {
+            S_usage_and_exit(argv[0]);
+        }
+
+        i += 1;
+    }
+
+    if (i + 1 != argc) {
+        S_usage_and_exit(argv[0]);
+    }
+
+    const char *query_c = argv[i];
+</code></pre>
+<p>QueryParser’s constructor requires a “schema” argument.  We can get that from
+our IndexSearcher:</p>
+<pre><code class="language-c">    IndexSearcher *searcher = IxSearcher_new((Obj*)folder);
+    Schema        *schema   = IxSearcher_Get_Schema(searcher);
+    QueryParser   *qparser  = QParser_new(schema, NULL, NULL, NULL);
+</code></pre>
+<p>Previously, we have been handing raw query strings to IndexSearcher.  Behind
+the scenes, IndexSearcher has been using a QueryParser to turn those query
+strings into Query objects.  Now, we will bring QueryParser into the
+foreground and parse the strings explicitly.</p>
+<pre><code class="language-c">    String *query_str = Str_newf(&quot;%s&quot;, query_c);
+    Query  *query     = QParser_Parse(qparser, query_str);
+</code></pre>
+<p>If the user has specified a category, we’ll use an ANDQuery to join our parsed
+query together with a TermQuery representing the category.</p>
+<pre><code class="language-c">    if (category) {
+        String *category_name = Str_newf(&quot;category&quot;);
+        String *category_str  = Str_newf(&quot;%s&quot;, category);
+        TermQuery *category_query
+            = TermQuery_new(category_name, (Obj*)category_str);
+
+        // The Vector takes ownership of the Query objects pushed onto it.
+        Vector *children = Vec_new(2);
+        Vec_Push(children, (Obj*)query);
+        Vec_Push(children, (Obj*)category_query);
+        query = (Query*)ANDQuery_new(children);
+
+        DECREF(children);
+        DECREF(category_str);
+        DECREF(category_name);
+    }
+</code></pre>
+<p>Now when we execute the query…</p>
+<pre><code class="language-c">    Hits *hits = IxSearcher_Hits(searcher, (Obj*)query, 0, 10, NULL);
+</code></pre>
+<p>… we’ll get a result set which is the intersection of the parsed query and
+the category query.</p>
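+<p>Conceptually, the intersected result set contains only the documents
+matched by both child queries.  The sketch below illustrates that idea on two
+sorted arrays of document ids.  The function <code>toy_intersect</code> and the
+use of plain arrays are assumptions made for illustration; this is not a
+description of how ANDQuery is implemented.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the Lucy API.
 * Writes the ids present in both sorted arrays `a` and `b` into `out`,
 * which must have room for at least min(a_len, b_len) entries.
 * Returns the number of ids written. */
size_t
toy_intersect(const int32_t *a, size_t a_len,
              const int32_t *b, size_t b_len,
              int32_t *out) {
    size_t i = 0;
    size_t j = 0;
    size_t n = 0;

    while (i < a_len && j < b_len) {
        if (a[i] < b[j]) {
            i++;                 /* a[i] cannot appear in b; skip it. */
        }
        else if (a[i] > b[j]) {
            j++;                 /* b[j] cannot appear in a; skip it. */
        }
        else {
            out[n++] = a[i];     /* Present in both inputs. */
            i++;
            j++;
        }
    }
    return n;
}
```

+<p>If the parsed query matched documents 1, 3, 5 and 8 and the category query
+matched 3, 4, 8 and 9, only documents 3 and 8 would remain.</p>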
+<h3>Using TermQuery with full text fields</h3>
+<p>The easiest way to query full text fields is to create Query objects with
+QueryParser.  Sometimes, though, you want to create a TermQuery directly for a
+single term in a FullTextType field.  In that case, the search term has to be
+run through the field’s analyzer so that it gets normalized in the same way as
+the field’s content.</p>
+<pre><code class="language-c">Query*
+make_term_query(Schema *schema, String *field, String *term) {
+    FieldType *type  = Schema_Fetch_Type(schema, field);
+    String    *token = NULL;
+
+    if (FieldType_is_a(type, FULLTEXTTYPE)) {
+        // Run the term through the full text analysis chain.
+        Analyzer *analyzer = FullTextType_Get_Analyzer((FullTextType*)type);
+        Vector   *tokens   = Analyzer_Split(analyzer, term);
+
+        if (Vec_Get_Size(tokens) != 1) {
+            // If the term expands to more than one token, or no
+            // tokens at all, it will never match a single token in
+            // the full text field.
+            DECREF(tokens);
+            return (Query*)NoMatchQuery_new();
+        }
+
+        token = (String*)Vec_Delete(tokens, 0);
+        DECREF(tokens);
+    }
+    else {
+        // Exact match for other types.
+        token = (String*)INCREF(term);
+    }
+
+    TermQuery *term_query = TermQuery_new(field, (Obj*)token);
+
+    DECREF(token);
+    return (Query*)term_query;
+}
+</code></pre>
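+<p>Why the analysis step matters can be shown with a toy normalizer.  The
+sketch assumes, purely for illustration, that the field’s analyzer lowercases
+its input; the real analysis chain depends on the Analyzer configured for the
+field.  The function <code>toy_normalize</code> is hypothetical and handles
+ASCII only.</p>

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the Lucy API.
 * Writes an ASCII-lowercased copy of `src` into `dst`.
 * Returns 1 on success, 0 if `dst` is too small. */
int
toy_normalize(const char *src, char *dst, size_t dst_size) {
    size_t len = strlen(src);

    if (len + 1 > dst_size) {
        return 0;
    }
    for (size_t i = 0; i < len; i++) {
        dst[i] = (char)tolower((unsigned char)src[i]);
    }
    dst[len] = '\0';
    return 1;
}
```

+<p>If the index holds the normalized token <code>congress</code>, a TermQuery
+built from the raw input <code>Congress</code> compares unequal and finds
+nothing, whereas the normalized form matches.</p>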
+<h3>Congratulations!</h3>
+<p>You’ve made it to the end of the tutorial.</p>
+<h3>See Also</h3>
+<p>For additional thematic documentation, see the Apache Lucy
+<a href="../../../Lucy/Docs/Cookbook.html">Cookbook</a>.</p>
+<p>ANDQuery has a companion class, <a href="../../../Lucy/Search/ORQuery.html">ORQuery</a>, and a
+close relative, <a href="../../../Lucy/Search/RequiredOptionalQuery.html">RequiredOptionalQuery</a>.</p>
+</div>
+
+        </div> <!-- lucy-main_content_box --> 
+        <div class="clear"></div>
+
+      </div> <!-- lucy-main_content -->
+
+      <div id="lucy-copyright" class="container_16">
+        <p>Copyright &#169; 2010-2015 The Apache Software Foundation, Licensed under the 
+           <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+           <br/>
+           Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The
+           Apache Software Foundation.  All other marks mentioned may be trademarks or registered trademarks of their
+           respective owners.
+        </p>
+      </div> <!-- lucy-copyright -->
+
+    </div> <!-- lucy-rigid_wrapper -->
+
+  </body>
+</html>