Posted to commits@lucy.apache.org by bu...@apache.org on 2016/09/28 12:07:52 UTC
svn commit: r998475 [6/26] - in /websites/staging/lucy/trunk/content: ./
docs/ docs/0.5.0/ docs/0.5.0/c/ docs/0.5.0/c/Clownfish/
docs/0.5.0/c/Clownfish/Docs/ docs/0.5.0/c/Lucy/ docs/0.5.0/c/Lucy/Analysis/
docs/0.5.0/c/Lucy/Docs/ docs/0.5.0/c/Lucy/Docs/...
Added: websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/FileFormat.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/FileFormat.html (added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/FileFormat.html Wed Sep 28 12:07:48 2016
@@ -0,0 +1,260 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+ <head>
+ <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+ <title>Lucy::Docs::FileFormat</title>
+ <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+ </head>
+
+ <body>
+
+ <div id="lucy-rigid_wrapper">
+
+ <div id="lucy-top" class="container_16 lucy-white_box_3d">
+
+ <div id="lucy-logo_box" class="grid_8">
+ <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a>
+ </div> <!-- lucy-logo_box -->
+
+ <div #id="lucy-top_nav_box" class="grid_8">
+ <div id="lucy-top_nav_bar" class="container_8">
+ <ul>
+ <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li>
+ <li><a href="http://www.apache.org/licenses/" title="License">License</a></li>
+ <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li>
+ <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li>
+ <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li>
+ </ul>
+ </div> <!-- lucy-top_nav_bar -->
+ <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/0.5.0/">0.5.0</a> » <a href="/docs/0.5.0/c/">C</a> » <a href="/docs/0.5.0/c/Lucy/">Lucy</a> » <a href="/docs/0.5.0/c/Lucy/Docs/">Docs</a></p>
+ <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get">
+ <input value="*.apache.org" name="sitesearch" type="hidden"/>
+ <input type="text" name="q" id="query" style="width:85%">
+ <input type="submit" id="submit" value="Search">
+ </form>
+ </div> <!-- lucy-top_nav_box -->
+
+ <div class="clear"></div>
+
+ </div> <!-- lucy-top -->
+
+ <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+ <div class="grid_4" id="lucy-left_nav_box">
+ <h6>About</h6>
+ <ul>
+ <li><a href="/">Welcome</a></li>
+ <li><a href="/clownfish.html">Clownfish</a></li>
+ <li><a href="/faq.html">FAQ</a></li>
+ <li><a href="/people.html">People</a></li>
+ </ul>
+ <h6>Resources</h6>
+ <ul>
+ <li><a href="/download.html">Download</a></li>
+ <li><a href="/mailing_lists.html">Mailing Lists</a></li>
+ <li><a href="/docs/">Documentation</a></li>
+ <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li>
+ <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li>
+ <li><a href="/version_control.html">Version Control</a></li>
+ </ul>
+ <h6>Related Projects</h6>
+ <ul>
+ <li><a href="http://lucene.apache.org/core/">Lucene</a></li>
+ <li><a href="http://dezi.org/">Dezi</a></li>
+ <li><a href="http://lucene.apache.org/solr/">Solr</a></li>
+ <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li>
+ <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li>
+ </ul>
+ </div> <!-- lucy-left_nav_box -->
+
+ <div id="lucy-main_content_box" class="grid_9">
+ <div class="c-api">
+<h2>Overview of index file format</h2>
+<p>It is not necessary to understand the current implementation details of the
+index file format in order to use Apache Lucy effectively, but it may be
+helpful if you are interested in tweaking for high performance, exotic usage,
+or debugging and development.</p>
+<p>On a file system, an index is a directory. The files inside have a
+hierarchical relationship: an index is made up of “segments”, each of which is
+an independent inverted index with its own subdirectory; each segment is made
+up of several component parts.</p>
+<pre><code>[index]--|
+ |--snapshot_XXX.json
+ |--schema_XXX.json
+ |--write.lock
+ |
+ |--seg_1--|
+ | |--segmeta.json
+ | |--cfmeta.json
+ | |--cf.dat-------|
+ | |--[lexicon]
+ | |--[postings]
+ | |--[documents]
+ | |--[highlight]
+ | |--[deletions]
+ |
+ |--seg_2--|
+ | |--segmeta.json
+ | |--cfmeta.json
+ | |--cf.dat-------|
+ | |--[lexicon]
+ | |--[postings]
+ | |--[documents]
+ | |--[highlight]
+ | |--[deletions]
+ |
+ |--[...]--|
+</code></pre>
+<h3>Write-once philosophy</h3>
+<p>All segment directory names consist of the string “seg_” followed by a number
+in base 36: seg_1, seg_5m, seg_p9s2 and so on, with higher numbers indicating
+more recent segments. Once a segment is finished and committed, its name is
+never re-used and its files are never modified.</p>
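+<p>As an illustration of the naming scheme, here is a minimal C sketch that
+converts between segment numbers and directory names. The helper names
+<code>seg_name</code> and <code>seg_number</code> are hypothetical and not part of
+the Lucy API, and the sketch assumes lowercase digits as in the examples above.</p>
+<pre><code class="language-c">#include &lt;stddef.h&gt;
+#include &lt;stdlib.h&gt;
+#include &lt;string.h&gt;
+
+/* Write "seg_" followed by `num` in lowercase base 36 into `buf`.
+ * Returns the name's length, or -1 if `buf` is too small. */
+int seg_name(unsigned long num, char *buf, size_t size) {
+    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
+    char tmp[16];                 /* 13 digits cover any 64-bit value */
+    size_t n = 0;
+    int len = 4;
+    do {                          /* collect digits, least significant first */
+        tmp[n++] = digits[num % 36];
+        num /= 36;
+    } while (num &gt; 0);
+    if (n + 5 &gt; size) { return -1; }     /* "seg_" + digits + NUL */
+    memcpy(buf, "seg_", 4);
+    while (n &gt; 0) { buf[len++] = tmp[--n]; }
+    buf[len] = '\0';
+    return len;
+}
+
+/* Recover the number from a name such as "seg_5m".
+ * Returns 0 if the "seg_" prefix is missing. */
+unsigned long seg_number(const char *name) {
+    if (strncmp(name, "seg_", 4) != 0) { return 0; }
+    return strtoul(name + 4, NULL, 36);
+}
+</code></pre>
+<p>Under these assumptions, segment number 202 is named <code>seg_5m</code>, and
+<code>seg_p9s2</code> decodes to 1179074.</p>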
+<p>Old segments become obsolete and can be removed when their data has been
+consolidated into new segments during the process of segment merging and
+optimization. A fully-optimized index has only one segment.</p>
+<h3>Top-level entries</h3>
+<p>There are a handful of “top-level” files and directories which belong to the
+entire index rather than to a particular segment.</p>
+<h4>snapshot_XXX.json</h4>
+<p>A “snapshot” file, e.g. <code>snapshot_m7p.json</code>, is a list of index files and
+directories. Because index files, once written, are never modified, the list
+of entries in a snapshot defines a point-in-time view of the data in an index.</p>
+<p>Like segment directories, snapshot files also utilize the
+unique-base-36-number naming convention; the higher the number, the more
+recent the file. The appearance of a new snapshot file within the index
+directory constitutes an index update. While a new segment is being written,
+new files may be added to the index directory, but until a new snapshot file
+gets written, a Searcher opening the index for reading won’t know about them.</p>
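+<p>To make “the higher the number, the more recent the file” concrete, the
+following sketch picks the most recent snapshot from a directory listing. It
+is not how Lucy itself locates snapshots; the function names are hypothetical,
+and the listing is assumed to be available as an array of strings.</p>
+<pre><code class="language-c">#include &lt;stddef.h&gt;
+#include &lt;stdlib.h&gt;
+#include &lt;string.h&gt;
+
+/* Return the base-36 number encoded in "snapshot_XXX.json", or 0 if
+ * the name does not follow that pattern. */
+unsigned long snapshot_gen(const char *name) {
+    char *end;
+    unsigned long gen;
+    if (strncmp(name, "snapshot_", 9) != 0) { return 0; }
+    gen = strtoul(name + 9, &amp;end, 36);
+    return strcmp(end, ".json") == 0 ? gen : 0;
+}
+
+/* Return the name with the highest number, or NULL if no entry is a
+ * snapshot file. Numbers are compared numerically, not as strings. */
+const char *latest_snapshot(const char *const *names, size_t count) {
+    const char *best = NULL;
+    unsigned long best_gen = 0;
+    size_t i;
+    for (i = 0; i &lt; count; i++) {
+        unsigned long gen = snapshot_gen(names[i]);
+        if (gen &gt; best_gen) { best_gen = gen; best = names[i]; }
+    }
+    return best;
+}
+</code></pre>
+<p>Comparing the decoded numbers matters: <code>snapshot_10.json</code> (36) is more
+recent than <code>snapshot_z.json</code> (35), although it sorts earlier as a string.</p>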
+<h4>schema_XXX.json</h4>
+<p>The schema file is a Schema object describing the index’s format, serialized
+as JSON. It, too, is versioned, and a given snapshot file will reference one
+and only one schema file.</p>
+<h4>locks</h4>
+<p>By default, only one indexing process may safely modify the index at any given
+time. Processes reserve an index by laying claim to the <code>write.lock</code> file
+within the <code>locks/</code> directory. A smattering of other lock files may be used
+from time to time, as well.</p>
+<h3>A segment’s component parts</h3>
+<p>By default, each segment has up to five logical components: lexicon, postings,
+document storage, highlight data, and deletions. Binary data from these
+components gets stored in virtual files within the “cf.dat” compound file;
+metadata is stored in a shared “segmeta.json” file.</p>
+<h4>segmeta.json</h4>
+<p>The segmeta.json file is a central repository for segment metadata. In
+addition to information such as document counts and field numbers, it also
+warehouses arbitrary metadata on behalf of individual index components.</p>
+<h4>Lexicon</h4>
+<p>Each indexed field gets its own lexicon in each segment. The exact files
+involved depend on the field’s type, but generally speaking there will be two
+parts. First, there’s a primary <code>lexicon-XXX.dat</code> file which houses a
+complete term list associating terms with corpus frequency statistics,
+postings file locations, etc. Second, one or more “lexicon index” files may
+be present which contain periodic samples from the primary lexicon file to
+facilitate fast lookups.</p>
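+<p>The idea of sampling the primary lexicon can be sketched as follows. This
+is an in-memory model only, not the on-disk format: the full sorted term list
+stands in for the primary lexicon, every <code>interval</code>-th term stands in
+for the lexicon index, and all names are hypothetical.</p>
+<pre><code class="language-c">#include &lt;stddef.h&gt;
+#include &lt;string.h&gt;
+
+/* Binary-search the sampled terms (every `interval`-th entry of the
+ * sorted `lexicon`; `interval` must be at least 1). Returns the
+ * position of the last sample that is &lt;= `term`, or 0 if there is none. */
+size_t lex_index_seek(const char *const *lexicon, size_t count,
+                      size_t interval, const char *term) {
+    size_t lo = 0;
+    size_t hi = (count + interval - 1) / interval;   /* number of samples */
+    while (hi - lo &gt; 1) {
+        size_t mid = lo + (hi - lo) / 2;
+        if (strcmp(lexicon[mid * interval], term) &lt;= 0) { lo = mid; }
+        else                                            { hi = mid; }
+    }
+    return lo * interval;
+}
+
+/* Scan forward from the sampled position. Returns the term's position
+ * in `lexicon`, or -1 if the term is absent. */
+long lex_find(const char *const *lexicon, size_t count,
+              size_t interval, const char *term) {
+    size_t i = lex_index_seek(lexicon, count, interval, term);
+    for (; i &lt; count; i++) {
+        int cmp = strcmp(lexicon[i], term);
+        if (cmp == 0) { return (long)i; }
+        if (cmp &gt; 0)  { break; }        /* passed the spot where it would be */
+    }
+    return -1;
+}
+</code></pre>
+<p>With a sampling interval of N, a lookup costs one binary search over the
+samples plus a scan of at most N entries, rather than a scan of the whole
+lexicon.</p>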
+<h4>Postings</h4>
+<p>“Posting” is a technical term from the field of
+<a href="../../Lucy/Docs/IRTheory.html">information retrieval</a>, defined as a single
+instance of one term indexing one document. If you are looking at the index
+in the back of a book, and you see that “freedom” is referenced on pages 8,
+86, and 240, that would be three postings, which taken together form a
+“posting list”. The same terminology applies to an index in electronic form.</p>
+<p>Each segment has one postings file per indexed field. When a search is
+performed for a single term, first that term is looked up in the lexicon. If
+the term exists in the segment, the record in the lexicon will contain
+information about which postings file to look at and where to look.</p>
+<p>The first thing any posting record tells you is a document id. By iterating
+over all the postings associated with a term, you can find all the documents
+that match that term, a process which is analogous to looking up page numbers
+in a book’s index. However, each posting record typically contains other
+information in addition to document id, e.g. the positions at which the term
+occurs within the field.</p>
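+<p>A rough in-memory picture of a posting record, and of how positions support
+phrase matching, might look like the sketch below. The <code>Posting</code>
+struct and both functions are hypothetical illustrations; they do not reflect
+Lucy’s classes or its on-disk encoding.</p>
+<pre><code class="language-c">#include &lt;stddef.h&gt;
+#include &lt;stdint.h&gt;
+
+/* Hypothetical in-memory view of one posting record. */
+typedef struct {
+    uint32_t        doc_id;     /* document the term occurs in */
+    uint32_t        freq;       /* occurrences within the field */
+    const uint32_t *positions;  /* `freq` token positions, ascending */
+} Posting;
+
+/* Copy the doc ids of a posting list into `out` (room for `count`). */
+void matching_docs(const Posting *list, size_t count, uint32_t *out) {
+    size_t i;
+    for (i = 0; i &lt; count; i++) { out[i] = list[i].doc_id; }
+}
+
+/* Return 1 if, within one document, some occurrence of `first` is
+ * immediately followed by an occurrence of `second`, else 0. */
+int adjacent(const Posting *first, const Posting *second) {
+    size_t i = 0, j = 0;
+    if (first-&gt;doc_id != second-&gt;doc_id) { return 0; }
+    while (i &lt; first-&gt;freq &amp;&amp; j &lt; second-&gt;freq) {
+        uint32_t want = first-&gt;positions[i] + 1;
+        if (second-&gt;positions[j] == want) { return 1; }
+        if (second-&gt;positions[j] &lt; want)  { j++; }
+        else                              { i++; }
+    }
+    return 0;
+}
+</code></pre>
+<p>Because both position arrays are sorted, the adjacency check walks each of
+them once, in the manner of a merge.</p>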
+<h4>Documents</h4>
+<p>The document storage section is a simple database, organized into two files:</p>
+<ul>
+<li>
+<p><strong>documents.dat</strong> - Serialized documents.</p>
+</li>
+<li>
+<p><strong>documents.ix</strong> - Document storage index, a solid array of 64-bit integers
+where each integer location corresponds to a document id, and the value at
+that location points at a file position in the documents.dat file.</p>
+</li>
+</ul>
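+<p>Looking up a document’s file position in such an array is a matter of simple
+arithmetic. The sketch below works on an in-memory copy of the index array and
+rests on two assumptions that the text above does not state: that the integers
+are stored big-endian, and that slot N belongs to document id N. The function
+name is hypothetical.</p>
+<pre><code class="language-c">#include &lt;stddef.h&gt;
+#include &lt;stdint.h&gt;
+
+/* Decode the 64-bit file position stored for `doc_id`.
+ * Assumes big-endian storage and that slot N belongs to doc id N.
+ * Returns -1 if `doc_id` lies outside the array. */
+int64_t doc_file_pos(const uint8_t *ix, size_t ix_len, uint32_t doc_id) {
+    uint64_t pos = 0;
+    size_t start, i;
+    if (ix_len &lt; 8 || doc_id &gt; (ix_len - 8) / 8) { return -1; }
+    start = (size_t)doc_id * 8;            /* each slot is 8 bytes wide */
+    for (i = 0; i &lt; 8; i++) {
+        pos = (pos &lt;&lt; 8) | ix[start + i];  /* most significant byte first */
+    }
+    return (int64_t)pos;
+}
+</code></pre>
+<p>Because every slot has the same width, the lookup takes constant time
+regardless of how many documents the segment holds.</p>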
+<h4>Highlight data</h4>
+<p>The files which store data used for excerpting and highlighting are organized
+similarly to the files used to store documents.</p>
+<ul>
+<li>
+<p><strong>highlight.dat</strong> - Chunks of serialized highlight data, one per doc id.</p>
+</li>
+<li>
+<p><strong>highlight.ix</strong> - Highlight data index – as with the <code>documents.ix</code> file, a
+solid array of 64-bit file pointers.</p>
+</li>
+</ul>
+<h4>Deletions</h4>
+<p>When a document is “deleted” from a segment, it is not actually purged right
+away; it is merely marked as “deleted” via a deletions file. Deletions files
+contain bit vectors with one bit for each document in the segment; if bit
+#254 is set then document 254 is deleted, and if that document turns up in a
+search it will be masked out.</p>
+<p>It is only when a segment’s contents are rewritten to a new segment during the
+segment-merging process that deleted documents truly go away.</p>
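+<p>The bit-vector test itself is tiny. In this sketch the function names are
+hypothetical, and the choice to number bits from the least significant end of
+each byte is an assumption rather than a documented property of the file.</p>
+<pre><code class="language-c">#include &lt;stddef.h&gt;
+#include &lt;stdint.h&gt;
+
+/* Return 1 if the bit for `doc_id` is set in `bits` (`len` bytes long),
+ * else 0. Bit N lives in byte N / 8; counting bits from the least
+ * significant end of each byte is an assumption of this sketch. */
+int is_deleted(const uint8_t *bits, size_t len, uint32_t doc_id) {
+    size_t byte = doc_id / 8;
+    if (byte &gt;= len) { return 0; }
+    return (bits[byte] &gt;&gt; (doc_id % 8)) &amp; 1;
+}
+
+/* Set the bit for `doc_id`; ids beyond the vector are ignored. */
+void mark_deleted(uint8_t *bits, size_t len, uint32_t doc_id) {
+    size_t byte = doc_id / 8;
+    if (byte &lt; len) { bits[byte] |= (uint8_t)(1u &lt;&lt; (doc_id % 8)); }
+}
+</code></pre>
+<p>A search loop would call something like <code>is_deleted</code> for each
+candidate document id and skip those for which it returns 1.</p>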
+<h3>Compound Files</h3>
+<p>If you peer inside an index directory, you won’t actually find any files named
+“documents.dat”, “highlight.ix”, etc. unless there is an indexing process
+underway. What you will find instead is one “cf.dat” and one “cfmeta.json”
+file per segment.</p>
+<p>To minimize the need for file descriptors at search-time, all per-segment
+binary data files are concatenated together in “cf.dat” at the close of each
+indexing session. Information about where each file begins and ends is stored
+in <code>cfmeta.json</code>. When the segment is opened for reading, a single file
+descriptor per “cf.dat” file can be shared among several readers.</p>
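+<p>The bookkeeping behind a “virtual file” can be sketched as an offset and a
+length per entry. The <code>CfEntry</code> struct, its field names, and the two
+functions below are hypothetical; the actual layout of
+<code>cfmeta.json</code> is not described here.</p>
+<pre><code class="language-c">#include &lt;stddef.h&gt;
+#include &lt;stdint.h&gt;
+#include &lt;string.h&gt;
+
+/* Where one virtual file begins inside cf.dat and how long it is.
+ * Struct and field names are hypothetical. */
+typedef struct {
+    const char *name;
+    uint64_t    offset;
+    uint64_t    length;
+} CfEntry;
+
+/* Find a virtual file by name; returns NULL if it is not listed. */
+const CfEntry *cf_find(const CfEntry *entries, size_t count,
+                       const char *name) {
+    size_t i;
+    for (i = 0; i &lt; count; i++) {
+        if (strcmp(entries[i].name, name) == 0) { return &amp;entries[i]; }
+    }
+    return NULL;
+}
+
+/* Translate a position within a virtual file into a position within
+ * cf.dat. Returns -1 if `pos` is past the end of the virtual file. */
+int64_t cf_translate(const CfEntry *entry, uint64_t pos) {
+    if (pos &gt;= entry-&gt;length) { return -1; }
+    return (int64_t)(entry-&gt;offset + pos);
+}
+</code></pre>
+<p>Every reader can then seek within the one shared descriptor by adding its
+virtual file’s offset to the position it wants.</p>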
+<h3>A Typical Search</h3>
+<p>Here’s a simplified narrative, dramatizing how a search for “freedom” against
+a given segment plays out:</p>
+<ol>
+<li>
+<p>The searcher asks the relevant Lexicon Index, “Do you know anything about
+‘freedom’?” Lexicon Index replies, “Can’t say for sure, but if the main
+Lexicon file does, ‘freedom’ is probably somewhere around byte 21008”.</p>
+</li>
+<li>
+<p>The main Lexicon tells the searcher “One moment, let me scan our records…
+Yes, we have 2 documents which contain ‘freedom’. You’ll find them in
+seg_6/postings-4.dat starting at byte 66991.”</p>
+</li>
+<li>
+<p>The Postings file says “Yep, we have ‘freedom’, all right! Document id 40
+has 1 ‘freedom’, and document 44 has 8. If you need to know more, like if any
+‘freedom’ is part of the phrase ‘freedom of speech’, ask me about positions!”</p>
+</li>
+<li>
+<p>If the searcher is only looking for ‘freedom’ in isolation, that’s where it
+stops. It now knows enough to assign the documents scores against “freedom”,
+with the 8-freedom document likely ranking higher than the single-freedom
+document.</p>
+</li>
+</ol>
+</div>
+
+ </div> <!-- lucy-main_content_box -->
+ <div class="clear"></div>
+
+ </div> <!-- lucy-main_content -->
+
+ <div id="lucy-copyright" class="container_16">
+ <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the
+ <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+ <br/>
+ Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The
+ Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their
+ respective owners.
+ </p>
+ </div> <!-- lucy-copyright -->
+
+ </div> <!-- lucy-rigid_wrapper -->
+
+ </body>
+</html>
Added: websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/FileLocking.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/FileLocking.html (added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/FileLocking.html Wed Sep 28 12:07:48 2016
@@ -0,0 +1,144 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+ <head>
+ <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+ <title>Lucy::Docs::FileLocking</title>
+ <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+ </head>
+
+ <body>
+
+ <div id="lucy-rigid_wrapper">
+
+ <div id="lucy-top" class="container_16 lucy-white_box_3d">
+
+ <div id="lucy-logo_box" class="grid_8">
+ <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a>
+ </div> <!-- lucy-logo_box -->
+
+
+ <div class="clear"></div>
+
+ </div> <!-- lucy-top -->
+
+ <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+
+ <div id="lucy-main_content_box" class="grid_9">
+ <div class="c-api">
+<h2>Manage indexes on shared volumes.</h2>
+<p>Normally, index locking is an invisible process. Exclusive write access is
+controlled via lockfiles within the index directory and problems only arise
+if multiple processes attempt to acquire the write lock simultaneously;
+search-time processes do not ordinarily require locking at all.</p>
+<p>On shared volumes, however, the default locking mechanism fails, and manual
+intervention becomes necessary.</p>
+<p>Both read and write applications accessing an index on a shared volume need
+to identify themselves with a unique <code>host</code> id, e.g. hostname or
+IP address. Knowing the host id makes it possible to tell which lockfiles
+belong to other machines and therefore must not be removed when the
+lockfile’s pid number appears not to correspond to an active process.</p>
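+<p>The rule described above can be written as a small decision function. This
+is a sketch of the reasoning, not Lucy’s implementation: the function name is
+hypothetical, and the caller is assumed to have already read the host id from
+the lockfile and checked whether its pid belongs to a running process.</p>
+<pre><code class="language-c">#include &lt;string.h&gt;
+
+/* Decide whether a lockfile may be cleared. `lock_host` is the host id
+ * recorded for the lock, `my_host` is this machine's host id, and
+ * `pid_alive` is nonzero if the lock's pid matches a running process
+ * on this machine. Returns 1 if the lock is stale and ours to remove. */
+int lock_is_removable(const char *lock_host, const char *my_host,
+                      int pid_alive) {
+    if (strcmp(lock_host, my_host) != 0) {
+        return 0;   /* another machine's lock: its pid means nothing here */
+    }
+    return !pid_alive;
+}
+</code></pre>
+<p>If two machines were to report the same host id, the first test would pass
+for locks that actually belong to the other machine, which is exactly the
+failure the unique <code>host</code> id guards against.</p>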
+<p>At index-time, the danger is that multiple indexing processes from
+different machines which fail to specify a unique <code>host</code> id can
+delete each other’s lockfiles and then attempt to modify the index at the
+same time, causing index corruption. The search-time problem is more
+complex.</p>
+<p>Once an index file is no longer listed in the most recent snapshot, Indexer
+attempts to delete it as part of a post-<code>Commit</code> cleanup routine. It is
+possible that at the moment an Indexer is deleting files which it believes
+are no longer needed, a Searcher referencing an earlier snapshot is in fact
+using them. The more often that an index is either updated or searched,
+the more likely it is that this conflict will arise from time to time.</p>
+<p>Ordinarily, the deletion attempts are not a problem. On a typical Unix
+volume, the files will be deleted in name only: any process which holds an
+open filehandle against a given file will continue to have access, and the
+file won’t actually get vaporized until the last filehandle is cleared.
+Thanks to “delete on last close” semantics, an Indexer can’t truly delete
+the file out from underneath an active Searcher. On Windows, where file
+deletion fails whenever any process holds an open handle, the situation is
+different but still workable: Indexer just keeps retrying after each commit
+until deletion finally succeeds.</p>
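+<p>The Unix behavior can be observed with a few POSIX calls. The sketch below is
+a standalone experiment that does not involve Lucy; it assumes a POSIX system
+and a writable <code>/tmp</code> on a local file system, and the function name is
+hypothetical.</p>
+<pre><code class="language-c">#include &lt;stdlib.h&gt;
+#include &lt;string.h&gt;
+#include &lt;unistd.h&gt;
+
+/* Returns 1 if data can still be read through an open descriptor after
+ * the file's name has been unlinked, 0 if not, and -1 if the experiment
+ * could not be set up. */
+int readable_after_unlink(void) {
+    char path[] = "/tmp/unlink_demo_XXXXXX";
+    char buf[8] = { 0 };
+    int ok;
+    int fd = mkstemp(path);                 /* create and open a temp file */
+    if (fd &lt; 0) { return -1; }
+    if (write(fd, "freedom", 7) != 7) { close(fd); unlink(path); return -1; }
+    if (unlink(path) != 0) { close(fd); return -1; }   /* the name is gone */
+    ok = access(path, F_OK) != 0            /* no longer visible by name */
+         &amp;&amp; lseek(fd, 0, SEEK_SET) == 0
+         &amp;&amp; read(fd, buf, 7) == 7
+         &amp;&amp; memcmp(buf, "freedom", 7) == 0;
+    close(fd);                              /* last close: storage released */
+    return ok;
+}
+</code></pre>
+<p>On a local Unix file system this should return 1: the open descriptor plays
+the role of the Searcher, and the <code>unlink</code> call plays the role of the
+Indexer’s cleanup.</p>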
+<p>On NFS, however, the system breaks, because NFS allows files to be deleted
+out from underneath active processes. Should this happen, the unlucky read
+process will crash with a “Stale NFS filehandle” exception.</p>
+<p>Under normal circumstances, it is neither necessary nor desirable for
+IndexReaders to secure read locks against an index, but for NFS we have to
+make an exception. LockFactory’s <code>Make_Shared_Lock</code> method exists for this
+reason; supplying an IndexManager instance to IndexReader’s constructor
+activates an internal locking mechanism using <code>Make_Shared_Lock</code> which
+prevents concurrent indexing processes from deleting files that are needed
+by active readers.</p>
+<pre><code>Code example for C is missing</code></pre>
+<p>Since shared locks are implemented using lockfiles located in the index
+directory (as are exclusive locks), reader applications must have write
+access for read locking to work. Stale lock files from crashed processes
+are ordinarily cleared away the next time the same machine – as identified
+by the <code>host</code> parameter – opens another IndexReader. (The
+classic technique of timing out lock files is not feasible because search
+processes may lie dormant indefinitely.) However, please be aware that if
+the last thing a given machine does is crash, lock files belonging to it
+may persist, preventing deletion of obsolete index data.</p>
+</div>
+
+ </div> <!-- lucy-main_content_box -->
+ <div class="clear"></div>
+
+ </div> <!-- lucy-main_content -->
+
+
+ </div> <!-- lucy-rigid_wrapper -->
+
+ </body>
+</html>
Added: websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/IRTheory.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/IRTheory.html (added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/IRTheory.html Wed Sep 28 12:07:48 2016
@@ -0,0 +1,133 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+ <head>
+ <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+ <title>Lucy::Docs::IRTheory</title>
+ <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+ </head>
+
+ <body>
+
+ <div id="lucy-rigid_wrapper">
+
+ <div id="lucy-top" class="container_16 lucy-white_box_3d">
+
+ <div id="lucy-logo_box" class="grid_8">
+ <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a>
+ </div> <!-- lucy-logo_box -->
+
+
+ <div class="clear"></div>
+
+ </div> <!-- lucy-top -->
+
+ <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+
+ <div id="lucy-main_content_box" class="grid_9">
+ <div class="c-api">
+<h2>Crash course in information retrieval</h2>
+<p>Just enough Information Retrieval theory to find your way around Apache Lucy.</p>
+<h3>Terminology</h3>
+<p>Lucy uses some terminology from the field of information retrieval which
+may be unfamiliar to many users. “Document” and “term” mean pretty much what
+you’d expect them to, but others such as “posting” and “inverted index” need a
+formal introduction:</p>
+<ul>
+<li><em>document</em> - An atomic unit of retrieval.</li>
+<li><em>term</em> - An attribute which describes a document.</li>
+<li><em>posting</em> - One term indexing one document.</li>
+<li><em>term list</em> - The complete list of terms which describe a document.</li>
+<li><em>posting list</em> - The complete list of documents which a term indexes.</li>
+<li><em>inverted index</em> - A data structure which maps from terms to documents.</li>
+</ul>
+<p>Since Lucy is a practical implementation of IR theory, it loads these
+abstract, distilled definitions down with useful traits. For instance, a
+“posting” in its most rarefied form is simply a term-document pairing; in
+Lucy, the class MatchPosting fills this
+role. However, by associating additional information with a posting, such as
+the number of times the term occurs in the document, we can turn it into a
+ScorePosting, making it possible
+to rank documents by relevance rather than just list documents which happen to
+match in no particular order.</p>
+<h3>TF/IDF ranking algorithm</h3>
+<p>Lucy uses a variant of the well-established “Term Frequency / Inverse
+Document Frequency” weighting scheme. A thorough treatment of TF/IDF is too
+ambitious for our present purposes, but in a nutshell, it means that…</p>
+<ul>
+<li>
+<p>in a search for <code>skate park</code>, documents which score well for the
+comparatively rare term <code>skate</code> will rank higher than documents which score
+well for the more common term <code>park</code>.</p>
+</li>
+<li>
+<p>a 10-word text which has one occurrence each of <code>skate</code> and <code>park</code> will
+rank higher than a 1000-word text which also contains one occurrence of each.</p>
+</li>
+</ul>
+<p>A web search for “tf idf” will turn up many excellent explanations of the
+algorithm.</p>
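+<p>The two effects listed above can be reproduced with a deliberately
+simplified weighting. This is not the formula Lucy uses: it omits the damping
+(such as logarithms or square roots) that practical schemes apply, and the
+function name and parameters are hypothetical.</p>
+<pre><code class="language-c">/* Simplified weight of one term in one document:
+ *     (freq / doc_len) * (num_docs / doc_freq)
+ * The first factor rewards occurrences relative to document length; the
+ * second rewards terms that appear in few documents. Practical schemes
+ * usually damp both factors; that damping is left out here. */
+double tfidf_weight(unsigned freq, unsigned doc_len,
+                    unsigned doc_freq, unsigned num_docs) {
+    double tf, idf;
+    if (doc_len == 0 || doc_freq == 0) { return 0.0; }
+    tf  = (double)freq / doc_len;
+    idf = (double)num_docs / doc_freq;
+    return tf * idf;
+}
+</code></pre>
+<p>In a 1000-document collection where <code>skate</code> appears in 10 documents
+and <code>park</code> in 500, one occurrence of <code>skate</code> outweighs one
+occurrence of <code>park</code> in the same document, and a single occurrence
+counts for more in a 10-word text than in a 1000-word text.</p>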
+</div>
+
+ </div> <!-- lucy-main_content_box -->
+ <div class="clear"></div>
+
+ </div> <!-- lucy-main_content -->
+
+
+ </div> <!-- lucy-rigid_wrapper -->
+
+ </body>
+</html>
Added: websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial.html (added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial.html Wed Sep 28 12:07:48 2016
@@ -0,0 +1,142 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+ <head>
+ <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+ <title>Lucy::Docs::Tutorial</title>
+ <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+ </head>
+
+ <body>
+
+ <div id="lucy-rigid_wrapper">
+
+ <div id="lucy-top" class="container_16 lucy-white_box_3d">
+
+ <div id="lucy-logo_box" class="grid_8">
+ <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a>
+ </div> <!-- lucy-logo_box -->
+
+
+ <div class="clear"></div>
+
+ </div> <!-- lucy-top -->
+
+ <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+
+ <div id="lucy-main_content_box" class="grid_9">
+ <div class="c-api">
+<h2>Step-by-step introduction to Apache Lucy.</h2>
+<p>Explore Apache Lucy’s basic functionality by starting with a minimalist CGI
+search app based on Lucy::Simple and transforming it, step by step,
+into an “advanced search” interface utilizing more flexible core modules like
+<a href="../../Lucy/Index/Indexer.html">Indexer</a> and <a href="../../Lucy/Search/IndexSearcher.html">IndexSearcher</a>.</p>
+<h3>Chapters</h3>
+<ul>
+<li>
+<p><a href="../../Lucy/Docs/Tutorial/SimpleTutorial.html">SimpleTutorial</a> - Build a bare-bones search app using
+Lucy::Simple.</p>
+</li>
+<li>
+<p><a href="../../Lucy/Docs/Tutorial/BeyondSimpleTutorial.html">BeyondSimpleTutorial</a> - Rebuild the app using core
+classes like <a href="../../Lucy/Index/Indexer.html">Indexer</a> and
+<a href="../../Lucy/Search/IndexSearcher.html">IndexSearcher</a> in place of Lucy::Simple.</p>
+</li>
+<li>
+<p><a href="../../Lucy/Docs/Tutorial/FieldTypeTutorial.html">FieldTypeTutorial</a> - Experiment with different field
+characteristics using subclasses of <a href="../../Lucy/Plan/FieldType.html">FieldType</a>.</p>
+</li>
+<li>
+<p><a href="../../Lucy/Docs/Tutorial/AnalysisTutorial.html">AnalysisTutorial</a> - Examine how the choice of
+<a href="../../Lucy/Analysis/Analyzer.html">Analyzer</a> subclass affects search results.</p>
+</li>
+<li>
+<p><a href="../../Lucy/Docs/Tutorial/HighlighterTutorial.html">HighlighterTutorial</a> - Augment search results with
+highlighted excerpts.</p>
+</li>
+<li>
+<p><a href="../../Lucy/Docs/Tutorial/QueryObjectsTutorial.html">QueryObjectsTutorial</a> - Unlock advanced search features
+by using Query objects instead of query strings.</p>
+</li>
+</ul>
+<h3>Source materials</h3>
+<p>The source material used by the tutorial app – a multi-text-file presentation
+of the United States constitution – can be found in the <code>sample</code> directory
+at the root of the Lucy distribution, along with finished indexing and search
+apps.</p>
+<pre><code class="language-c">sample/indexer_simple.c # simple indexing executable
+sample/search_simple.c # simple search executable
+sample/indexer.c # indexing executable
+sample/search.c # search executable
+sample/us_constitution # corpus
+</code></pre>
+<h3>Conventions</h3>
+<p>The user is expected to be familiar with OO Perl and basic CGI programming.</p>
+<p>The code in this tutorial assumes a Unix-flavored operating system and the
+Apache webserver, but will work with minor modifications on other setups.</p>
+<h3>See also</h3>
+<p>More advanced and esoteric subjects are covered in <a href="../../Lucy/Docs/Cookbook.html">Cookbook</a>.</p>
+</div>
+
+ </div> <!-- lucy-main_content_box -->
+ <div class="clear"></div>
+
+ </div> <!-- lucy-main_content -->
+
+ <div id="lucy-copyright" class="container_16">
+ <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the
+ <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+ <br/>
+ Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The
+ Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their
+ respective owners.
+ </p>
+ </div> <!-- lucy-copyright -->
+
+ </div> <!-- lucy-rigid_wrapper -->
+
+ </body>
+</html>
Added: websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/AnalysisTutorial.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/AnalysisTutorial.html (added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/AnalysisTutorial.html Wed Sep 28 12:07:48 2016
@@ -0,0 +1,152 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+ <head>
+ <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+ <title>Lucy::Docs::Tutorial::AnalysisTutorial</title>
+ <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+ </head>
+
+ <body>
+
+ <div id="lucy-rigid_wrapper">
+
+ <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+ <div id="lucy-main_content_box" class="grid_9">
+ <div class="c-api">
+<h2>How to choose and use Analyzers.</h2>
+<p>Try swapping out the EasyAnalyzer in our Schema for a
+<a href="../../../Lucy/Analysis/StandardTokenizer.html">StandardTokenizer</a>:</p>
+<pre><code class="language-c"> StandardTokenizer *tokenizer = StandardTokenizer_new();
+ FullTextType *type = FullTextType_new((Analyzer*)tokenizer);
+</code></pre>
+<p>Search for <code>senate</code>, <code>Senate</code>, and <code>Senator</code> before and after making the
+change and re-indexing.</p>
+<p>Under EasyAnalyzer, the results are identical for all three searches, but
+under StandardTokenizer, searches are case-sensitive, and the result sets for
+<code>Senate</code> and <code>Senator</code> are distinct.</p>
+<h3>EasyAnalyzer</h3>
+<p>What’s happening is that <a href="../../../Lucy/Analysis/EasyAnalyzer.html">EasyAnalyzer</a> is performing more aggressive
+processing than StandardTokenizer. In addition to tokenizing, it’s also
+converting all text to lower case so that searches are case-insensitive, and
+using a “stemming” algorithm to reduce related words to a common stem (<code>senat</code>,
+in this case).</p>
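+<p>As a rough illustration of why lower-casing makes searches case-insensitive,
+here is a toy sketch in plain C. It is not how Lucy's Normalizer is
+implemented: the helper names <code>S_fold_case</code> and <code>S_terms_match</code> are
+hypothetical, only ASCII is handled, and no stemming is performed, so
+<code>Senate</code> and <code>Senator</code> still compare as different terms here.</p>
+<pre><code class="language-c">#include <ctype.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <string.h>
+
+/* Lower-case an ASCII token in place (toy stand-in for normalization). */
+static void
+S_fold_case(char *token) {
+ for (size_t i = 0; token[i] != '\0'; i++) {
+ token[i] = (char)tolower((unsigned char)token[i]);
+ }
+}
+
+/* Compare two terms after case folding. Terms longer than 63 bytes are
+ * truncated by snprintf, which is acceptable for this illustration. */
+static bool
+S_terms_match(const char *a, const char *b) {
+ char buf_a[64];
+ char buf_b[64];
+ snprintf(buf_a, sizeof(buf_a), "%s", a);
+ snprintf(buf_b, sizeof(buf_b), "%s", b);
+ S_fold_case(buf_a);
+ S_fold_case(buf_b);
+ return strcmp(buf_a, buf_b) == 0;
+}
+</code></pre>
+<p>With this sketch, <code>Senate</code> and <code>senate</code> fold to the same term, which mirrors
+the case-insensitive behavior described above. Matching <code>Senator</code> as well
+would additionally require the stemming step.</p>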
+<p>EasyAnalyzer is actually multiple Analyzers wrapped up in a single package.
+In this case, it’s three-in-one, since creating an EasyAnalyzer for the
+language <code>en</code> is equivalent to this snippet, which creates a
+<a href="../../../Lucy/Analysis/PolyAnalyzer.html">PolyAnalyzer</a>:</p>
+<pre><code class="language-c"> Vector *analyzers = Vec_new(3);
+ Vec_Push(analyzers, (Obj*)StandardTokenizer_new());
+ Vec_Push(analyzers, (Obj*)Normalizer_new(NULL, true, false));
+ Vec_Push(analyzers, (Obj*)SnowStemmer_new(language));
+
+ PolyAnalyzer *analyzer = PolyAnalyzer_new(NULL, analyzers);
+ DECREF(analyzers);
+</code></pre>
+<p>You can add or subtract Analyzers from there if you like. Try adding a fourth
+Analyzer, a SnowballStopFilter for suppressing “stopwords” like “the”, “if”,
+and “maybe”.</p>
+<pre><code class="language-c"> Vec_Push(analyzers, (Obj*)StandardTokenizer_new());
+ Vec_Push(analyzers, (Obj*)Normalizer_new(NULL, true, false));
+ Vec_Push(analyzers, (Obj*)SnowStemmer_new(language));
+ Vec_Push(analyzers, (Obj*)SnowStop_new(language, NULL));
+</code></pre>
+<p>Also, try removing the SnowballStemmer.</p>
+<pre><code class="language-c"> Vec_Push(analyzers, (Obj*)StandardTokenizer_new());
+ Vec_Push(analyzers, (Obj*)Normalizer_new(NULL, true, false));
+</code></pre>
+<p>The original choice of a stock English EasyAnalyzer probably still yields the
+best results for this document collection, but you get the idea: sometimes you
+want a different Analyzer.</p>
+<h3>When the best Analyzer is no Analyzer</h3>
+<p>Sometimes you don’t want an Analyzer at all. That was true for our “url”
+field because we didn’t need it to be searchable, but it’s also true for
+certain types of searchable fields. For instance, “category” fields are often
+set up to match exactly or not at all, as are fields like “last_name” (because
+you may not want to conflate results for “Humphrey” and “Humphries”).</p>
+<p>To specify that there should be no analysis performed at all, use StringType:</p>
+<pre><code class="language-c"> String *name = Str_newf("category");
+ StringType *type = StringType_new();
+ Schema_Spec_Field(schema, name, (FieldType*)type);
+ DECREF(type);
+ DECREF(name);
+</code></pre>
+<h3>Highlighting up next</h3>
+<p>In our next tutorial chapter, <a href="../../../Lucy/Docs/Tutorial/HighlighterTutorial.html">HighlighterTutorial</a>,
+we’ll add highlighted excerpts from the “content” field to our search results.</p>
+</div>
+
+ </div> <!-- lucy-main_content_box -->
+ <div class="clear"></div>
+
+ </div> <!-- lucy-main_content -->
+
+ </div> <!-- lucy-rigid_wrapper -->
+
+ </body>
+</html>
Added: websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/BeyondSimpleTutorial.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/BeyondSimpleTutorial.html (added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/BeyondSimpleTutorial.html Wed Sep 28 12:07:48 2016
@@ -0,0 +1,296 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+ <head>
+ <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+ <title>Lucy::Docs::Tutorial::BeyondSimpleTutorial</title>
+ <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+ </head>
+
+ <body>
+
+ <div id="lucy-rigid_wrapper">
+
+ <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+ <div id="lucy-main_content_box" class="grid_9">
+ <div class="c-api">
+<h2>A more flexible app structure.</h2>
+<h3>Goal</h3>
+<p>In this tutorial chapter, we’ll refactor the apps we built in
+<a href="../../../Lucy/Docs/Tutorial/SimpleTutorial.html">SimpleTutorial</a> so that they look exactly the same from
+the end user’s point of view, but offer the developer greater possibilities for
+expansion.</p>
+<p>To achieve this, we’ll ditch Lucy::Simple and replace it with the
+classes that it uses internally:</p>
+<ul>
+<li><a href="../../../Lucy/Plan/Schema.html">Schema</a> - Plan out your index.</li>
+<li><a href="../../../Lucy/Plan/FullTextType.html">FullTextType</a> - Field type for full text search.</li>
+<li><a href="../../../Lucy/Analysis/EasyAnalyzer.html">EasyAnalyzer</a> - A one-size-fits-all parser/tokenizer.</li>
+<li><a href="../../../Lucy/Index/Indexer.html">Indexer</a> - Manipulate index content.</li>
+<li><a href="../../../Lucy/Search/IndexSearcher.html">IndexSearcher</a> - Search an index.</li>
+<li><a href="../../../Lucy/Search/Hits.html">Hits</a> - Iterate over hits returned by a Searcher.</li>
+</ul>
+<h3>Adaptations to indexer.c</h3>
+<p>After we include our headers…</p>
+<pre><code class="language-c">#include <dirent.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#define CFISH_USE_SHORT_NAMES
+#define LUCY_USE_SHORT_NAMES
+#include "Clownfish/String.h"
+#include "Lucy/Analysis/EasyAnalyzer.h"
+#include "Lucy/Document/Doc.h"
+#include "Lucy/Index/Indexer.h"
+#include "Lucy/Plan/FullTextType.h"
+#include "Lucy/Plan/StringType.h"
+#include "Lucy/Plan/Schema.h"
+
+const char path_to_index[] = "/path/to/index";
+const char uscon_source[] = "/usr/local/apache2/htdocs/us_constitution";
+</code></pre>
+<p>… the first item we’re going to need is a <a href="../../../Lucy/Plan/Schema.html">Schema</a>.</p>
+<p>The primary job of a Schema is to specify what fields are available and how
+they’re defined. We’ll start off with three fields: title, content and url.</p>
+<pre><code class="language-c">static Schema*
+S_create_schema() {
+ // Create a new schema.
+ Schema *schema = Schema_new();
+
+ // Create an analyzer.
+ String *language = Str_newf("en");
+ EasyAnalyzer *analyzer = EasyAnalyzer_new(language);
+
+ // Specify fields.
+
+ FullTextType *type = FullTextType_new((Analyzer*)analyzer);
+
+ {
+ String *field_str = Str_newf("title");
+ Schema_Spec_Field(schema, field_str, (FieldType*)type);
+ DECREF(field_str);
+ }
+
+ {
+ String *field_str = Str_newf("content");
+ Schema_Spec_Field(schema, field_str, (FieldType*)type);
+ DECREF(field_str);
+ }
+
+ {
+ String *field_str = Str_newf("url");
+ Schema_Spec_Field(schema, field_str, (FieldType*)type);
+ DECREF(field_str);
+ }
+
+ DECREF(type);
+ DECREF(analyzer);
+ DECREF(language);
+ return schema;
+}
+</code></pre>
+<p>All of the fields are spec’d out using the <a href="../../../Lucy/Plan/FullTextType.html">FullTextType</a> FieldType,
+indicating that they will be searchable as “full text” – which means that
+they can be searched for individual words. The “analyzer”, which is unique to
+FullTextType fields, is what breaks up the text into searchable tokens.</p>
+<p>Next, we’ll swap our Lucy::Simple object out for an <a href="../../../Lucy/Index/Indexer.html">Indexer</a>.
+The substitution will be straightforward because Simple has merely been
+serving as a thin wrapper around an inner Indexer, and we’ll just be peeling
+away the wrapper.</p>
+<p>First, replace the constructor:</p>
+<pre><code class="language-c">int
+main() {
+ // Initialize the library.
+ lucy_bootstrap_parcel();
+
+ Schema *schema = S_create_schema();
+ String *folder = Str_newf("%s", path_to_index);
+
+ Indexer *indexer = Indexer_new(schema, (Obj*)folder, NULL,
+ Indexer_CREATE | Indexer_TRUNCATE);
+
+</code></pre>
+<p>Next, call <a href="../../../Lucy/Index/Indexer.html#func_Add_Doc">Add_Doc()</a> on the <code>indexer</code> object wherever we
+previously had the <code>lucy</code> object add the document:</p>
+<pre><code class="language-c"> DIR *dir = opendir(uscon_source);
+ if (dir == NULL) {
+ perror(uscon_source);
+ return 1;
+ }
+
+ for (struct dirent *entry = readdir(dir);
+ entry;
+ entry = readdir(dir)) {
+
+ if (S_ends_with(entry->d_name, ".txt")) {
+ Doc *doc = S_parse_file(entry->d_name);
+ Indexer_Add_Doc(indexer, doc, 1.0);
+ DECREF(doc);
+ }
+ }
+
+ closedir(dir);
+</code></pre>
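+<p>The loop above calls <code>S_ends_with</code>, a helper that this chapter does not
+define. A minimal sketch of such a suffix test, assuming plain NUL-terminated
+C strings, might look like the following. The body shown here is an
+assumption for illustration and is not necessarily the code shipped in the
+<code>sample</code> directory.</p>
+<pre><code class="language-c">#include <stdbool.h>
+#include <string.h>
+
+/* Return true if `str` ends with `suffix` (byte-wise comparison). */
+static bool
+S_ends_with(const char *str, const char *suffix) {
+ size_t len = strlen(str);
+ size_t suffix_len = strlen(suffix);
+
+ // A string shorter than the suffix cannot end with it.
+ if (len < suffix_len) {
+ return false;
+ }
+ return memcmp(str + len - suffix_len, suffix, suffix_len) == 0;
+}
+</code></pre>
+<p>Used as in the loop, <code>S_ends_with(entry->d_name, ".txt")</code> lets only the
+<code>.txt</code> files in the corpus directory through to the indexer.</p>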
+<p>There’s only one extra step required: at the end of the app, you must call
+Commit() explicitly to close the indexing session and commit your changes.
+(Lucy::Simple hides this detail, committing implicitly when it needs to.)</p>
+<pre><code class="language-c"> Indexer_Commit(indexer);
+
+ DECREF(indexer);
+ DECREF(folder);
+ DECREF(schema);
+ return 0;
+}
+</code></pre>
+<h3>Adaptations to search.c</h3>
+<p>In our search app as in our indexing app, Lucy::Simple has served as a
+thin wrapper – this time around <a href="../../../Lucy/Search/IndexSearcher.html">IndexSearcher</a> and
+<a href="../../../Lucy/Search/Hits.html">Hits</a>. Swapping out Simple for these two classes is
+also straightforward:</p>
+<pre><code class="language-c">#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#define CFISH_USE_SHORT_NAMES
+#define LUCY_USE_SHORT_NAMES
+#include "Clownfish/String.h"
+#include "Lucy/Document/HitDoc.h"
+#include "Lucy/Search/Hits.h"
+#include "Lucy/Search/IndexSearcher.h"
+
+const char path_to_index[] = "/path/to/index";
+
+int
+main(int argc, char *argv[]) {
+ // Initialize the library.
+ lucy_bootstrap_parcel();
+
+ if (argc < 2) {
+ printf("Usage: %s <querystring>\n", argv[0]);
+ return 0;
+ }
+
+ const char *query_c = argv[1];
+
+ printf("Searching for: %s\n\n", query_c);
+
+ String *folder = Str_newf("%s", path_to_index);
+ IndexSearcher *searcher = IxSearcher_new((Obj*)folder);
+
+ String *query_str = Str_newf("%s", query_c);
+ Hits *hits = IxSearcher_Hits(searcher, (Obj*)query_str, 0, 10, NULL);
+
+ String *title_str = Str_newf("title");
+ String *url_str = Str_newf("url");
+ HitDoc *hit;
+ int i = 1;
+
+ // Loop over search results.
+ while (NULL != (hit = Hits_Next(hits))) {
+ String *title = (String*)HitDoc_Extract(hit, title_str);
+ char *title_c = Str_To_Utf8(title);
+
+ String *url = (String*)HitDoc_Extract(hit, url_str);
+ char *url_c = Str_To_Utf8(url);
+
+ printf("Result %d: %s (%s)\n", i, title_c, url_c);
+
+ free(url_c);
+ free(title_c);
+ DECREF(url);
+ DECREF(title);
+ DECREF(hit);
+ i++;
+ }
+
+ DECREF(url_str);
+ DECREF(title_str);
+ DECREF(hits);
+ DECREF(query_str);
+ DECREF(searcher);
+ DECREF(folder);
+ return 0;
+}
+</code></pre>
+<h3>Hooray!</h3>
+<p>Congratulations! Your apps do the same thing as before… but now they’ll be
+easier to customize.</p>
+<p>In our next chapter, <a href="../../../Lucy/Docs/Tutorial/FieldTypeTutorial.html">FieldTypeTutorial</a>, we’ll explore
+how to assign different behaviors to different fields.</p>
+</div>
+
+ </div> <!-- lucy-main_content_box -->
+ <div class="clear"></div>
+
+ </div> <!-- lucy-main_content -->
+
+ </div> <!-- lucy-rigid_wrapper -->
+
+ </body>
+</html>
Added: websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/FieldTypeTutorial.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/FieldTypeTutorial.html (added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/FieldTypeTutorial.html Wed Sep 28 12:07:48 2016
@@ -0,0 +1,151 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+ <head>
+ <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+ <title>Lucy::Docs::Tutorial::FieldTypeTutorial</title>
+ <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+ </head>
+
+ <body>
+
+ <div id="lucy-rigid_wrapper">
+
+ <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+ <div id="lucy-main_content_box" class="grid_9">
+ <div class="c-api">
+<h2>Specify per-field properties and behaviors.</h2>
+<p>The Schema we used in the last chapter specifies three fields:</p>
+<pre><code class="language-c"> FullTextType *type = FullTextType_new((Analyzer*)analyzer);
+
+ {
+ String *field_str = Str_newf("title");
+ Schema_Spec_Field(schema, field_str, (FieldType*)type);
+ DECREF(field_str);
+ }
+
+ {
+ String *field_str = Str_newf("content");
+ Schema_Spec_Field(schema, field_str, (FieldType*)type);
+ DECREF(field_str);
+ }
+
+ {
+ String *field_str = Str_newf("url");
+ Schema_Spec_Field(schema, field_str, (FieldType*)type);
+ DECREF(field_str);
+ }
+
+</code></pre>
+<p>Since they are all defined as “full text” fields, they are all searchable –
+including the <code>url</code> field, a dubious choice. Some URLs contain meaningful
+information, but these don’t, really:</p>
+<pre><code>http://example.com/us_constitution/amend1.txt
+</code></pre>
+<p>We may as well not bother indexing the URL content. To achieve that we need
+to assign the <code>url</code> field to a different FieldType.</p>
+<h3>StringType</h3>
+<p>Instead of FullTextType, we’ll use a
+<a href="../../../Lucy/Plan/StringType.html">StringType</a>, which doesn’t use an
+Analyzer to break up text into individual tokens. Furthermore, we’ll mark
+this StringType as unindexed, so that its content won’t be searchable at all.</p>
+<pre><code class="language-c"> {
+ String *field_str = Str_newf("url");
+ StringType *type = StringType_new();
+ StringType_Set_Indexed(type, false);
+ Schema_Spec_Field(schema, field_str, (FieldType*)type);
+ DECREF(type);
+ DECREF(field_str);
+ }
+</code></pre>
+<p>To observe the change in behavior, try searching for <code>us_constitution</code> both
+before and after changing the Schema and re-indexing.</p>
+<h3>Toggling ‘stored’</h3>
+<p>For a taste of other FieldType possibilities, try turning off <code>stored</code> for
+one or more fields.</p>
+<pre><code class="language-c"> FullTextType *content_type = FullTextType_new((Analyzer*)analyzer);
+ FullTextType_Set_Stored(content_type, false);
+</code></pre>
+<p>Turning off <code>stored</code> for either <code>title</code> or <code>url</code> mangles our results page,
+but since we’re not displaying <code>content</code>, turning it off for <code>content</code> has
+no effect – except on index size.</p>
+<h3>Analyzers up next</h3>
+<p>Analyzers play a crucial role in the behavior of FullTextType fields. In our
+next tutorial chapter, <a href="../../../Lucy/Docs/Tutorial/AnalysisTutorial.html">AnalysisTutorial</a>, we’ll see how
+changing up the Analyzer changes search results.</p>
+</div>
+
+ </div> <!-- lucy-main_content_box -->
+ <div class="clear"></div>
+
+ </div> <!-- lucy-main_content -->
+
+ </div> <!-- lucy-rigid_wrapper -->
+
+ </body>
+</html>
Added: websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/HighlighterTutorial.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/HighlighterTutorial.html (added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/HighlighterTutorial.html Wed Sep 28 12:07:48 2016
@@ -0,0 +1,160 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+ <head>
+ <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+ <title>Lucy::Docs::Tutorial::HighlighterTutorial</title>
+ <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+ </head>
+
+ <body>
+
+ <div id="lucy-rigid_wrapper">
+
+ <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+ <div id="lucy-main_content_box" class="grid_9">
+ <div class="c-api">
+<h2>Augment search results with highlighted excerpts.</h2>
+<p>Adding relevant excerpts with highlighted search terms to your search results
+display makes it much easier for end users to scan the page and assess which
+hits look promising, dramatically improving their search experience.</p>
+<h3>Adaptations to indexer.c</h3>
+<p><a href="../../../Lucy/Highlight/Highlighter.html">Highlighter</a> uses information generated at index
+time. To save resources, highlighting is disabled by default and must be
+turned on for individual fields.</p>
+<pre><code class="language-c"> {
+ String *field_str = Str_newf("content");
+ FullTextType *type = FullTextType_new((Analyzer*)analyzer);
+ FullTextType_Set_Highlightable(type, true);
+ Schema_Spec_Field(schema, field_str, (FieldType*)type);
+ DECREF(type);
+ DECREF(field_str);
+ }
+</code></pre>
+<h3>Adaptations to search.c</h3>
+<p>To add highlighting and excerpting to the search.c sample app, create a
+<code>highlighter</code> object outside the loop that iterates over hits…</p>
+<pre><code class="language-c"> String *content_str = Str_newf("content");
+ Highlighter *highlighter
+ = Highlighter_new((Searcher*)searcher, (Obj*)query_str,
+ content_str, 200);
+</code></pre>
+<p>… then modify the loop and the per-hit display to generate and include the
+excerpt.</p>
+<pre><code class="language-c">    String *title_str = Str_newf("title");
+    String *url_str = Str_newf("url");
+    HitDoc *hit;
+    i = 1;
+
+    // Loop over search results.
+    while (NULL != (hit = Hits_Next(hits))) {
+        String *title = (String*)HitDoc_Extract(hit, title_str);
+        char *title_c = Str_To_Utf8(title);
+
+        String *url = (String*)HitDoc_Extract(hit, url_str);
+        char *url_c = Str_To_Utf8(url);
+
+        String *excerpt = Highlighter_Create_Excerpt(highlighter, hit);
+        char *excerpt_c = Str_To_Utf8(excerpt);
+
+        printf("Result %d: %s (%s)\n%s\n\n", i, title_c, url_c, excerpt_c);
+
+        free(excerpt_c);
+        free(url_c);
+        free(title_c);
+        DECREF(excerpt);
+        DECREF(url);
+        DECREF(title);
+        DECREF(hit);
+        i++;
+    }
+
+    DECREF(url_str);
+    DECREF(title_str);
+    DECREF(hits);
+    DECREF(query_str);
+    DECREF(highlighter);
+    DECREF(content_str);
+    DECREF(searcher);
+    DECREF(folder);
+</code></pre>
+<h3>Next chapter: Query objects</h3>
+<p>Our next tutorial chapter, <a href="../../../Lucy/Docs/Tutorial/QueryObjectsTutorial.html">QueryObjectsTutorial</a>,
+illustrates how to build an “advanced search” interface using
+<a href="../../../Lucy/Search/Query.html">Query</a> objects instead of query strings.</p>
+</div>
+
+ </div> <!-- lucy-main_content_box -->
+ <div class="clear"></div>
+
+ </div> <!-- lucy-main_content -->
+
+ <div id="lucy-copyright" class="container_16">
+ <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the
+ <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+ <br/>
+ Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The
+ Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their
+ respective owners.
+ </p>
+ </div> <!-- lucy-copyright -->
+
+ </div> <!-- lucy-rigid_wrapper -->
+
+ </body>
+</html>
Added: websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/QueryObjectsTutorial.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/QueryObjectsTutorial.html (added)
+++ websites/staging/lucy/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/QueryObjectsTutorial.html Wed Sep 28 12:07:48 2016
@@ -0,0 +1,269 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+ <head>
+ <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+ <title>Lucy::Docs::Tutorial::QueryObjectsTutorial</title>
+ <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+ </head>
+
+ <body>
+
+ <div id="lucy-rigid_wrapper">
+
+ <div id="lucy-top" class="container_16 lucy-white_box_3d">
+
+ <div id="lucy-logo_box" class="grid_8">
+ <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a>
+ </div> <!-- lucy-logo_box -->
+
+ <div #id="lucy-top_nav_box" class="grid_8">
+ <div id="lucy-top_nav_bar" class="container_8">
+ <ul>
+ <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li>
+ <li><a href="http://www.apache.org/licenses/" title="License">License</a></li>
+ <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li>
+ <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li>
+ <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li>
+ </ul>
+ </div> <!-- lucy-top_nav_bar -->
+ <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/0.5.0/">0.5.0</a> » <a href="/docs/0.5.0/c/">C</a> » <a href="/docs/0.5.0/c/Lucy/">Lucy</a> » <a href="/docs/0.5.0/c/Lucy/Docs/">Docs</a> » <a href="/docs/0.5.0/c/Lucy/Docs/Tutorial/">Tutorial</a></p>
+ <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get">
+ <input value="*.apache.org" name="sitesearch" type="hidden"/>
+ <input type="text" name="q" id="query" style="width:85%">
+ <input type="submit" id="submit" value="Search">
+ </form>
+ </div> <!-- lucy-top_nav_box -->
+
+ <div class="clear"></div>
+
+ </div> <!-- lucy-top -->
+
+ <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+ <div class="grid_4" id="lucy-left_nav_box">
+ <h6>About</h6>
+ <ul>
+ <li><a href="/">Welcome</a></li>
+ <li><a href="/clownfish.html">Clownfish</a></li>
+ <li><a href="/faq.html">FAQ</a></li>
+ <li><a href="/people.html">People</a></li>
+ </ul>
+ <h6>Resources</h6>
+ <ul>
+ <li><a href="/download.html">Download</a></li>
+ <li><a href="/mailing_lists.html">Mailing Lists</a></li>
+ <li><a href="/docs/">Documentation</a></li>
+ <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li>
+ <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li>
+ <li><a href="/version_control.html">Version Control</a></li>
+ </ul>
+ <h6>Related Projects</h6>
+ <ul>
+ <li><a href="http://lucene.apache.org/core/">Lucene</a></li>
+ <li><a href="http://dezi.org/">Dezi</a></li>
+ <li><a href="http://lucene.apache.org/solr/">Solr</a></li>
+ <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li>
+ <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li>
+ </ul>
+ </div> <!-- lucy-left_nav_box -->
+
+ <div id="lucy-main_content_box" class="grid_9">
+ <div class="c-api">
+<h2>Use Query objects instead of query strings.</h2>
+<p>Until now, our search app has had only a single search box. In this tutorial
+chapter, we’ll move towards an “advanced search” interface, by adding a
+“category” drop-down menu. Three new classes will be required:</p>
+<ul>
+<li>
+<p><a href="../../../Lucy/Search/QueryParser.html">QueryParser</a> - Turn a query string into a
+<a href="../../../Lucy/Search/Query.html">Query</a> object.</p>
+</li>
+<li>
+<p><a href="../../../Lucy/Search/TermQuery.html">TermQuery</a> - Query for a specific term within
+a specific field.</p>
+</li>
+<li>
+<p><a href="../../../Lucy/Search/ANDQuery.html">ANDQuery</a> - “AND” together multiple Query
+objects to produce an intersected result set.</p>
+</li>
+</ul>
+<h3>Adaptations to indexer.pl</h3>
+<p>Our new “category” field will be a StringType field rather than a FullTextType
+field, because we will only be looking for exact matches. It needs to be
+indexed, but since we won’t display its value, it doesn’t need to be stored.</p>
+<pre><code class="language-c">    {
+        String *field_str = Str_newf("category");
+        StringType *type = StringType_new();
+        StringType_Set_Stored(type, false);
+        Schema_Spec_Field(schema, field_str, (FieldType*)type);
+        DECREF(type);
+        DECREF(field_str);
+    }
+</code></pre>
+<p>There will be three possible values: “article”, “amendment”, and “preamble”,
+which we’ll hack out of the source file’s name during our <code>parse_file</code>
+subroutine:</p>
+<pre><code class="language-c">    const char *category = NULL;
+    if (S_starts_with(filename, "art")) {
+        category = "article";
+    }
+    else if (S_starts_with(filename, "amend")) {
+        category = "amendment";
+    }
+    else if (S_starts_with(filename, "preamble")) {
+        category = "preamble";
+    }
+    else {
+        // Terminate the message with a newline so it is not run together
+        // with whatever is printed next.
+        fprintf(stderr, "Can't derive category for %s\n", filename);
+        exit(1);
+    }
+
+    ...
+
+    {
+        // Store 'category' field
+        String *field = Str_newf("category");
+        String *value = Str_new_from_utf8(category, strlen(category));
+        Doc_Store(doc, field, (Obj*)value);
+        DECREF(field);
+        DECREF(value);
+    }
+</code></pre>
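+<p>The snippet above relies on <code>S_starts_with</code>, which is not shown
+here. The following is a minimal sketch, assuming the helper is a plain prefix
+test on C strings. <code>S_derive_category</code> is a hypothetical wrapper,
+not part of the sample app, that returns <code>NULL</code> instead of exiting
+so that the filename-to-category mapping can be checked in isolation:</p>

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed helper: true if `str` begins with `prefix`. */
static bool
S_starts_with(const char *str, const char *prefix) {
    return strncmp(str, prefix, strlen(prefix)) == 0;
}

/* Hypothetical wrapper around the if/else chain shown above.
 * Returns NULL when no category can be derived from the name. */
static const char*
S_derive_category(const char *filename) {
    if (S_starts_with(filename, "art"))      { return "article"; }
    if (S_starts_with(filename, "amend"))    { return "amendment"; }
    if (S_starts_with(filename, "preamble")) { return "preamble"; }
    return NULL;
}
```

+<p>Because the test only looks at the leading characters, any filename that
+begins with “art”, “amend”, or “preamble” is accepted, whatever follows.</p>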
+<h3>Adaptations to search.cgi</h3>
+<p>In the C version of the sample app, the “category” constraint is passed as
+an optional <code>-c</code> command-line argument rather than through an HTML
+“select” element, so the usage message needs to mention it:</p>
+<pre><code class="language-c">static void
+S_usage_and_exit(const char *arg0) {
+    printf("Usage: %s [-c <category>] <querystring>\n", arg0);
+    exit(1);
+}
+</code></pre>
+<p>We’ll start off by parsing the command line to extract the optional
+category and the mandatory query string.</p>
+<pre><code class="language-c">    const char *category = NULL;
+    int i = 1;
+
+    while (i < argc - 1) {
+        if (strcmp(argv[i], "-c") == 0) {
+            if (i + 1 >= argc) {
+                S_usage_and_exit(argv[0]);
+            }
+            i += 1;
+            category = argv[i];
+        }
+        else {
+            S_usage_and_exit(argv[0]);
+        }
+
+        i += 1;
+    }
+
+    if (i + 1 != argc) {
+        S_usage_and_exit(argv[0]);
+    }
+
+    const char *query_c = argv[i];
+</code></pre>
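+<p>Which command lines are accepted is easier to see when the loop is wrapped
+in a function. The following is a sketch under stated assumptions:
+<code>SearchArgs</code> and <code>S_parse_args</code> are hypothetical names
+that do not appear in the sample app, and the function returns
+<code>-1</code> wherever the code above calls
+<code>S_usage_and_exit</code>:</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the parsed command line. */
typedef struct {
    const char *category;   /* NULL when -c was not given */
    const char *query;      /* the final, mandatory argument */
} SearchArgs;

/* Returns 0 on success, -1 when the caller should print the usage message. */
static int
S_parse_args(int argc, char **argv, SearchArgs *out) {
    int i = 1;
    out->category = NULL;
    out->query    = NULL;

    /* Every argument before the last one must belong to a -c option. */
    while (i < argc - 1) {
        if (strcmp(argv[i], "-c") != 0) {
            return -1;
        }
        i += 1;                     /* step onto the option's value */
        out->category = argv[i];
        i += 1;                     /* step past the value */
    }

    /* Exactly one argument, the query string, must remain. */
    if (i + 1 != argc) {
        return -1;
    }
    out->query = argv[i];
    return 0;
}
```

+<p>Assuming the program is invoked as <code>search</code> (a hypothetical
+name), <code>search -c article congress</code> and
+<code>search congress</code> parse successfully, while
+<code>search -c congress</code> is rejected because no query string remains
+after the category.</p>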
+<p>QueryParser’s constructor requires a “schema” argument. We can get that from
+our IndexSearcher:</p>
+<pre><code class="language-c">    IndexSearcher *searcher = IxSearcher_new((Obj*)folder);
+    Schema *schema = IxSearcher_Get_Schema(searcher);
+    QueryParser *qparser = QParser_new(schema, NULL, NULL, NULL);
+</code></pre>
+<p>Previously, we have been handing raw query strings to IndexSearcher. Behind
+the scenes, IndexSearcher has been using a QueryParser to turn those query
+strings into Query objects. Now, we will bring QueryParser into the
+foreground and parse the strings explicitly.</p>
+<pre><code class="language-c"> Query *query = QParser_Parse(qparser, query_str);
+</code></pre>
+<p>If the user has specified a category, we’ll use an ANDQuery to join our parsed
+query together with a TermQuery representing the category.</p>
+<pre><code class="language-c">    if (category) {
+        String *category_name = Str_newf("category");
+        String *category_str  = Str_newf("%s", category);
+        TermQuery *category_query
+            = TermQuery_new(category_name, (Obj*)category_str);
+
+        // Combine the parsed query and the category query.
+        Vector *children = Vec_new(2);
+        Vec_Push(children, (Obj*)query);
+        Vec_Push(children, (Obj*)category_query);
+        query = (Query*)ANDQuery_new(children);
+
+        DECREF(children);
+        DECREF(category_str);
+        DECREF(category_name);
+    }
+</code></pre>
+<p>Now when we execute the query…</p>
+<pre><code class="language-c"> Hits *hits = IxSearcher_Hits(searcher, (Obj*)query, 0, 10, NULL);
+</code></pre>
+<p>… we’ll get a result set which is the intersection of the parsed query and
+the category query.</p>
+<h3>Using TermQuery with full text fields</h3>
+<p>When querying full text fields, the easiest way is to create query objects
+using QueryParser. But sometimes you want to create a TermQuery for a single
+term in a FullTextType field directly. In this case, we have to run the
+search term through the field’s analyzer to make sure it gets normalized in
+the same way as the field’s content.</p>
+<pre><code class="language-c">Query*
+make_term_query(Schema *schema, String *field, String *term) {
+    FieldType *type  = Schema_Fetch_Type(schema, field);
+    String    *token = NULL;
+
+    if (FieldType_is_a(type, FULLTEXTTYPE)) {
+        // Run the term through the full text analysis chain.
+        Analyzer *analyzer = FullTextType_Get_Analyzer((FullTextType*)type);
+        Vector   *tokens   = Analyzer_Split(analyzer, term);
+
+        if (Vec_Get_Size(tokens) != 1) {
+            // If the term expands to more than one token, or no
+            // tokens at all, it will never match a single token in
+            // the full text field.
+            DECREF(tokens);
+            return (Query*)NoMatchQuery_new();
+        }
+
+        token = (String*)Vec_Delete(tokens, 0);
+        DECREF(tokens);
+    }
+    else {
+        // Exact match for other types.
+        token = (String*)INCREF(term);
+    }
+
+    TermQuery *term_query = TermQuery_new(field, (Obj*)token);
+
+    DECREF(token);
+    return (Query*)term_query;
+}
+</code></pre>
+<h3>Congratulations!</h3>
+<p>You’ve made it to the end of the tutorial.</p>
+<h3>See Also</h3>
+<p>For additional thematic documentation, see the Apache Lucy
+<a href="../../../Lucy/Docs/Cookbook.html">Cookbook</a>.</p>
+<p>ANDQuery has a companion class, <a href="../../../Lucy/Search/ORQuery.html">ORQuery</a>, and a
+close relative, <a href="../../../Lucy/Search/RequiredOptionalQuery.html">RequiredOptionalQuery</a>.</p>
+</div>
+
+ </div> <!-- lucy-main_content_box -->
+ <div class="clear"></div>
+
+ </div> <!-- lucy-main_content -->
+
+ <div id="lucy-copyright" class="container_16">
+ <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the
+ <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+ <br/>
+ Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The
+ Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their
+ respective owners.
+ </p>
+ </div> <!-- lucy-copyright -->
+
+ </div> <!-- lucy-rigid_wrapper -->
+
+ </body>
+</html>