Posted to commits@lucy.apache.org by nw...@apache.org on 2016/09/28 12:06:26 UTC

svn commit: r1762636 [2/12] - in /lucy/site/trunk/content/docs: ./ 0.5.0/ 0.5.0/c/ 0.5.0/c/Clownfish/ 0.5.0/c/Clownfish/Docs/ 0.5.0/c/Lucy/ 0.5.0/c/Lucy/Analysis/ 0.5.0/c/Lucy/Docs/ 0.5.0/c/Lucy/Docs/Cookbook/ 0.5.0/c/Lucy/Docs/Tutorial/ 0.5.0/c/Lucy/D...

Added: lucy/site/trunk/content/docs/0.5.0/c/Lucy/Analysis/RegexTokenizer.mdtext
URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/c/Lucy/Analysis/RegexTokenizer.mdtext?rev=1762636&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/0.5.0/c/Lucy/Analysis/RegexTokenizer.mdtext (added)
+++ lucy/site/trunk/content/docs/0.5.0/c/Lucy/Analysis/RegexTokenizer.mdtext Wed Sep 28 12:06:24 2016
@@ -0,0 +1,199 @@
+Title: Lucy::Analysis::RegexTokenizer – C API Documentation
+
+<div class="c-api">
+<h2>Lucy::Analysis::RegexTokenizer</h2>
+<table>
+<tr>
+<td class="label">parcel</td>
+<td><a href="../../lucy.html">Lucy</a></td>
+</tr>
+<tr>
+<td class="label">class variable</td>
+<td><code><span class="prefix">LUCY_</span>REGEXTOKENIZER</code></td>
+</tr>
+<tr>
+<td class="label">struct symbol</td>
+<td><code><span class="prefix">lucy_</span>RegexTokenizer</code></td>
+</tr>
+<tr>
+<td class="label">class nickname</td>
+<td><code><span class="prefix">lucy_</span>RegexTokenizer</code></td>
+</tr>
+<tr>
+<td class="label">header file</td>
+<td><code>Lucy/Analysis/RegexTokenizer.h</code></td>
+</tr>
+</table>
+<h3>Name</h3>
+<p>Lucy::Analysis::RegexTokenizer – Split a string into tokens.</p>
+<h3>Description</h3>
+<p>Generically, “tokenizing” is a process of breaking up a string into an
+array of “tokens”.  For instance, the string “three blind mice” might be
+tokenized into “three”, “blind”, “mice”.</p>
+<p>Lucy::Analysis::RegexTokenizer decides where it should break up the text
+based on a regular expression compiled from a supplied <code>pattern</code>
+matching one token.  If our source string is…</p>
+<pre><code>&quot;Eats, Shoots and Leaves.&quot;
+</code></pre>
+<p>… then a “whitespace tokenizer” with a <code>pattern</code> of
+<code>&quot;\\S+&quot;</code> produces…</p>
+<pre><code>Eats,
+Shoots
+and
+Leaves.
+</code></pre>
+<p>… while a “word character tokenizer” with a <code>pattern</code> of
+<code>&quot;\\w+&quot;</code> produces…</p>
+<pre><code>Eats
+Shoots
+and
+Leaves
+</code></pre>
+<p>… the difference being that the word character tokenizer skips over
+punctuation as well as whitespace when determining token boundaries.</p>
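+<p>As a brief usage sketch (assuming the short-names convention, so the
+<code>lucy_</code> and <code>cfish_</code> prefixes are dropped), a whitespace
+tokenizer might be built and exercised like this:</p>
+<pre><code class="language-c">String *pattern = Str_newf(&quot;\\S+&quot;);
+RegexTokenizer *tokenizer = RegexTokenizer_new(pattern);
+String *text   = Str_newf(&quot;Eats, Shoots and Leaves.&quot;);
+Vector *tokens = RegexTokenizer_Split(tokenizer, text);  // incremented
+
+for (size_t i = 0, max = Vec_Get_Size(tokens); i &lt; max; i++) {
+    String *token = (String*)Vec_Fetch(tokens, i);
+    // Prints: Eats, / Shoots / and / Leaves.
+    printf(&quot;%.*s\n&quot;, (int)Str_Get_Size(token), Str_Get_Ptr8(token));
+}
+
+DECREF(tokens);
+DECREF(text);
+DECREF(pattern);
+DECREF(tokenizer);
+</code></pre>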
+<h3>Functions</h3>
+<dl>
+<dt id="func_new">new</dt>
+<dd>
+<pre><code><span class="prefix">lucy_</span>RegexTokenizer* <span class="comment">// incremented</span>
+<span class="prefix">lucy_</span><strong>RegexTokenizer_new</strong>(
+    <span class="prefix">cfish_</span><a href="../../Clownfish/String.html">String</a> *<strong>pattern</strong>
+);
+</code></pre>
+<p>Create a new RegexTokenizer.</p>
+<dl>
+<dt>pattern</dt>
+<dd><p>A string specifying a Perl-syntax regular expression
+which should match one token.  The default value is
+<code>\w+(?:[\x{2019}']\w+)*</code>, which matches “it’s” as well as
+“it” and “O’Henry’s” as well as “Henry”.</p>
+</dd>
+</dl>
+</dd>
+<dt id="func_init">init</dt>
+<dd>
+<pre><code><span class="prefix">lucy_</span>RegexTokenizer*
+<span class="prefix">lucy_</span><strong>RegexTokenizer_init</strong>(
+    <span class="prefix">lucy_</span>RegexTokenizer *<strong>self</strong>,
+    <span class="prefix">cfish_</span><a href="../../Clownfish/String.html">String</a> *<strong>pattern</strong>
+);
+</code></pre>
+<p>Initialize a RegexTokenizer.</p>
+<dl>
+<dt>pattern</dt>
+<dd><p>A string specifying a Perl-syntax regular expression
+which should match one token.  The default value is
+<code>\w+(?:[\x{2019}']\w+)*</code>, which matches “it’s” as well as
+“it” and “O’Henry’s” as well as “Henry”.</p>
+</dd>
+</dl>
+</dd>
+</dl>
+<h3>Methods</h3>
+<dl>
+<dt id="func_Transform">Transform</dt>
+<dd>
+<pre><code><span class="prefix">lucy_</span><a href="../../Lucy/Analysis/Inversion.html">Inversion</a>* <span class="comment">// incremented</span>
+<span class="prefix">lucy_</span><strong>RegexTokenizer_Transform</strong>(
+    <span class="prefix">lucy_</span>RegexTokenizer *<strong>self</strong>,
+    <span class="prefix">lucy_</span><a href="../../Lucy/Analysis/Inversion.html">Inversion</a> *<strong>inversion</strong>
+);
+</code></pre>
+<p>Take a single <a href="../../Lucy/Analysis/Inversion.html">Inversion</a> as input
+and return an Inversion, either the same one (presumably transformed
+in some way) or a new one.</p>
+<dl>
+<dt>inversion</dt>
+<dd><p>An inversion.</p>
+</dd>
+</dl>
+</dd>
+<dt id="func_Transform_Text">Transform_Text</dt>
+<dd>
+<pre><code><span class="prefix">lucy_</span><a href="../../Lucy/Analysis/Inversion.html">Inversion</a>* <span class="comment">// incremented</span>
+<span class="prefix">lucy_</span><strong>RegexTokenizer_Transform_Text</strong>(
+    <span class="prefix">lucy_</span>RegexTokenizer *<strong>self</strong>,
+    <span class="prefix">cfish_</span><a href="../../Clownfish/String.html">String</a> *<strong>text</strong>
+);
+</code></pre>
+<p>Kick off an analysis chain, creating an Inversion from string input.
+The default implementation simply creates an initial Inversion with a
+single Token, then calls <a href="../../Lucy/Analysis/RegexTokenizer.html#func_Transform">Transform()</a>, but occasionally subclasses will
+provide an optimized implementation which minimizes string copies.</p>
+<dl>
+<dt>text</dt>
+<dd><p>A string.</p>
+</dd>
+</dl>
+</dd>
+<dt id="func_Dump">Dump</dt>
+<dd>
+<pre><code><span class="prefix">cfish_</span><a href="../../Clownfish/Obj.html">Obj</a>* <span class="comment">// incremented</span>
+<span class="prefix">lucy_</span><strong>RegexTokenizer_Dump</strong>(
+    <span class="prefix">lucy_</span>RegexTokenizer *<strong>self</strong>
+);
+</code></pre>
+<p>Dump the analyzer as a hash.</p>
+<p>Subclasses should call <a href="../../Lucy/Analysis/RegexTokenizer.html#func_Dump">Dump()</a> on the superclass. The returned
+object is a hash which should be populated with parameters of
+the analyzer.</p>
+<p><strong>Returns:</strong> A hash containing a description of the analyzer.</p>
+</dd>
+<dt id="func_Load">Load</dt>
+<dd>
+<pre><code><span class="prefix">lucy_</span>RegexTokenizer* <span class="comment">// incremented</span>
+<span class="prefix">lucy_</span><strong>RegexTokenizer_Load</strong>(
+    <span class="prefix">lucy_</span>RegexTokenizer *<strong>self</strong>,
+    <span class="prefix">cfish_</span><a href="../../Clownfish/Obj.html">Obj</a> *<strong>dump</strong>
+);
+</code></pre>
+<p>Reconstruct an analyzer from a dump.</p>
+<p>Subclasses should first call <a href="../../Lucy/Analysis/RegexTokenizer.html#func_Load">Load()</a> on the superclass. The
+returned object is an analyzer which should be reconstructed by
+setting the dumped parameters from the hash contained in <code>dump</code>.</p>
+<p>Note that the invocant analyzer is unused.</p>
+<dl>
+<dt>dump</dt>
+<dd><p>A hash.</p>
+</dd>
+</dl>
+<p><strong>Returns:</strong> An analyzer.</p>
+</dd>
+<dt id="func_Equals">Equals</dt>
+<dd>
+<pre><code>bool
+<span class="prefix">lucy_</span><strong>RegexTokenizer_Equals</strong>(
+    <span class="prefix">lucy_</span>RegexTokenizer *<strong>self</strong>,
+    <span class="prefix">cfish_</span><a href="../../Clownfish/Obj.html">Obj</a> *<strong>other</strong>
+);
+</code></pre>
+<p>Indicate whether two objects are the same.  By default, compares the
+memory address.</p>
+<dl>
+<dt>other</dt>
+<dd><p>Another Obj.</p>
+</dd>
+</dl>
+</dd>
+</dl>
+<h4>Methods inherited from Lucy::Analysis::Analyzer</h4>
+<dl>
+<dt id="func_Split">Split</dt>
+<dd>
+<pre><code><span class="prefix">cfish_</span><a href="../../Clownfish/Vector.html">Vector</a>* <span class="comment">// incremented</span>
+<span class="prefix">lucy_</span><strong>RegexTokenizer_Split</strong>(
+    <span class="prefix">lucy_</span>RegexTokenizer *<strong>self</strong>,
+    <span class="prefix">cfish_</span><a href="../../Clownfish/String.html">String</a> *<strong>text</strong>
+);
+</code></pre>
+<p>Analyze text and return an array of token texts.</p>
+<dl>
+<dt>text</dt>
+<dd><p>A string.</p>
+</dd>
+</dl>
+</dd>
+</dl>
+<h3>Inheritance</h3>
+<p>Lucy::Analysis::RegexTokenizer is a <a href="../../Lucy/Analysis/Analyzer.html">Lucy::Analysis::Analyzer</a> is a <a href="../../Clownfish/Obj.html">Clownfish::Obj</a>.</p>
+</div>

Added: lucy/site/trunk/content/docs/0.5.0/c/Lucy/Analysis/SnowballStemmer.mdtext
URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/c/Lucy/Analysis/SnowballStemmer.mdtext?rev=1762636&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/0.5.0/c/Lucy/Analysis/SnowballStemmer.mdtext (added)
+++ lucy/site/trunk/content/docs/0.5.0/c/Lucy/Analysis/SnowballStemmer.mdtext Wed Sep 28 12:06:24 2016
@@ -0,0 +1,176 @@
+Title: Lucy::Analysis::SnowballStemmer – C API Documentation
+
+<div class="c-api">
+<h2>Lucy::Analysis::SnowballStemmer</h2>
+<table>
+<tr>
+<td class="label">parcel</td>
+<td><a href="../../lucy.html">Lucy</a></td>
+</tr>
+<tr>
+<td class="label">class variable</td>
+<td><code><span class="prefix">LUCY_</span>SNOWBALLSTEMMER</code></td>
+</tr>
+<tr>
+<td class="label">struct symbol</td>
+<td><code><span class="prefix">lucy_</span>SnowballStemmer</code></td>
+</tr>
+<tr>
+<td class="label">class nickname</td>
+<td><code><span class="prefix">lucy_</span>SnowStemmer</code></td>
+</tr>
+<tr>
+<td class="label">header file</td>
+<td><code>Lucy/Analysis/SnowballStemmer.h</code></td>
+</tr>
+</table>
+<h3>Name</h3>
+<p>Lucy::Analysis::SnowballStemmer – Reduce related words to a shared root.</p>
+<h3>Description</h3>
+<p>SnowballStemmer is an <a href="../../Lucy/Analysis/Analyzer.html">Analyzer</a> which reduces
+related words to a root form (using the “Snowball” stemming library).  For
+instance, “horse”, “horses”, and “horsing” all become “hors” – so that a
+search for ‘horse’ will also match documents containing ‘horses’ and
+‘horsing’.</p>
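+<p>As a minimal sketch (short names assumed), reducing a word to its root via
+the inherited <a href="../../Lucy/Analysis/SnowballStemmer.html#func_Split">Split()</a> method:</p>
+<pre><code class="language-c">String *language = Str_newf(&quot;en&quot;);
+SnowballStemmer *stemmer = SnowStemmer_new(language);
+String *text  = Str_newf(&quot;horses&quot;);
+Vector *stems = SnowStemmer_Split(stemmer, text);  // contains a single &quot;hors&quot;
+
+DECREF(stems);
+DECREF(text);
+DECREF(language);
+DECREF(stemmer);
+</code></pre>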
+<h3>Functions</h3>
+<dl>
+<dt id="func_new">new</dt>
+<dd>
+<pre><code><span class="prefix">lucy_</span>SnowballStemmer* <span class="comment">// incremented</span>
+<span class="prefix">lucy_</span><strong>SnowStemmer_new</strong>(
+    <span class="prefix">cfish_</span><a href="../../Clownfish/String.html">String</a> *<strong>language</strong>
+);
+</code></pre>
+<p>Create a new SnowballStemmer.</p>
+<dl>
+<dt>language</dt>
+<dd><p>A two-letter ISO code identifying a language supported
+by Snowball.</p>
+</dd>
+</dl>
+</dd>
+<dt id="func_init">init</dt>
+<dd>
+<pre><code><span class="prefix">lucy_</span>SnowballStemmer*
+<span class="prefix">lucy_</span><strong>SnowStemmer_init</strong>(
+    <span class="prefix">lucy_</span>SnowballStemmer *<strong>self</strong>,
+    <span class="prefix">cfish_</span><a href="../../Clownfish/String.html">String</a> *<strong>language</strong>
+);
+</code></pre>
+<p>Initialize a SnowballStemmer.</p>
+<dl>
+<dt>language</dt>
+<dd><p>A two-letter ISO code identifying a language supported
+by Snowball.</p>
+</dd>
+</dl>
+</dd>
+</dl>
+<h3>Methods</h3>
+<dl>
+<dt id="func_Transform">Transform</dt>
+<dd>
+<pre><code><span class="prefix">lucy_</span><a href="../../Lucy/Analysis/Inversion.html">Inversion</a>* <span class="comment">// incremented</span>
+<span class="prefix">lucy_</span><strong>SnowStemmer_Transform</strong>(
+    <span class="prefix">lucy_</span>SnowballStemmer *<strong>self</strong>,
+    <span class="prefix">lucy_</span><a href="../../Lucy/Analysis/Inversion.html">Inversion</a> *<strong>inversion</strong>
+);
+</code></pre>
+<p>Take a single <a href="../../Lucy/Analysis/Inversion.html">Inversion</a> as input
+and return an Inversion, either the same one (presumably transformed
+in some way) or a new one.</p>
+<dl>
+<dt>inversion</dt>
+<dd><p>An inversion.</p>
+</dd>
+</dl>
+</dd>
+<dt id="func_Dump">Dump</dt>
+<dd>
+<pre><code><span class="prefix">cfish_</span><a href="../../Clownfish/Hash.html">Hash</a>* <span class="comment">// incremented</span>
+<span class="prefix">lucy_</span><strong>SnowStemmer_Dump</strong>(
+    <span class="prefix">lucy_</span>SnowballStemmer *<strong>self</strong>
+);
+</code></pre>
+<p>Dump the analyzer as a hash.</p>
+<p>Subclasses should call <a href="../../Lucy/Analysis/SnowballStemmer.html#func_Dump">Dump()</a> on the superclass. The returned
+object is a hash which should be populated with parameters of
+the analyzer.</p>
+<p><strong>Returns:</strong> A hash containing a description of the analyzer.</p>
+</dd>
+<dt id="func_Load">Load</dt>
+<dd>
+<pre><code><span class="prefix">lucy_</span>SnowballStemmer* <span class="comment">// incremented</span>
+<span class="prefix">lucy_</span><strong>SnowStemmer_Load</strong>(
+    <span class="prefix">lucy_</span>SnowballStemmer *<strong>self</strong>,
+    <span class="prefix">cfish_</span><a href="../../Clownfish/Obj.html">Obj</a> *<strong>dump</strong>
+);
+</code></pre>
+<p>Reconstruct an analyzer from a dump.</p>
+<p>Subclasses should first call <a href="../../Lucy/Analysis/SnowballStemmer.html#func_Load">Load()</a> on the superclass. The
+returned object is an analyzer which should be reconstructed by
+setting the dumped parameters from the hash contained in <code>dump</code>.</p>
+<p>Note that the invocant analyzer is unused.</p>
+<dl>
+<dt>dump</dt>
+<dd><p>A hash.</p>
+</dd>
+</dl>
+<p><strong>Returns:</strong> An analyzer.</p>
+</dd>
+<dt id="func_Equals">Equals</dt>
+<dd>
+<pre><code>bool
+<span class="prefix">lucy_</span><strong>SnowStemmer_Equals</strong>(
+    <span class="prefix">lucy_</span>SnowballStemmer *<strong>self</strong>,
+    <span class="prefix">cfish_</span><a href="../../Clownfish/Obj.html">Obj</a> *<strong>other</strong>
+);
+</code></pre>
+<p>Indicate whether two objects are the same.  By default, compares the
+memory address.</p>
+<dl>
+<dt>other</dt>
+<dd><p>Another Obj.</p>
+</dd>
+</dl>
+</dd>
+</dl>
+<h4>Methods inherited from Lucy::Analysis::Analyzer</h4>
+<dl>
+<dt id="func_Transform_Text">Transform_Text</dt>
+<dd>
+<pre><code><span class="prefix">lucy_</span><a href="../../Lucy/Analysis/Inversion.html">Inversion</a>* <span class="comment">// incremented</span>
+<span class="prefix">lucy_</span><strong>SnowStemmer_Transform_Text</strong>(
+    <span class="prefix">lucy_</span>SnowballStemmer *<strong>self</strong>,
+    <span class="prefix">cfish_</span><a href="../../Clownfish/String.html">String</a> *<strong>text</strong>
+);
+</code></pre>
+<p>Kick off an analysis chain, creating an Inversion from string input.
+The default implementation simply creates an initial Inversion with a
+single Token, then calls <a href="../../Lucy/Analysis/SnowballStemmer.html#func_Transform">Transform()</a>, but occasionally subclasses will
+provide an optimized implementation which minimizes string copies.</p>
+<dl>
+<dt>text</dt>
+<dd><p>A string.</p>
+</dd>
+</dl>
+</dd>
+<dt id="func_Split">Split</dt>
+<dd>
+<pre><code><span class="prefix">cfish_</span><a href="../../Clownfish/Vector.html">Vector</a>* <span class="comment">// incremented</span>
+<span class="prefix">lucy_</span><strong>SnowStemmer_Split</strong>(
+    <span class="prefix">lucy_</span>SnowballStemmer *<strong>self</strong>,
+    <span class="prefix">cfish_</span><a href="../../Clownfish/String.html">String</a> *<strong>text</strong>
+);
+</code></pre>
+<p>Analyze text and return an array of token texts.</p>
+<dl>
+<dt>text</dt>
+<dd><p>A string.</p>
+</dd>
+</dl>
+</dd>
+</dl>
+<h3>Inheritance</h3>
+<p>Lucy::Analysis::SnowballStemmer is a <a href="../../Lucy/Analysis/Analyzer.html">Lucy::Analysis::Analyzer</a> is a <a href="../../Clownfish/Obj.html">Clownfish::Obj</a>.</p>
+</div>

Added: lucy/site/trunk/content/docs/0.5.0/c/Lucy/Analysis/SnowballStopFilter.mdtext
URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/c/Lucy/Analysis/SnowballStopFilter.mdtext?rev=1762636&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/0.5.0/c/Lucy/Analysis/SnowballStopFilter.mdtext (added)
+++ lucy/site/trunk/content/docs/0.5.0/c/Lucy/Analysis/SnowballStopFilter.mdtext Wed Sep 28 12:06:24 2016
@@ -0,0 +1,208 @@
+Title: Lucy::Analysis::SnowballStopFilter – C API Documentation
+
+<div class="c-api">
+<h2>Lucy::Analysis::SnowballStopFilter</h2>
+<table>
+<tr>
+<td class="label">parcel</td>
+<td><a href="../../lucy.html">Lucy</a></td>
+</tr>
+<tr>
+<td class="label">class variable</td>
+<td><code><span class="prefix">LUCY_</span>SNOWBALLSTOPFILTER</code></td>
+</tr>
+<tr>
+<td class="label">struct symbol</td>
+<td><code><span class="prefix">lucy_</span>SnowballStopFilter</code></td>
+</tr>
+<tr>
+<td class="label">class nickname</td>
+<td><code><span class="prefix">lucy_</span>SnowStop</code></td>
+</tr>
+<tr>
+<td class="label">header file</td>
+<td><code>Lucy/Analysis/SnowballStopFilter.h</code></td>
+</tr>
+</table>
+<h3>Name</h3>
+<p>Lucy::Analysis::SnowballStopFilter – Suppress a “stoplist” of common words.</p>
+<h3>Description</h3>
+<p>A “stoplist” is a collection of “stopwords”: words which are common enough to
+be of little value when determining search results.  For example, so many
+documents in English contain “the”, “if”, and “maybe” that it may improve
+both performance and relevance to block them.</p>
+<p>Before filtering stopwords:</p>
+<pre><code>(&quot;i&quot;, &quot;am&quot;, &quot;the&quot;, &quot;walrus&quot;)
+</code></pre>
+<p>After filtering stopwords:</p>
+<pre><code>(&quot;walrus&quot;)
+</code></pre>
+<p>SnowballStopFilter provides default stoplists for several languages,
+courtesy of the <a href="http://snowball.tartarus.org">Snowball project</a>, or you may
+supply your own.</p>
+<pre><code>|-----------------------|
+| ISO CODE | LANGUAGE   |
+|-----------------------|
+| da       | Danish     |
+| de       | German     |
+| en       | English    |
+| es       | Spanish    |
+| fi       | Finnish    |
+| fr       | French     |
+| hu       | Hungarian  |
+| it       | Italian    |
+| nl       | Dutch      |
+| no       | Norwegian  |
+| pt       | Portuguese |
+| ru       | Russian    |
+| sv       | Swedish    |
+|-----------------------|
+</code></pre>
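+<p>A construction sketch (short names assumed; as in the other host-language
+bindings, presumably either <code>language</code> or a custom
+<code>stoplist</code> is supplied, with NULL for the parameter left unused):</p>
+<pre><code class="language-c">// Use the default English stoplist:
+String *language = Str_newf(&quot;en&quot;);
+SnowballStopFilter *stop_filter = SnowStop_new(language, NULL);
+DECREF(language);
+...
+DECREF(stop_filter);
+</code></pre>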
+<h3>Functions</h3>
+<dl>
+<dt id="func_new">new</dt>
+<dd>
+<pre><code><span class="prefix">lucy_</span>SnowballStopFilter* <span class="comment">// incremented</span>
+<span class="prefix">lucy_</span><strong>SnowStop_new</strong>(
+    <span class="prefix">cfish_</span><a href="../../Clownfish/String.html">String</a> *<strong>language</strong>,
+    <span class="prefix">cfish_</span><a href="../../Clownfish/Hash.html">Hash</a> *<strong>stoplist</strong>
+);
+</code></pre>
+<p>Create a new SnowballStopFilter.</p>
+<dl>
+<dt>language</dt>
+<dd><p>The ISO code for a supported language.</p>
+</dd>
+<dt>stoplist</dt>
+<dd><p>A hash with stopwords as the keys.</p>
+</dd>
+</dl>
+</dd>
+<dt id="func_init">init</dt>
+<dd>
+<pre><code><span class="prefix">lucy_</span>SnowballStopFilter*
+<span class="prefix">lucy_</span><strong>SnowStop_init</strong>(
+    <span class="prefix">lucy_</span>SnowballStopFilter *<strong>self</strong>,
+    <span class="prefix">cfish_</span><a href="../../Clownfish/String.html">String</a> *<strong>language</strong>,
+    <span class="prefix">cfish_</span><a href="../../Clownfish/Hash.html">Hash</a> *<strong>stoplist</strong>
+);
+</code></pre>
+<p>Initialize a SnowballStopFilter.</p>
+<dl>
+<dt>language</dt>
+<dd><p>The ISO code for a supported language.</p>
+</dd>
+<dt>stoplist</dt>
+<dd><p>A hash with stopwords as the keys.</p>
+</dd>
+</dl>
+</dd>
+</dl>
+<h3>Methods</h3>
+<dl>
+<dt id="func_Transform">Transform</dt>
+<dd>
+<pre><code><span class="prefix">lucy_</span><a href="../../Lucy/Analysis/Inversion.html">Inversion</a>* <span class="comment">// incremented</span>
+<span class="prefix">lucy_</span><strong>SnowStop_Transform</strong>(
+    <span class="prefix">lucy_</span>SnowballStopFilter *<strong>self</strong>,
+    <span class="prefix">lucy_</span><a href="../../Lucy/Analysis/Inversion.html">Inversion</a> *<strong>inversion</strong>
+);
+</code></pre>
+<p>Take a single <a href="../../Lucy/Analysis/Inversion.html">Inversion</a> as input
+and return an Inversion, either the same one (presumably transformed
+in some way) or a new one.</p>
+<dl>
+<dt>inversion</dt>
+<dd><p>An inversion.</p>
+</dd>
+</dl>
+</dd>
+<dt id="func_Equals">Equals</dt>
+<dd>
+<pre><code>bool
+<span class="prefix">lucy_</span><strong>SnowStop_Equals</strong>(
+    <span class="prefix">lucy_</span>SnowballStopFilter *<strong>self</strong>,
+    <span class="prefix">cfish_</span><a href="../../Clownfish/Obj.html">Obj</a> *<strong>other</strong>
+);
+</code></pre>
+<p>Indicate whether two objects are the same.  By default, compares the
+memory address.</p>
+<dl>
+<dt>other</dt>
+<dd><p>Another Obj.</p>
+</dd>
+</dl>
+</dd>
+<dt id="func_Dump">Dump</dt>
+<dd>
+<pre><code><span class="prefix">cfish_</span><a href="../../Clownfish/Obj.html">Obj</a>* <span class="comment">// incremented</span>
+<span class="prefix">lucy_</span><strong>SnowStop_Dump</strong>(
+    <span class="prefix">lucy_</span>SnowballStopFilter *<strong>self</strong>
+);
+</code></pre>
+<p>Dump the analyzer as a hash.</p>
+<p>Subclasses should call <a href="../../Lucy/Analysis/SnowballStopFilter.html#func_Dump">Dump()</a> on the superclass. The returned
+object is a hash which should be populated with parameters of
+the analyzer.</p>
+<p><strong>Returns:</strong> A hash containing a description of the analyzer.</p>
+</dd>
+<dt id="func_Load">Load</dt>
+<dd>
+<pre><code><span class="prefix">cfish_</span><a href="../../Clownfish/Obj.html">Obj</a>* <span class="comment">// incremented</span>
+<span class="prefix">lucy_</span><strong>SnowStop_Load</strong>(
+    <span class="prefix">lucy_</span>SnowballStopFilter *<strong>self</strong>,
+    <span class="prefix">cfish_</span><a href="../../Clownfish/Obj.html">Obj</a> *<strong>dump</strong>
+);
+</code></pre>
+<p>Reconstruct an analyzer from a dump.</p>
+<p>Subclasses should first call <a href="../../Lucy/Analysis/SnowballStopFilter.html#func_Load">Load()</a> on the superclass. The
+returned object is an analyzer which should be reconstructed by
+setting the dumped parameters from the hash contained in <code>dump</code>.</p>
+<p>Note that the invocant analyzer is unused.</p>
+<dl>
+<dt>dump</dt>
+<dd><p>A hash.</p>
+</dd>
+</dl>
+<p><strong>Returns:</strong> An analyzer.</p>
+</dd>
+</dl>
+<h4>Methods inherited from Lucy::Analysis::Analyzer</h4>
+<dl>
+<dt id="func_Transform_Text">Transform_Text</dt>
+<dd>
+<pre><code><span class="prefix">lucy_</span><a href="../../Lucy/Analysis/Inversion.html">Inversion</a>* <span class="comment">// incremented</span>
+<span class="prefix">lucy_</span><strong>SnowStop_Transform_Text</strong>(
+    <span class="prefix">lucy_</span>SnowballStopFilter *<strong>self</strong>,
+    <span class="prefix">cfish_</span><a href="../../Clownfish/String.html">String</a> *<strong>text</strong>
+);
+</code></pre>
+<p>Kick off an analysis chain, creating an Inversion from string input.
+The default implementation simply creates an initial Inversion with a
+single Token, then calls <a href="../../Lucy/Analysis/SnowballStopFilter.html#func_Transform">Transform()</a>, but occasionally subclasses will
+provide an optimized implementation which minimizes string copies.</p>
+<dl>
+<dt>text</dt>
+<dd><p>A string.</p>
+</dd>
+</dl>
+</dd>
+<dt id="func_Split">Split</dt>
+<dd>
+<pre><code><span class="prefix">cfish_</span><a href="../../Clownfish/Vector.html">Vector</a>* <span class="comment">// incremented</span>
+<span class="prefix">lucy_</span><strong>SnowStop_Split</strong>(
+    <span class="prefix">lucy_</span>SnowballStopFilter *<strong>self</strong>,
+    <span class="prefix">cfish_</span><a href="../../Clownfish/String.html">String</a> *<strong>text</strong>
+);
+</code></pre>
+<p>Analyze text and return an array of token texts.</p>
+<dl>
+<dt>text</dt>
+<dd><p>A string.</p>
+</dd>
+</dl>
+</dd>
+</dl>
+<h3>Inheritance</h3>
+<p>Lucy::Analysis::SnowballStopFilter is a <a href="../../Lucy/Analysis/Analyzer.html">Lucy::Analysis::Analyzer</a> is a <a href="../../Clownfish/Obj.html">Clownfish::Obj</a>.</p>
+</div>

Added: lucy/site/trunk/content/docs/0.5.0/c/Lucy/Analysis/StandardTokenizer.mdtext
URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/c/Lucy/Analysis/StandardTokenizer.mdtext?rev=1762636&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/0.5.0/c/Lucy/Analysis/StandardTokenizer.mdtext (added)
+++ lucy/site/trunk/content/docs/0.5.0/c/Lucy/Analysis/StandardTokenizer.mdtext Wed Sep 28 12:06:24 2016
@@ -0,0 +1,162 @@
+Title: Lucy::Analysis::StandardTokenizer – C API Documentation
+
+<div class="c-api">
+<h2>Lucy::Analysis::StandardTokenizer</h2>
+<table>
+<tr>
+<td class="label">parcel</td>
+<td><a href="../../lucy.html">Lucy</a></td>
+</tr>
+<tr>
+<td class="label">class variable</td>
+<td><code><span class="prefix">LUCY_</span>STANDARDTOKENIZER</code></td>
+</tr>
+<tr>
+<td class="label">struct symbol</td>
+<td><code><span class="prefix">lucy_</span>StandardTokenizer</code></td>
+</tr>
+<tr>
+<td class="label">class nickname</td>
+<td><code><span class="prefix">lucy_</span>StandardTokenizer</code></td>
+</tr>
+<tr>
+<td class="label">header file</td>
+<td><code>Lucy/Analysis/StandardTokenizer.h</code></td>
+</tr>
+</table>
+<h3>Name</h3>
+<p>Lucy::Analysis::StandardTokenizer – Split a string into tokens.</p>
+<h3>Description</h3>
+<p>Generically, “tokenizing” is a process of breaking up a string into an
+array of “tokens”.  For instance, the string “three blind mice” might be
+tokenized into “three”, “blind”, “mice”.</p>
+<p>Lucy::Analysis::StandardTokenizer breaks up the text at the word
+boundaries defined in Unicode Standard Annex #29. It then returns those
+words that contain alphabetic or numeric characters.</p>
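+<p>A minimal sketch (short names assumed) splitting a string at UAX #29 word
+boundaries:</p>
+<pre><code class="language-c">StandardTokenizer *tokenizer = StandardTokenizer_new();
+String *text  = Str_newf(&quot;three blind mice&quot;);
+Vector *words = StandardTokenizer_Split(tokenizer, text);  // &quot;three&quot;, &quot;blind&quot;, &quot;mice&quot;
+
+DECREF(words);
+DECREF(text);
+DECREF(tokenizer);
+</code></pre>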
+<h3>Functions</h3>
+<dl>
+<dt id="func_new">new</dt>
+<dd>
+<pre><code><span class="prefix">lucy_</span>StandardTokenizer* <span class="comment">// incremented</span>
+<span class="prefix">lucy_</span><strong>StandardTokenizer_new</strong>(void);
+</code></pre>
+<p>Constructor.  Takes no arguments.</p>
+</dd>
+<dt id="func_init">init</dt>
+<dd>
+<pre><code><span class="prefix">lucy_</span>StandardTokenizer*
+<span class="prefix">lucy_</span><strong>StandardTokenizer_init</strong>(
+    <span class="prefix">lucy_</span>StandardTokenizer *<strong>self</strong>
+);
+</code></pre>
+<p>Initialize a StandardTokenizer.</p>
+</dd>
+</dl>
+<h3>Methods</h3>
+<dl>
+<dt id="func_Transform">Transform</dt>
+<dd>
+<pre><code><span class="prefix">lucy_</span><a href="../../Lucy/Analysis/Inversion.html">Inversion</a>* <span class="comment">// incremented</span>
+<span class="prefix">lucy_</span><strong>StandardTokenizer_Transform</strong>(
+    <span class="prefix">lucy_</span>StandardTokenizer *<strong>self</strong>,
+    <span class="prefix">lucy_</span><a href="../../Lucy/Analysis/Inversion.html">Inversion</a> *<strong>inversion</strong>
+);
+</code></pre>
+<p>Take a single <a href="../../Lucy/Analysis/Inversion.html">Inversion</a> as input
+and return an Inversion, either the same one (presumably transformed
+in some way) or a new one.</p>
+<dl>
+<dt>inversion</dt>
+<dd><p>An inversion.</p>
+</dd>
+</dl>
+</dd>
+<dt id="func_Transform_Text">Transform_Text</dt>
+<dd>
+<pre><code><span class="prefix">lucy_</span><a href="../../Lucy/Analysis/Inversion.html">Inversion</a>* <span class="comment">// incremented</span>
+<span class="prefix">lucy_</span><strong>StandardTokenizer_Transform_Text</strong>(
+    <span class="prefix">lucy_</span>StandardTokenizer *<strong>self</strong>,
+    <span class="prefix">cfish_</span><a href="../../Clownfish/String.html">String</a> *<strong>text</strong>
+);
+</code></pre>
+<p>Kick off an analysis chain, creating an Inversion from string input.
+The default implementation simply creates an initial Inversion with a
+single Token, then calls <a href="../../Lucy/Analysis/StandardTokenizer.html#func_Transform">Transform()</a>, but occasionally subclasses will
+provide an optimized implementation which minimizes string copies.</p>
+<dl>
+<dt>text</dt>
+<dd><p>A string.</p>
+</dd>
+</dl>
+</dd>
+<dt id="func_Equals">Equals</dt>
+<dd>
+<pre><code>bool
+<span class="prefix">lucy_</span><strong>StandardTokenizer_Equals</strong>(
+    <span class="prefix">lucy_</span>StandardTokenizer *<strong>self</strong>,
+    <span class="prefix">cfish_</span><a href="../../Clownfish/Obj.html">Obj</a> *<strong>other</strong>
+);
+</code></pre>
+<p>Indicate whether two objects are the same.  By default, compares the
+memory address.</p>
+<dl>
+<dt>other</dt>
+<dd><p>Another Obj.</p>
+</dd>
+</dl>
+</dd>
+</dl>
+<h4>Methods inherited from Lucy::Analysis::Analyzer</h4>
+<dl>
+<dt id="func_Split">Split</dt>
+<dd>
+<pre><code><span class="prefix">cfish_</span><a href="../../Clownfish/Vector.html">Vector</a>* <span class="comment">// incremented</span>
+<span class="prefix">lucy_</span><strong>StandardTokenizer_Split</strong>(
+    <span class="prefix">lucy_</span>StandardTokenizer *<strong>self</strong>,
+    <span class="prefix">cfish_</span><a href="../../Clownfish/String.html">String</a> *<strong>text</strong>
+);
+</code></pre>
+<p>Analyze text and return an array of token texts.</p>
+<dl>
+<dt>text</dt>
+<dd><p>A string.</p>
+</dd>
+</dl>
+</dd>
+<dt id="func_Dump">Dump</dt>
+<dd>
+<pre><code><span class="prefix">cfish_</span><a href="../../Clownfish/Obj.html">Obj</a>* <span class="comment">// incremented</span>
+<span class="prefix">lucy_</span><strong>StandardTokenizer_Dump</strong>(
+    <span class="prefix">lucy_</span>StandardTokenizer *<strong>self</strong>
+);
+</code></pre>
+<p>Dump the analyzer as a hash.</p>
+<p>Subclasses should call <a href="../../Lucy/Analysis/StandardTokenizer.html#func_Dump">Dump()</a> on the superclass. The returned
+object is a hash which should be populated with parameters of
+the analyzer.</p>
+<p><strong>Returns:</strong> A hash containing a description of the analyzer.</p>
+</dd>
+<dt id="func_Load">Load</dt>
+<dd>
+<pre><code><span class="prefix">cfish_</span><a href="../../Clownfish/Obj.html">Obj</a>* <span class="comment">// incremented</span>
+<span class="prefix">lucy_</span><strong>StandardTokenizer_Load</strong>(
+    <span class="prefix">lucy_</span>StandardTokenizer *<strong>self</strong>,
+    <span class="prefix">cfish_</span><a href="../../Clownfish/Obj.html">Obj</a> *<strong>dump</strong>
+);
+</code></pre>
+<p>Reconstruct an analyzer from a dump.</p>
+<p>Subclasses should first call <a href="../../Lucy/Analysis/StandardTokenizer.html#func_Load">Load()</a> on the superclass. The
+returned object is an analyzer which should be reconstructed by
+setting the dumped parameters from the hash contained in <code>dump</code>.</p>
+<p>Note that the invocant analyzer is unused.</p>
+<dl>
+<dt>dump</dt>
+<dd><p>A hash.</p>
+</dd>
+</dl>
+<p><strong>Returns:</strong> An analyzer.</p>
+</dd>
+</dl>
+<h3>Inheritance</h3>
+<p>Lucy::Analysis::StandardTokenizer is a <a href="../../Lucy/Analysis/Analyzer.html">Lucy::Analysis::Analyzer</a> is a <a href="../../Clownfish/Obj.html">Clownfish::Obj</a>.</p>
+</div>

Added: lucy/site/trunk/content/docs/0.5.0/c/Lucy/Analysis/Token.mdtext
URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/c/Lucy/Analysis/Token.mdtext?rev=1762636&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/0.5.0/c/Lucy/Analysis/Token.mdtext (added)
+++ lucy/site/trunk/content/docs/0.5.0/c/Lucy/Analysis/Token.mdtext Wed Sep 28 12:06:24 2016
@@ -0,0 +1,194 @@
+Title: Lucy::Analysis::Token – C API Documentation
+
+<div class="c-api">
+<h2>Lucy::Analysis::Token</h2>
+<table>
+<tr>
+<td class="label">parcel</td>
+<td><a href="../../lucy.html">Lucy</a></td>
+</tr>
+<tr>
+<td class="label">class variable</td>
+<td><code><span class="prefix">LUCY_</span>TOKEN</code></td>
+</tr>
+<tr>
+<td class="label">struct symbol</td>
+<td><code><span class="prefix">lucy_</span>Token</code></td>
+</tr>
+<tr>
+<td class="label">class nickname</td>
+<td><code><span class="prefix">lucy_</span>Token</code></td>
+</tr>
+<tr>
+<td class="label">header file</td>
+<td><code>Lucy/Analysis/Token.h</code></td>
+</tr>
+</table>
+<h3>Name</h3>
+<p>Lucy::Analysis::Token – Unit of text.</p>
+<h3>Description</h3>
+<p>Token is the fundamental unit used by Apache Lucy’s Analyzer subclasses.
+Each Token has 5 attributes: <code>text</code>, <code>start_offset</code>,
+<code>end_offset</code>, <code>boost</code>, and <code>pos_inc</code>.</p>
+<p>The <code>text</code> attribute is a Unicode string encoded as UTF-8.</p>
+<p><code>start_offset</code> is the start point of the token text, measured in
+Unicode code points from the top of the stored field;
+<code>end_offset</code> delimits the corresponding closing boundary.
+<code>start_offset</code> and <code>end_offset</code> locate the Token
+within a larger context, even if the Token’s text attribute gets modified
+– by stemming, for instance.  The Token for “beating” in the text “beating
+a dead horse” begins life with a start_offset of 0 and an end_offset of 7;
+after stemming, the text is “beat”, but the start_offset is still 0 and the
+end_offset is still 7.  This allows “beating” to be highlighted correctly
+after a search matches “beat”.</p>
+<p><code>boost</code> is a per-token weight.  Use this when you want to assign
+more or less importance to a particular token, as you might for emboldened
+text within an HTML document, for example.  (Note: The field this token
+belongs to must be spec’d to use a posting of type RichPosting.)</p>
+<p><code>pos_inc</code> is the POSition INCrement, measured in Tokens.  This
+attribute, which defaults to 1, is an advanced tool for manipulating
+phrase matching.  Ordinarily, Tokens are assigned consecutive position
+numbers: 0, 1, and 2 for <code>&quot;three blind mice&quot;</code>.  However, if you
+set the position increment for “blind” to, say, 1000, then the three tokens
+will end up assigned to positions 0, 1, and 1001 – and will no longer
+produce a phrase match for the query <code>&quot;three blind mice&quot;</code>.</p>
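+<p>A construction sketch (short names assumed) for the “beating” example above,
+followed by the kind of in-place rewrite a stemmer might perform:</p>
+<pre><code class="language-c">Token *token = Token_new(&quot;beating&quot;, 7, // text and its length in bytes
+                         0,              // start_offset in code points
+                         7,              // end_offset in code points
+                         1.0f,           // boost: neutral weight
+                         1);             // pos_inc: consecutive position
+
+// Stemming rewrites the text but leaves the offsets intact:
+Token_Set_Text(token, &quot;beat&quot;, 4);
+
+DECREF(token);
+</code></pre>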
+<h3>Functions</h3>
+<dl>
+<dt id="func_new">new</dt>
+<dd>
+<pre><code><span class="prefix">lucy_</span>Token* <span class="comment">// incremented</span>
+<span class="prefix">lucy_</span><strong>Token_new</strong>(
+    char *<strong>text</strong>,
+    size_t <strong>len</strong>,
+    uint32_t <strong>start_offset</strong>,
+    uint32_t <strong>end_offset</strong>,
+    float <strong>boost</strong>,
+    int32_t <strong>pos_inc</strong>
+);
+</code></pre>
+<p>Create a new Token.</p>
+<dl>
+<dt>text</dt>
+<dd><p>A UTF-8 string.</p>
+</dd>
+<dt>len</dt>
+<dd><p>Size of the string in bytes.</p>
+</dd>
+<dt>start_offset</dt>
+<dd><p>Start offset into the original document in Unicode
+code points.</p>
+</dd>
+<dt>end_offset</dt>
+<dd><p>End offset into the original document in Unicode
+code points.</p>
+</dd>
+<dt>boost</dt>
+<dd><p>Per-token weight.</p>
+</dd>
+<dt>pos_inc</dt>
+<dd><p>Position increment for phrase matching.</p>
+</dd>
+</dl>
+</dd>
+<dt id="func_init">init</dt>
+<dd>
+<pre><code><span class="prefix">lucy_</span>Token*
+<span class="prefix">lucy_</span><strong>Token_init</strong>(
+    <span class="prefix">lucy_</span>Token *<strong>self</strong>,
+    char *<strong>text</strong>,
+    size_t <strong>len</strong>,
+    uint32_t <strong>start_offset</strong>,
+    uint32_t <strong>end_offset</strong>,
+    float <strong>boost</strong>,
+    int32_t <strong>pos_inc</strong>
+);
+</code></pre>
+<p>Initialize a Token.</p>
+<dl>
+<dt>text</dt>
+<dd><p>A UTF-8 string.</p>
+</dd>
+<dt>len</dt>
+<dd><p>Size of the string in bytes.</p>
+</dd>
+<dt>start_offset</dt>
+<dd><p>Start offset into the original document in Unicode
+code points.</p>
+</dd>
+<dt>end_offset</dt>
+<dd><p>End offset into the original document in Unicode
+code points.</p>
+</dd>
+<dt>boost</dt>
+<dd><p>Per-token weight.</p>
+</dd>
+<dt>pos_inc</dt>
+<dd><p>Position increment for phrase matching.</p>
+</dd>
+</dl>
+</dd>
+</dl>
+<h3>Methods</h3>
+<dl>
+<dt id="func_Get_Start_Offset">Get_Start_Offset</dt>
+<dd>
+<pre><code>uint32_t
+<span class="prefix">lucy_</span><strong>Token_Get_Start_Offset</strong>(
+    <span class="prefix">lucy_</span>Token *<strong>self</strong>
+);
+</code></pre>
+</dd>
+<dt id="func_Get_End_Offset">Get_End_Offset</dt>
+<dd>
+<pre><code>uint32_t
+<span class="prefix">lucy_</span><strong>Token_Get_End_Offset</strong>(
+    <span class="prefix">lucy_</span>Token *<strong>self</strong>
+);
+</code></pre>
+</dd>
+<dt id="func_Get_Boost">Get_Boost</dt>
+<dd>
+<pre><code>float
+<span class="prefix">lucy_</span><strong>Token_Get_Boost</strong>(
+    <span class="prefix">lucy_</span>Token *<strong>self</strong>
+);
+</code></pre>
+</dd>
+<dt id="func_Get_Pos_Inc">Get_Pos_Inc</dt>
+<dd>
+<pre><code>int32_t
+<span class="prefix">lucy_</span><strong>Token_Get_Pos_Inc</strong>(
+    <span class="prefix">lucy_</span>Token *<strong>self</strong>
+);
+</code></pre>
+</dd>
+<dt id="func_Get_Text">Get_Text</dt>
+<dd>
+<pre><code>char*
+<span class="prefix">lucy_</span><strong>Token_Get_Text</strong>(
+    <span class="prefix">lucy_</span>Token *<strong>self</strong>
+);
+</code></pre>
+</dd>
+<dt id="func_Get_Len">Get_Len</dt>
+<dd>
+<pre><code>size_t
+<span class="prefix">lucy_</span><strong>Token_Get_Len</strong>(
+    <span class="prefix">lucy_</span>Token *<strong>self</strong>
+);
+</code></pre>
+</dd>
+<dt id="func_Set_Text">Set_Text</dt>
+<dd>
+<pre><code>void
+<span class="prefix">lucy_</span><strong>Token_Set_Text</strong>(
+    <span class="prefix">lucy_</span>Token *<strong>self</strong>,
+    char *<strong>text</strong>,
+    size_t <strong>len</strong>
+);
+</code></pre>
+</dd>
+</dl>
+<h3>Inheritance</h3>
+<p>Lucy::Analysis::Token is a <a href="../../Clownfish/Obj.html">Clownfish::Obj</a>.</p>
+</div>

Added: lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/Cookbook.mdtext
URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/Cookbook.mdtext?rev=1762636&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/Cookbook.mdtext (added)
+++ lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/Cookbook.mdtext Wed Sep 28 12:06:24 2016
@@ -0,0 +1,32 @@
+Title: Lucy::Docs::Cookbook
+
+<div class="c-api">
+<h2>Apache Lucy recipes</h2>
+<p>The Cookbook provides thematic documentation covering some of Apache Lucy’s
+more sophisticated features.  For a step-by-step introduction to Lucy,
+see <a href="../../Lucy/Docs/Tutorial.html">Tutorial</a>.</p>
+<h3>Chapters</h3>
+<ul>
+<li>
+<p><a href="../../Lucy/Docs/Cookbook/FastUpdates.html">FastUpdates</a> - While index updates are fast on
+average, worst-case update performance may be significantly slower. To make
+index updates consistently quick, we must manually intervene to control the
+process of index segment consolidation.</p>
+</li>
+<li>
+<p><a href="../../Lucy/Docs/Cookbook/CustomQuery.html">CustomQuery</a> - Explore Lucy’s support for
+custom query types by creating a “PrefixQuery” class to handle trailing
+wildcards.</p>
+</li>
+<li>
+<p><a href="../../Lucy/Docs/Cookbook/CustomQueryParser.html">CustomQueryParser</a> - Define your own custom
+search query syntax using <a href="../../Lucy/Search/QueryParser.html">QueryParser</a> and
+Parse::RecDescent.</p>
+</li>
+</ul>
+<h3>Materials</h3>
+<p>Some of the recipes in the Cookbook reference the completed
+<a href="../../Lucy/Docs/Tutorial.html">Tutorial</a> application.  These materials can be
+found in the <code>sample</code> directory at the root of the Lucy distribution:</p>
+<pre><code>Code example for C is missing</code></pre>
+</div>

Added: lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/Cookbook/CustomQuery.mdtext
URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/Cookbook/CustomQuery.mdtext?rev=1762636&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/Cookbook/CustomQuery.mdtext (added)
+++ lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/Cookbook/CustomQuery.mdtext Wed Sep 28 12:06:24 2016
@@ -0,0 +1,102 @@
+Title: Lucy::Docs::Cookbook::CustomQuery
+
+<div class="c-api">
+<h2>Sample subclass of Query</h2>
+<p>Explore Apache Lucy’s support for custom query types by creating a
+“PrefixQuery” class to handle trailing wildcards.</p>
+<pre><code>Code example for C is missing</code></pre>
+<h3>Query, Compiler, and Matcher</h3>
+<p>To add support for a new query type, we need three classes: a Query, a
+Compiler, and a Matcher.</p>
+<ul>
+<li>
+<p>PrefixQuery - a subclass of <a href="../../../Lucy/Search/Query.html">Query</a>, and the only class
+that client code will deal with directly.</p>
+</li>
+<li>
+<p>PrefixCompiler - a subclass of <a href="../../../Lucy/Search/Compiler.html">Compiler</a>, whose primary
+role is to compile a PrefixQuery to a PrefixMatcher.</p>
+</li>
+<li>
+<p>PrefixMatcher - a subclass of <a href="../../../Lucy/Search/Matcher.html">Matcher</a>, which does the
+heavy lifting: it applies the query to individual documents and assigns a
+score to each match.</p>
+</li>
+</ul>
+<p>The PrefixQuery class on its own isn’t enough because a Query object’s role is
+limited to expressing an abstract specification for the search.  A Query is
+basically nothing but metadata; execution is left to the Query’s companion
+Compiler and Matcher.</p>
+<p>Here’s a simplified sketch illustrating how a Searcher’s hits() method ties
+together the three classes.</p>
+<pre><code>Code example for C is missing</code></pre>
+<h4>PrefixQuery</h4>
+<p>Our PrefixQuery class will have two attributes: a query string and a field
+name.</p>
+<pre><code>Code example for C is missing</code></pre>
+<p>PrefixQuery’s constructor collects and validates the attributes.</p>
+<pre><code>Code example for C is missing</code></pre>
+<p>Since this is an inside-out class, we’ll need a destructor:</p>
+<pre><code>Code example for C is missing</code></pre>
+<p>The equals() method determines whether two Queries are logically equivalent:</p>
+<pre><code>Code example for C is missing</code></pre>
+<p>The last thing we’ll need is a make_compiler() factory method which kicks out
+a subclass of <a href="../../../Lucy/Search/Compiler.html">Compiler</a>.</p>
+<pre><code>Code example for C is missing</code></pre>
+<h4>PrefixCompiler</h4>
+<p>PrefixQuery’s make_compiler() method will be called internally at search-time
+by objects which subclass <a href="../../../Lucy/Search/Searcher.html">Searcher</a> – such as
+<a href="../../../Lucy/Search/IndexSearcher.html">IndexSearchers</a>.</p>
+<p>A Searcher is associated with a particular collection of documents.   These
+documents may all reside in one index, as with IndexSearcher, or they may be
+spread out across multiple indexes on one or more machines, as with
+LucyX::Remote::ClusterSearcher.</p>
+<p>Searcher objects have access to certain statistical information about the
+collections they represent; for instance, a Searcher can tell you how many
+documents are in the collection…</p>
+<pre><code>Code example for C is missing</code></pre>
+<p>… or how many documents a specific term appears in:</p>
+<pre><code>Code example for C is missing</code></pre>
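+<p>The C samples above are missing; as a hedged sketch, the corresponding calls
+are presumably <code>Doc_Max()</code> and <code>Doc_Freq()</code> (short names
+assumed, and <code>searcher</code> is a hypothetical variable):</p>
+<pre><code class="language-c">int32_t num_docs = Searcher_Doc_Max(searcher);
+
+String *field = Str_newf(&quot;content&quot;);
+String *term  = Str_newf(&quot;foo&quot;);
+uint32_t doc_freq = Searcher_Doc_Freq(searcher, field, (Obj*)term);
+
+DECREF(term);
+DECREF(field);
+</code></pre>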
+<p>Such information can be used by sophisticated Compiler implementations to
+assign more or less heft to individual queries or sub-queries.  However, we’re
+not going to bother with weighting for this demo; we’ll just assign a fixed
+score of 1.0 to each matching document.</p>
+<p>We don’t need to write a constructor, as it will suffice to inherit new() from
+Lucy::Search::Compiler.  The only method we need to implement for
+PrefixCompiler is make_matcher().</p>
+<pre><code>Code example for C is missing</code></pre>
+<p>PrefixCompiler gets access to a <a href="../../../Lucy/Index/SegReader.html">SegReader</a>
+object when make_matcher() gets called.  From the SegReader and its
+sub-components <a href="../../../Lucy/Index/LexiconReader.html">LexiconReader</a> and
+<a href="../../../Lucy/Index/PostingListReader.html">PostingListReader</a>, we acquire a
+<a href="../../../Lucy/Index/Lexicon.html">Lexicon</a>, scan through the Lexicon’s unique
+terms, and acquire a <a href="../../../Lucy/Index/PostingList.html">PostingList</a> for each
+term that matches our prefix.</p>
+<p>Each of these PostingList objects represents a set of documents which match
+the query.</p>
+<h4>PrefixMatcher</h4>
+<p>The Matcher subclass is the most involved.</p>
+<pre><code>Code example for C is missing</code></pre>
+<p>The doc ids must be in order, or some will be ignored; hence the <code>sort</code>
+above.</p>
+<p>In addition to the constructor and destructor, there are three methods that
+must be overridden.</p>
+<p>next() advances the Matcher to the next valid matching doc.</p>
+<pre><code>Code example for C is missing</code></pre>
+<p>get_doc_id() returns the current document id, or 0 if the Matcher is
+exhausted.  (<a href="../../../Lucy/Docs/DocIDs.html">Document numbers</a> start at 1, so 0 is
+a sentinel.)</p>
+<pre><code>Code example for C is missing</code></pre>
+<p>score() conveys the relevance score of the current match.  We’ll just return a
+fixed score of 1.0:</p>
+<pre><code>Code example for C is missing</code></pre>
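+<p>Since the C samples are missing here, the following purely illustrative
+stand-in shows the contract of those three methods over a sorted array of doc
+ids; it deliberately sidesteps Lucy’s real subclassing machinery:</p>
+<pre><code class="language-c">#include &lt;stddef.h&gt;
+#include &lt;stdint.h&gt;
+
+// Hypothetical stand-in, not Lucy's actual Matcher API.
+typedef struct {
+    int32_t *doc_ids;  // matching doc ids, sorted ascending
+    size_t   size;
+    size_t   pos;      // index of the next candidate
+    int32_t  doc_id;   // current doc id; 0 means exhausted
+} ToyPrefixMatcher;
+
+static int32_t
+ToyPrefixMatcher_next(ToyPrefixMatcher *self) {
+    self-&gt;doc_id = (self-&gt;pos &lt; self-&gt;size)
+                   ? self-&gt;doc_ids[self-&gt;pos++] : 0;
+    return self-&gt;doc_id;
+}
+
+static int32_t
+ToyPrefixMatcher_get_doc_id(ToyPrefixMatcher *self) {
+    return self-&gt;doc_id;  // 0 doubles as the &quot;exhausted&quot; sentinel
+}
+
+static float
+ToyPrefixMatcher_score(ToyPrefixMatcher *self) {
+    (void)self;
+    return 1.0f;          // fixed score, as described above
+}
+</code></pre>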
+<h3>Usage</h3>
+<p>To get a basic feel for PrefixQuery, insert the FlatQueryParser module
+described in <a href="../../../Lucy/Docs/Cookbook/CustomQueryParser.html">CustomQueryParser</a> (which supports
+PrefixQuery) into the search.cgi sample app.</p>
+<pre><code>Code example for C is missing</code></pre>
+<p>If you’re planning on using PrefixQuery in earnest, though, you may want to
+change up analyzers to avoid stemming, because stemming – another approach to
+prefix conflation – is not perfectly compatible with prefix searches.</p>
+<pre><code>Code example for C is missing</code></pre>
+</div>

Added: lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/Cookbook/CustomQueryParser.mdtext
URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/Cookbook/CustomQueryParser.mdtext?rev=1762636&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/Cookbook/CustomQueryParser.mdtext (added)
+++ lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/Cookbook/CustomQueryParser.mdtext Wed Sep 28 12:06:24 2016
@@ -0,0 +1,77 @@
+Title: Lucy::Docs::Cookbook::CustomQueryParser
+
+<div class="c-api">
+<h2>Sample subclass of QueryParser.</h2>
+<p>Implement a custom search query language using a subclass of
+<a href="../../../Lucy/Search/QueryParser.html">QueryParser</a>.</p>
+<h3>The language</h3>
+<p>At first, our query language will support only simple term queries and phrases
+delimited by double quotes.  For simplicity’s sake, it will not support
+parenthetical groupings, boolean operators, or prepended plus/minus.  The
+results for all subqueries will be unioned together – i.e. joined using an OR
+– which is usually the best approach for small-to-medium-sized document
+collections.</p>
+<p>Later, we’ll add support for trailing wildcards.</p>
+<h3>Single-field parser</h3>
+<p>Our initial parser implementation will generate queries against a single fixed
+field, “content”, and it will analyze text using a fixed choice of English
+EasyAnalyzer.  We won’t subclass Lucy::Search::QueryParser just yet.</p>
+<pre><code>Code example for C is missing</code></pre>
+<p>Some private helper subs for creating TermQuery and PhraseQuery objects will
+help keep the size of our main parse() subroutine down:</p>
+<pre><code>Code example for C is missing</code></pre>
+<p>Our private _tokenize() method treats double-quote delimited material as a
+single token and splits on whitespace everywhere else.</p>
+<pre><code>Code example for C is missing</code></pre>
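+<p>The C sample is missing; as a rough illustration of that rule in plain C (a
+hypothetical helper, not the missing official code):</p>
+<pre><code class="language-c">#include &lt;ctype.h&gt;
+#include &lt;string.h&gt;
+
+// Return the start of the next token in `input` and set `*len`, treating a
+// double-quoted span as one token; return NULL when the input is exhausted.
+static const char*
+next_token(const char *input, size_t *len) {
+    while (*input &amp;&amp; isspace((unsigned char)*input)) { input++; }
+    if (*input == '\0') { return NULL; }
+    const char *start = input;
+    if (*input == '&quot;') {  // quoted phrase: scan to the closing quote
+        const char *end = strchr(input + 1, '&quot;');
+        *len = end ? (size_t)(end + 1 - start) : strlen(start);
+    }
+    else {                 // bare term: scan to the next whitespace
+        while (*input &amp;&amp; !isspace((unsigned char)*input)) { input++; }
+        *len = (size_t)(input - start);
+    }
+    return start;
+}
+</code></pre>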
+<p>The main parsing routine creates an array of tokens by calling _tokenize(),
+runs the tokens through the EasyAnalyzer, creates TermQuery or
+PhraseQuery objects according to how many tokens emerge from the
+EasyAnalyzer’s split() method, and adds each of the sub-queries to the primary
+ORQuery.</p>
+<pre><code>Code example for C is missing</code></pre>
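+<p>The C sample is missing; as a rough, hedged sketch of the shape such a
+routine might take (reusing the hypothetical <code>next_token()</code> helper
+above, treating every token as a term query, and skipping the analysis and
+phrase handling described in the prose):</p>
+<pre><code class="language-c">Query*
+parse(const char *query_string) {
+    Vector *children = Vec_new(0);
+    size_t len;
+
+    for (const char *tok = next_token(query_string, &amp;len);
+         tok != NULL;
+         tok = next_token(tok + len, &amp;len)) {
+        String *field = Str_newf(&quot;content&quot;);
+        String *term  = Str_new_from_utf8(tok, len);
+        // Vec_Push takes ownership of the freshly created TermQuery.
+        Vec_Push(children, (Obj*)TermQuery_new(field, (Obj*)term));
+        DECREF(term);
+        DECREF(field);
+    }
+
+    ORQuery *or_query = ORQuery_new(children);
+    DECREF(children);
+    return (Query*)or_query;
+}
+</code></pre>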
+<h3>Multi-field parser</h3>
+<p>Most often, the end user will want their search query to match not only a
+single ‘content’ field, but also ‘title’ and so on.  To make that happen, we
+have to turn queries such as this…</p>
+<pre><code>foo AND NOT bar
+</code></pre>
+<p>… into the logical equivalent of this:</p>
+<pre><code>(title:foo OR content:foo) AND NOT (title:bar OR content:bar)
+</code></pre>
+<p>Rather than continue with our own from-scratch parser class and write the
+routines to accomplish that expansion, we’re now going to subclass Lucy::Search::QueryParser
+and take advantage of some of its existing methods.</p>
+<p>Our first parser implementation had the “content” field name and the choice of
+English EasyAnalyzer hard-coded for simplicity, but we don’t need to do that
+once we subclass Lucy::Search::QueryParser.  QueryParser’s constructor –
+which we will inherit, allowing us to eliminate our own constructor –
+requires a Schema which conveys field
+and Analyzer information, so we can just defer to that.</p>
+<pre><code>Code example for C is missing</code></pre>
+<p>We’re also going to jettison our _make_term_query() and _make_phrase_query()
+helper subs and chop our parse() subroutine way down.  Our revised parse()
+routine will generate Lucy::Search::LeafQuery objects instead of TermQueries
+and PhraseQueries:</p>
+<pre><code>Code example for C is missing</code></pre>
+<p>The magic happens in QueryParser’s expand() method, which walks the ORQuery
+object we supply to it looking for LeafQuery objects, and calls expand_leaf()
+for each one it finds.  expand_leaf() performs field-specific analysis,
+decides whether each query should be a TermQuery or a PhraseQuery, and if
+multiple fields are required, creates an ORQuery which multiplies out e.g. <code>foo</code>
+into <code>(title:foo OR content:foo)</code>.</p>
+<h3>Extending the query language</h3>
+<p>To add support for trailing wildcards to our query language, we need to
+override expand_leaf() to accommodate PrefixQuery, while deferring to the
+parent class implementation on TermQuery and PhraseQuery.</p>
+<pre><code>Code example for C is missing</code></pre>
+<p>Ordinarily, those asterisks would have been stripped when running tokens
+through the EasyAnalyzer – query strings containing “foo*” would produce
+TermQueries for the term “foo”.  Our override intercepts tokens with trailing
+asterisks and processes them as PrefixQueries before <code>SUPER::expand_leaf</code> can
+discard them, so that a search for “foo*” can match “food”, “foosball”, and so
+on.</p>
+<h3>Usage</h3>
+<p>Insert our custom parser into the search.cgi sample app to get a feel for how
+it behaves:</p>
+<pre><code>Code example for C is missing</code></pre>
+</div>

Added: lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/Cookbook/FastUpdates.mdtext
URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/Cookbook/FastUpdates.mdtext?rev=1762636&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/Cookbook/FastUpdates.mdtext (added)
+++ lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/Cookbook/FastUpdates.mdtext Wed Sep 28 12:06:24 2016
@@ -0,0 +1,163 @@
+Title: Lucy::Docs::Cookbook::FastUpdates
+
+<div class="c-api">
+<h2>Near real-time index updates</h2>
+<p>While index updates are fast on average, worst-case update performance may be
+significantly slower.  To make index updates consistently quick, we must
+manually intervene to control the process of index segment consolidation.</p>
+<h3>The problem</h3>
+<p>Ordinarily, modifying an index is cheap. New data is added to new segments,
+and the time to write a new segment scales more or less linearly with the
+number of documents added during the indexing session.</p>
+<p>Deletions are also cheap most of the time, because we don’t remove documents
+immediately but instead mark them as deleted, and adding the deletion mark is
+cheap.</p>
+<p>However, as new segments are added and the deletion rate for existing segments
+increases, search-time performance slowly begins to degrade.  At some point,
+it becomes necessary to consolidate existing segments, rewriting their data
+into a new segment.</p>
+<p>If the recycled segments are small, the time it takes to rewrite them may not
+be significant.  Every once in a while, though, a large amount of data must be
+rewritten.</p>
+<h3>Procrastinating and playing catch-up</h3>
+<p>The simplest way to force fast index updates is to avoid rewriting anything.</p>
+<p>Indexer relies upon <a href="../../../Lucy/Index/IndexManager.html">IndexManager</a>’s
+<a href="../../../Lucy/Index/IndexManager.html#func_Recycle">Recycle()</a> method to tell it which segments should
+be consolidated.  If we subclass IndexManager and override the method so that
+it always returns an empty array, we get consistently quick performance:</p>
+<pre><code class="language-c">Vector*
+NoMergeManager_Recycle_IMP(IndexManager *self, PolyReader *reader,
+                           DeletionsWriter *del_writer, int64_t cutoff,
+                           bool optimize) {
+    return Vec_new(0);
+}
+
+void
+do_index(Obj *index) {
+    String *class_name = Str_newf(&quot;NoMergeManager&quot;);
+    Class *klass = Class_singleton(class_name, INDEXMANAGER);
+    DECREF(class_name);
+    Class_Override(klass, (cfish_method_t)NoMergeManager_Recycle_IMP,
+                   LUCY_IndexManager_Recycle_OFFSET);
+
+    IndexManager *manager = (IndexManager*)Class_Make_Obj(klass);
+    IxManager_init(manager, NULL, NULL);
+
+    Indexer *indexer = Indexer_new(NULL, index, manager, 0);
+    ...
+    Indexer_Commit(indexer);
+
+    DECREF(indexer);
+    DECREF(manager);
+}
+</code></pre>
+<p>However, we can’t procrastinate forever.  Eventually, we’ll have to run an
+ordinary, uncontrolled indexing session, potentially triggering a large
+rewrite of lots of small and/or degraded segments:</p>
+<pre><code class="language-c">void
+do_index(Obj *index) {
+    Indexer *indexer = Indexer_new(NULL, index, NULL /* manager */, 0);
+    ...
+    Indexer_Commit(indexer);
+    DECREF(indexer);
+}
+</code></pre>
+<h3>Acceptable worst-case update time, slower degradation</h3>
+<p>Never merging anything at all in the main indexing process is probably
+overkill.  Small segments are relatively cheap to merge; we just need to guard
+against the big rewrites.</p>
+<p>Setting a ceiling on the number of documents in the segments to be recycled
+allows us to avoid a mass proliferation of tiny, single-document segments,
+while still offering decent worst-case update speed:</p>
+<pre><code class="language-c">Vector*
+LightMergeManager_Recycle_IMP(IndexManager *self, PolyReader *reader,
+                              DeletionsWriter *del_writer, int64_t cutoff,
+                              bool optimize) {
+    IndexManager_Recycle_t super_recycle
+        = SUPER_METHOD_PTR(IndexManager, LUCY_IndexManager_Recycle);
+    Vector *seg_readers = super_recycle(self, reader, del_writer, cutoff,
+                                        optimize);
+    Vector *small_segments = Vec_new(0);
+
+    for (size_t i = 0, max = Vec_Get_Size(seg_readers); i &lt; max; i++) {
+        SegReader *seg_reader = (SegReader*)Vec_Fetch(seg_readers, i);
+
+        if (SegReader_Doc_Max(seg_reader) &lt; 10) {
+            Vec_Push(small_segments, INCREF(seg_reader));
+        }
+    }
+
+    DECREF(seg_readers);
+    return small_segments;
+}
+</code></pre>
+<p>However, we still have to consolidate every once in a while, and while that
+happens content updates will be locked out.</p>
+<h3>Background merging</h3>
+<p>If it’s not acceptable to lock out updates while the index consolidation
+process runs, the alternative is to move the consolidation process out of
+band, using <a href="../../../Lucy/Index/BackgroundMerger.html">BackgroundMerger</a>.</p>
+<p>It’s never safe to have more than one Indexer attempting to modify the content
+of an index at the same time, but a BackgroundMerger and an Indexer can
+operate simultaneously:</p>
+<pre><code class="language-c">typedef struct {
+    Obj *index;
+    Doc *doc;
+} Context;
+
+static void
+S_index_doc(void *arg) {
+    Context *ctx = (Context*)arg;
+
+    Class *klass = Class_singleton(&quot;LightMergeManager&quot;, INDEXMANAGER);
+    Class_Override(klass, (cfish_method_t)LightMergeManager_Recycle_IMP,
+                   LUCY_IndexManager_Recycle_OFFSET);
+
+    IndexManager *manager = (IndexManager*)Class_Make_Obj(klass);
+    IxManager_init(manager, NULL, NULL);
+
+    Indexer *indexer = Indexer_new(NULL, ctx-&gt;index, manager, 0);
+    Indexer_Add_Doc(indexer, ctx-&gt;doc, 1.0);
+    Indexer_Commit(indexer);
+
+    DECREF(indexer);
+    DECREF(manager);
+}
+
+void
+indexing_process(Obj *index, Doc *doc) {
+    const int max_retries = 5;  // illustrative retry limit
+    Context ctx;
+    ctx.index = index;
+    ctx.doc = doc;
+
+    for (int i = 0; i &lt; max_retries; i++) {
+        Err *err = Err_trap(S_index_doc, &amp;ctx);
+        if (!err) { break; }
+        if (!Err_is_a(err, LOCKERR)) {
+            RETHROW(err);
+        }
+        WARN(&quot;Couldn't get lock (%d retries)&quot;, i);
+        DECREF(err);
+    }
+}
+
+void
+background_merge_process(Obj *index) {
+    IndexManager *manager = IxManager_new(NULL, NULL);
+    IxManager_Set_Write_Lock_Timeout(manager, 60000);
+
+    BackgroundMerger *bg_merger = BGMerger_new(index, manager);
+    BGMerger_Commit(bg_merger);
+
+    DECREF(bg_merger);
+    DECREF(manager);
+}
+</code></pre>
+<p>The exception handling code becomes useful once you have more than one index
+modification process happening simultaneously.  By default, Indexer tries
+several times to acquire a write lock over the span of one second, then holds
+it until <a href="../../../Lucy/Index/Indexer.html#func_Commit">Commit()</a> completes.  BackgroundMerger handles
+most of its work
+without the write lock, but it does need it briefly once at the beginning and
+once again near the end.  Under normal loads, the internal retry logic will
+resolve conflicts, but if it’s not acceptable to miss an insert, you probably
+want to catch <a href="../../../Lucy/Store/LockErr.html">LockErr</a> exceptions thrown by Indexer.  In
+contrast, a LockErr from BackgroundMerger probably just needs to be logged.</p>
+</div>

Added: lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/DevGuide.mdtext
URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/DevGuide.mdtext?rev=1762636&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/DevGuide.mdtext (added)
+++ lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/DevGuide.mdtext Wed Sep 28 12:06:24 2016
@@ -0,0 +1,36 @@
+Title: Lucy::Docs::DevGuide
+
+<div class="c-api">
+<h2>Quick-start guide to hacking on Apache Lucy.</h2>
+<p>The Apache Lucy code base is organized into roughly four layers:</p>
+<ul>
+<li>Charmonizer - compiler and OS configuration probing.</li>
+<li>Clownfish - header files.</li>
+<li>C - implementation files.</li>
+<li>Host - binding language.</li>
+</ul>
+<p>Charmonizer is a configuration prober which writes a single header file,
+“charmony.h”, describing the build environment and facilitating
+cross-platform development.  It’s similar to Autoconf or Metaconfig, but
+written in pure C.</p>
+<p>The “.cfh” files within the Lucy core are Clownfish header files.
+Clownfish is a purpose-built, declaration-only language which superimposes
+a single-inheritance object model on top of C which is specifically
+designed to co-exist happily with a variety of “host” languages and to allow
+limited run-time dynamic subclassing.  For more information see the
+Clownfish docs, but if there’s one thing you should know about Clownfish OO
+before you start hacking, it’s that method calls are differentiated from
+functions by capitalization:</p>
+<pre><code>Indexer_Add_Doc   &lt;-- Method, typically uses dynamic dispatch.
+Indexer_add_doc   &lt;-- Function, always a direct invocation.
+</code></pre>
+<p>The C files within the Lucy core are where most of Lucy’s low-level
+functionality lies.  They implement the interface defined by the Clownfish
+header files.</p>
+<p>The C core is intentionally left incomplete, however; to be usable, it must
+be bound to a “host” language.  (In this context, even C is considered a
+“host” which must implement the missing pieces and be “bound” to the core.)
+Some of the binding code is autogenerated by Clownfish on a spec customized
+for each language.  Other pieces are hand-coded in either C (using the
+host’s C API) or the host language itself.</p>
+</div>

Added: lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/DocIDs.mdtext
URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/DocIDs.mdtext?rev=1762636&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/DocIDs.mdtext (added)
+++ lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/DocIDs.mdtext Wed Sep 28 12:06:24 2016
@@ -0,0 +1,20 @@
+Title: Lucy::Docs::DocIDs
+
+<div class="c-api">
+<h2>Characteristics of Apache Lucy document ids.</h2>
+<h3>Document ids are signed 32-bit integers</h3>
+<p>Document ids in Apache Lucy start at 1.  Because 0 is never a valid doc id, we
+can use it as a sentinel value.  Here’s a minimal sketch of the idiom,
+assuming a <code>PostingList</code> obtained elsewhere:</p>
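+<pre><code class="language-c">int32_t doc_id;
+while ((doc_id = PList_Next(plist)) != 0) {
+    // Because 0 is never a valid doc id, a return value of 0 can only
+    // mean that the iterator is exhausted.
+    process_hit(doc_id);  // process_hit() is a hypothetical handler
+}
+</code></pre>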
+<h3>Document ids are ephemeral</h3>
+<p>The document ids used by Lucy are associated with a single index
+snapshot.  The moment an index is updated, the mapping of document ids to
+documents is subject to change.</p>
+<p>Since IndexReader objects represent a point-in-time view of an index, document
+ids are guaranteed to remain static for the life of the reader.  However,
+because they are not permanent, Lucy document ids cannot be used as
+foreign keys to locate records in external data sources.  If you truly need a
+primary key field, you must define it and populate it yourself.</p>
+<p>Furthermore, the order of document ids does not tell you anything about the
+sequence in which documents were added to the index.</p>
+</div>

Added: lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/FileFormat.mdtext
URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/FileFormat.mdtext?rev=1762636&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/FileFormat.mdtext (added)
+++ lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/FileFormat.mdtext Wed Sep 28 12:06:24 2016
@@ -0,0 +1,172 @@
+Title: Lucy::Docs::FileFormat
+
+<div class="c-api">
+<h2>Overview of index file format</h2>
+<p>It is not necessary to understand the current implementation details of the
+index file format in order to use Apache Lucy effectively, but it may be
+helpful if you are interested in tweaking for high performance, exotic usage,
+or debugging and development.</p>
+<p>On a file system, an index is a directory.  The files inside have a
+hierarchical relationship: an index is made up of “segments”, each of which is
+an independent inverted index with its own subdirectory; each segment is made
+up of several component parts.</p>
+<pre><code>[index]--|
+         |--snapshot_XXX.json
+         |--schema_XXX.json
+         |--write.lock
+         |
+         |--seg_1--|
+         |         |--segmeta.json
+         |         |--cfmeta.json
+         |         |--cf.dat-------|
+         |                         |--[lexicon]
+         |                         |--[postings]
+         |                         |--[documents]
+         |                         |--[highlight]
+         |                         |--[deletions]
+         |
+         |--seg_2--|
+         |         |--segmeta.json
+         |         |--cfmeta.json
+         |         |--cf.dat-------|
+         |                         |--[lexicon]
+         |                         |--[postings]
+         |                         |--[documents]
+         |                         |--[highlight]
+         |                         |--[deletions]
+         |
+         |--[...]--| 
+</code></pre>
+<h3>Write-once philosophy</h3>
+<p>All segment directory names consist of the string “seg_” followed by a number
+in base 36: seg_1, seg_5m, seg_p9s2 and so on, with higher numbers indicating
+more recent segments.  Once a segment is finished and committed, its name is
+never re-used and its files are never modified.</p>
+<p>Old segments become obsolete and can be removed when their data has been
+consolidated into new segments during the process of segment merging and
+optimization.  A fully-optimized index has only one segment.</p>
+<h3>Top-level entries</h3>
+<p>There are a handful of “top-level” files and directories which belong to the
+entire index rather than to a particular segment.</p>
+<h4>snapshot_XXX.json</h4>
+<p>A “snapshot” file, e.g. <code>snapshot_m7p.json</code>, is a list of index files and
+directories.  Because index files, once written, are never modified, the list
+of entries in a snapshot defines a point-in-time view of the data in an index.</p>
+<p>Like segment directories, snapshot files also utilize the
+unique-base-36-number naming convention; the higher the number, the more
+recent the file.  The appearance of a new snapshot file within the index
+directory constitutes an index update.  While a new segment is being written
+new files may be added to the index directory, but until a new snapshot file
+gets written, a Searcher opening the index for reading won’t know about them.</p>
+<h4>schema_XXX.json</h4>
+<p>The schema file is a Schema object describing the index’s format, serialized
+as JSON.  It, too, is versioned, and a given snapshot file will reference one
+and only one schema file.</p>
+<h4>locks</h4>
+<p>By default, only one indexing process may safely modify the index at any given
+time.  Processes reserve an index by laying claim to the <code>write.lock</code> file
+within the <code>locks/</code> directory.  A smattering of other lock files may be used
+from time to time, as well.</p>
+<h3>A segment’s component parts</h3>
+<p>By default, each segment has up to five logical components: lexicon, postings,
+document storage, highlight data, and deletions.  Binary data from these
+components gets stored in virtual files within the “cf.dat” compound file;
+metadata is stored in a shared “segmeta.json” file.</p>
+<h4>segmeta.json</h4>
+<p>The segmeta.json file is a central repository for segment metadata.  In
+addition to information such as document counts and field numbers, it also
+warehouses arbitrary metadata on behalf of individual index components.</p>
+<h4>Lexicon</h4>
+<p>Each indexed field gets its own lexicon in each segment.  The exact files
+involved depend on the field’s type, but generally speaking there will be two
+parts.  First, there’s a primary <code>lexicon-XXX.dat</code> file which houses a
+complete term list associating terms with corpus frequency statistics,
+postings file locations, etc.  Second, one or more “lexicon index” files may
+be present which contain periodic samples from the primary lexicon file to
+facilitate fast lookups.</p>
+<h4>Postings</h4>
+<p>“Posting” is a technical term from the field of
+<a href="../../Lucy/Docs/IRTheory.html">information retrieval</a>, defined as a single
+instance of one term indexing one document.  If you are looking at the index
+in the back of a book, and you see that “freedom” is referenced on pages 8,
+86, and 240, that would be three postings, which taken together form a
+“posting list”.  The same terminology applies to an index in electronic form.</p>
+<p>Each segment has one postings file per indexed field.  When a search is
+performed for a single term, first that term is looked up in the lexicon.  If
+the term exists in the segment, the record in the lexicon will contain
+information about which postings file to look at and where to look.</p>
+<p>The first thing any posting record tells you is a document id.  By iterating
+over all the postings associated with a term, you can find all the documents
+that match that term, a process which is analogous to looking up page numbers
+in a book’s index.  However, each posting record typically contains other
+information in addition to document id, e.g. the positions at which the term
+occurs within the field.</p>
+<h4>Documents</h4>
+<p>The document storage section is a simple database, organized into two files:</p>
+<ul>
+<li>
+<p><strong>documents.dat</strong> - Serialized documents.</p>
+</li>
+<li>
+<p><strong>documents.ix</strong> - Document storage index, a solid array of 64-bit integers
+where each integer location corresponds to a document id, and the value at
+that location points at a file position in the documents.dat file.</p>
+</li>
+</ul>
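+<p>In other words, finding a document’s position in <code>documents.dat</code> is a
+single array lookup.  Here’s an illustrative sketch of the scheme in plain
+stdio – it is not Lucy’s actual I/O code:</p>
+<pre><code class="language-c">#include &lt;stdint.h&gt;
+#include &lt;stdio.h&gt;
+
+// Illustrative sketch only.  Slot N of documents.ix is a 64-bit
+// integer holding the position of document N's record in documents.dat.
+uint64_t
+doc_file_position(FILE *ix, int32_t doc_id) {
+    uint64_t offset = 0;
+    fseek(ix, (long)doc_id * 8, SEEK_SET);     // 8 bytes per slot
+    fread(&amp;offset, sizeof(offset), 1, ix);     // byte order elided
+    return offset;
+}
+</code></pre>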
+<h4>Highlight data</h4>
+<p>The files which store data used for excerpting and highlighting are organized
+similarly to the files used to store documents.</p>
+<ul>
+<li>
+<p><strong>highlight.dat</strong> - Chunks of serialized highlight data, one per doc id.</p>
+</li>
+<li>
+<p><strong>highlight.ix</strong> - Highlight data index – as with the <code>documents.ix</code> file, a
+solid array of 64-bit file pointers.</p>
+</li>
+</ul>
+<h4>Deletions</h4>
+<p>When a document is “deleted” from a segment, it is not actually purged right
+away; it is merely marked as “deleted” via a deletions file.  Deletions files
+contain bit vectors with one bit for each document in the segment; if bit
+#254 is set then document 254 is deleted, and if that document turns up in a
+search it will be masked out.</p>
+<p>It is only when a segment’s contents are rewritten to a new segment during the
+segment-merging process that deleted documents truly go away.</p>
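+<p>Conceptually, the deletions check is a single bit test.  A hypothetical
+sketch (the layout details here are illustrative, not a specification of
+Lucy’s on-disk format):</p>
+<pre><code class="language-c">#include &lt;stdbool.h&gt;
+#include &lt;stdint.h&gt;
+
+// Hypothetical bit-vector test: bit N corresponds to document N.
+static bool
+doc_is_deleted(const uint8_t *bits, int32_t doc_id) {
+    return (bits[doc_id / 8] &gt;&gt; (doc_id % 8)) &amp; 1;
+}
+</code></pre>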
+<h3>Compound Files</h3>
+<p>If you peer inside an index directory, you won’t actually find any files named
+“documents.dat”, “highlight.ix”, etc. unless there is an indexing process
+underway.  What you will find instead is one “cf.dat” and one “cfmeta.json”
+file per segment.</p>
+<p>To minimize the need for file descriptors at search-time, all per-segment
+binary data files are concatenated together in “cf.dat” at the close of each
+indexing session.  Information about where each file begins and ends is stored
+in <code>cfmeta.json</code>.  When the segment is opened for reading, a single file
+descriptor per “cf.dat” file can be shared among several readers.</p>
+<h3>A Typical Search</h3>
+<p>Here’s a simplified narrative, dramatizing how a search for “freedom” against
+a given segment plays out:</p>
+<ol>
+<li>
+<p>The searcher asks the relevant Lexicon Index, “Do you know anything about
+‘freedom’?”  Lexicon Index replies, “Can’t say for sure, but if the main
+Lexicon file does, ‘freedom’ is probably somewhere around byte 21008”.</p>
+</li>
+<li>
+<p>The main Lexicon tells the searcher “One moment, let me scan our records…
+Yes, we have 2 documents which contain ‘freedom’.  You’ll find them in
+seg_6/postings-4.dat starting at byte 66991.”</p>
+</li>
+<li>
+<p>The Postings file says “Yep, we have ‘freedom’, all right!  Document id 40
+has 1 ‘freedom’, and document 44 has 8.  If you need to know more, like if any
+‘freedom’ is part of the phrase ‘freedom of speech’, ask me about positions!”
+</li>
+<li>
+<p>If the searcher is only looking for ‘freedom’ in isolation, that’s where it
+stops.  It now knows enough to assign the documents scores against “freedom”,
+with the 8-freedom document likely ranking higher than the single-freedom
+document.</p>
+</li>
+</ol>
+</div>

Added: lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/FileLocking.mdtext
URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/FileLocking.mdtext?rev=1762636&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/FileLocking.mdtext (added)
+++ lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/FileLocking.mdtext Wed Sep 28 12:06:24 2016
@@ -0,0 +1,56 @@
+Title: Lucy::Docs::FileLocking
+
+<div class="c-api">
+<h2>Manage indexes on shared volumes.</h2>
+<p>Normally, index locking is an invisible process.  Exclusive write access is
+controlled via lockfiles within the index directory and problems only arise
+if multiple processes attempt to acquire the write lock simultaneously;
+search-time processes do not ordinarily require locking at all.</p>
+<p>On shared volumes, however, the default locking mechanism fails, and manual
+intervention becomes necessary.</p>
+<p>Both read and write applications accessing an index on a shared volume need
+to identify themselves with a unique <code>host</code> id, e.g. hostname or
+ip address.  Knowing the host id makes it possible to tell which lockfiles
+belong to other machines and therefore must not be removed when the
+lockfile’s pid number appears not to correspond to an active process.</p>
+<p>At index-time, the danger is that multiple indexing processes from
+different machines which fail to specify a unique <code>host</code> id can
+delete each others’ lockfiles and then attempt to modify the index at the
+same time, causing index corruption.  The search-time problem is more
+complex.</p>
+<p>Once an index file is no longer listed in the most recent snapshot, Indexer
+attempts to delete it as part of a post-<a href="../../Lucy/Index/Indexer.html#func_Commit">Commit()</a> cleanup routine.  It is
+possible that at the moment an Indexer is deleting files which it believes
+are no longer needed, a Searcher referencing an earlier snapshot is in fact
+using them.  The more often that an index is either updated or searched,
+the more likely it is that this conflict will arise from time to time.</p>
+<p>Ordinarily, the deletion attempts are not a problem.   On a typical unix
+volume, the files will be deleted in name only: any process which holds an
+open filehandle against a given file will continue to have access, and the
+file won’t actually get vaporized until the last filehandle is cleared.
+Thanks to “delete on last close semantics”, an Indexer can’t truly delete
+the file out from underneath an active Searcher.   On Windows, where file
+deletion fails whenever any process holds an open handle, the situation is
+different but still workable: Indexer just keeps retrying after each commit
+until deletion finally succeeds.</p>
+<p>On NFS, however, the system breaks, because NFS allows files to be deleted
+out from underneath active processes.  Should this happen, the unlucky read
+process will crash with a “Stale NFS filehandle” exception.</p>
+<p>Under normal circumstances, it is neither necessary nor desirable for
+IndexReaders to secure read locks against an index, but for NFS we have to
+make an exception.  LockFactory’s <a href="../../Lucy/Store/LockFactory.html#func_Make_Shared_Lock">Make_Shared_Lock()</a> method exists for this
+reason; supplying an IndexManager instance to IndexReader’s constructor
+activates an internal locking mechanism using <a href="../../Lucy/Store/LockFactory.html#func_Make_Shared_Lock">Make_Shared_Lock()</a> which
+prevents concurrent indexing processes from deleting files that are needed
+by active readers.</p>
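+<p>Here’s a minimal sketch of the idiom in C – <code>folder</code> stands in for the
+index path or Folder object, and the host id is whatever string uniquely
+identifies this machine:</p>
+<pre><code class="language-c">// Sketch only: supply a unique host id, then hand the IndexManager to
+// IndexReader so that shared-lock protection is activated.
+String *host = Str_newf(&quot;search-node-1&quot;);  // hypothetical host id
+IndexManager *manager = IxManager_new(host, NULL);
+IndexReader *reader = IxReader_open((Obj*)folder, NULL, manager);
+...
+DECREF(reader);
+DECREF(manager);
+DECREF(host);
+</code></pre>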
+<p>Since shared locks are implemented using lockfiles located in the index
+directory (as are exclusive locks), reader applications must have write
+access for read locking to work.  Stale lock files from crashed processes
+are ordinarily cleared away the next time the same machine – as identified
+by the <code>host</code> parameter – opens another IndexReader. (The
+classic technique of timing out lock files is not feasible because search
+processes may lie dormant indefinitely.) However, please be aware that if
+the last thing a given machine does is crash, lock files belonging to it
+may persist, preventing deletion of obsolete index data.</p>
+</div>

Added: lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/IRTheory.mdtext
URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/IRTheory.mdtext?rev=1762636&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/IRTheory.mdtext (added)
+++ lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/IRTheory.mdtext Wed Sep 28 12:06:24 2016
@@ -0,0 +1,45 @@
+Title: Lucy::Docs::IRTheory
+
+<div class="c-api">
+<h2>Crash course in information retrieval</h2>
+<p>Just enough Information Retrieval theory to find your way around Apache Lucy.</p>
+<h3>Terminology</h3>
+<p>Lucy uses some terminology from the field of information retrieval which
+may be unfamiliar to many users.  “Document” and “term” mean pretty much what
+you’d expect them to, but others such as “posting” and “inverted index” need a
+formal introduction:</p>
+<ul>
+<li><em>document</em> - An atomic unit of retrieval.</li>
+<li><em>term</em> - An attribute which describes a document.</li>
+<li><em>posting</em> - One term indexing one document.</li>
+<li><em>term list</em> - The complete list of terms which describe a document.</li>
+<li><em>posting list</em> - The complete list of documents which a term indexes.</li>
+<li><em>inverted index</em> - A data structure which maps from terms to documents.</li>
+</ul>
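+<p>As a worked example, a tiny inverted index over three documents might map
+terms to posting lists like so (doc ids are illustrative):</p>
+<pre><code>term     posting list (doc ids)
+------   ----------------------
+blind    1, 3
+mice     1, 2, 3
+three    1
+</code></pre>
+<p>Each (term, doc id) pairing in a given row is a single posting; the row as a
+whole is the term’s posting list.</p>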
+<p>Since Lucy is a practical implementation of IR theory, it loads these
+abstract, distilled definitions down with useful traits.  For instance, a
+“posting” in its most rarefied form is simply a term-document pairing; in
+Lucy, the class MatchPosting fills this
+role.  However, by associating additional information with a posting like the
+number of times the term occurs in the document, we can turn it into a
+ScorePosting, making it possible
+to rank documents by relevance rather than just list documents which happen to
+match in no particular order.</p>
+<h3>TF/IDF ranking algorithm</h3>
+<p>Lucy uses a variant of the well-established “Term Frequency / Inverse
+Document Frequency” weighting scheme.  A thorough treatment of TF/IDF is too
+ambitious for our present purposes, but in a nutshell, it means that…</p>
+<ul>
+<li>
+<p>in a search for <code>skate park</code>, documents which score well for the
+comparatively rare term <code>skate</code> will rank higher than documents which score
+well for the more common term <code>park</code>.</p>
+</li>
+<li>
+<p>a 10-word text which has one occurrence each of both <code>skate</code> and <code>park</code> will
+rank higher than a 1000-word text which also contains one occurrence of each.</p>
+</li>
+</ul>
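+<p>In its simplest textbook form, the weight assigned to one term in one
+document looks something like the sketch below.  (This is pedagogy, not
+Lucy’s actual Similarity math, and it omits length normalization.)</p>
+<pre><code class="language-c">#include &lt;math.h&gt;
+
+// Naive textbook tf/idf weight.
+//   term_freq: occurrences of the term in this document
+//   doc_count: total documents in the collection
+//   doc_freq:  documents containing the term at least once
+static double
+tf_idf(double term_freq, double doc_count, double doc_freq) {
+    // Rare terms (small doc_freq) earn a larger multiplier.
+    return term_freq * log(doc_count / doc_freq);
+}
+</code></pre>
+<p>Under such a scheme, the rare term <code>skate</code> earns a larger multiplier
+than the common term <code>park</code>.</p>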
+<p>A web search for “tf idf” will turn up many excellent explanations of the
+algorithm.</p>
+</div>

Added: lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial.mdtext
URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial.mdtext?rev=1762636&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial.mdtext (added)
+++ lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial.mdtext Wed Sep 28 12:06:24 2016
@@ -0,0 +1,54 @@
+Title: Lucy::Docs::Tutorial
+
+<div class="c-api">
+<h2>Step-by-step introduction to Apache Lucy.</h2>
+<p>Explore Apache Lucy’s basic functionality by starting with a minimalist CGI
+search app based on Lucy::Simple and transforming it, step by step,
+into an “advanced search” interface utilizing more flexible core modules like
+<a href="../../Lucy/Index/Indexer.html">Indexer</a> and <a href="../../Lucy/Search/IndexSearcher.html">IndexSearcher</a>.</p>
+<h3>Chapters</h3>
+<ul>
+<li>
+<p><a href="../../Lucy/Docs/Tutorial/SimpleTutorial.html">SimpleTutorial</a> - Build a bare-bones search app using
+Lucy::Simple.</p>
+</li>
+<li>
+<p><a href="../../Lucy/Docs/Tutorial/BeyondSimpleTutorial.html">BeyondSimpleTutorial</a> - Rebuild the app using core
+classes like <a href="../../Lucy/Index/Indexer.html">Indexer</a> and
+<a href="../../Lucy/Search/IndexSearcher.html">IndexSearcher</a> in place of Lucy::Simple.</p>
+</li>
+<li>
+<p><a href="../../Lucy/Docs/Tutorial/FieldTypeTutorial.html">FieldTypeTutorial</a> - Experiment with different field
+characteristics using subclasses of <a href="../../Lucy/Plan/FieldType.html">FieldType</a>.</p>
+</li>
+<li>
+<p><a href="../../Lucy/Docs/Tutorial/AnalysisTutorial.html">AnalysisTutorial</a> - Examine how the choice of
+<a href="../../Lucy/Analysis/Analyzer.html">Analyzer</a> subclass affects search results.</p>
+</li>
+<li>
+<p><a href="../../Lucy/Docs/Tutorial/HighlighterTutorial.html">HighlighterTutorial</a> - Augment search results with
+highlighted excerpts.</p>
+</li>
+<li>
+<p><a href="../../Lucy/Docs/Tutorial/QueryObjectsTutorial.html">QueryObjectsTutorial</a> - Unlock advanced search features
+by using Query objects instead of query strings.</p>
+</li>
+</ul>
+<h3>Source materials</h3>
+<p>The source material used by the tutorial app – a multi-text-file presentation
+of the United States constitution – can be found in the <code>sample</code> directory
+at the root of the Lucy distribution, along with finished indexing and search
+apps.</p>
+<pre><code class="language-c">sample/indexer_simple.c  # simple indexing executable
+sample/search_simple.c   # simple search executable
+sample/indexer.c         # indexing executable
+sample/search.c          # search executable
+sample/us_constitution   # corpus
+</code></pre>
+<h3>Conventions</h3>
+<p>The user is expected to be familiar with C and basic CGI programming.</p>
+<p>The code in this tutorial assumes a Unix-flavored operating system and the
+Apache webserver, but will work with minor modifications on other setups.</p>
+<h3>See also</h3>
+<p>More advanced and esoteric subjects are covered in <a href="../../Lucy/Docs/Cookbook.html">Cookbook</a>.</p>
+</div>

Added: lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/AnalysisTutorial.mdtext
URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/AnalysisTutorial.mdtext?rev=1762636&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/AnalysisTutorial.mdtext (added)
+++ lucy/site/trunk/content/docs/0.5.0/c/Lucy/Docs/Tutorial/AnalysisTutorial.mdtext Wed Sep 28 12:06:24 2016
@@ -0,0 +1,64 @@
+Title: Lucy::Docs::Tutorial::AnalysisTutorial
+
+<div class="c-api">
+<h2>How to choose and use Analyzers.</h2>
+<p>Try swapping out the EasyAnalyzer in our Schema for a
+<a href="../../../Lucy/Analysis/StandardTokenizer.html">StandardTokenizer</a>:</p>
+<pre><code class="language-c">    StandardTokenizer *tokenizer = StandardTokenizer_new();
+    FullTextType *type = FullTextType_new((Analyzer*)tokenizer);
+</code></pre>
+<p>Search for <code>senate</code>, <code>Senate</code>, and <code>Senator</code> before and after making the
+change and re-indexing.</p>
+<p>Under EasyAnalyzer, the results are identical for all three searches, but
+under StandardTokenizer, searches are case-sensitive, and the result sets for
+<code>Senate</code> and <code>Senator</code> are distinct.</p>
+<h3>EasyAnalyzer</h3>
+<p>What’s happening is that <a href="../../../Lucy/Analysis/EasyAnalyzer.html">EasyAnalyzer</a> is performing more aggressive
+processing than StandardTokenizer.  In addition to tokenizing, it’s also
+converting all text to lower case so that searches are case-insensitive, and
+using a “stemming” algorithm to reduce related words to a common stem (<code>senat</code>,
+in this case).</p>
+<p>EasyAnalyzer is actually multiple Analyzers wrapped up in a single package.
+In this case, it’s three-in-one, since specifying an EasyAnalyzer with a
+language of <code>en</code> is equivalent to this snippet creating a
+<a href="../../../Lucy/Analysis/PolyAnalyzer.html">PolyAnalyzer</a>:</p>
+<pre><code class="language-c">    Vector *analyzers = Vec_new(3);
+    Vec_Push(analyzers, (Analyzer*)StandardTokenizer_new());
+    Vec_Push(analyzers, (Analyzer*)Normalizer_new(NULL, true, false));
+    Vec_Push(analyzers, (Analyzer*)SnowStemmer_new(language));
+
+    PolyAnalyzer *analyzer = PolyAnalyzer_new(NULL, analyzers);
+    DECREF(analyzers);
+</code></pre>
+<p>You can add or subtract Analyzers from there if you like.  Try adding a fourth
+Analyzer, a SnowballStopFilter for suppressing “stopwords” like “the”, “if”,
+and “maybe”.</p>
+<pre><code class="language-c">    Vec_Push(analyzers, (Analyzer*)StandardTokenizer_new());
+    Vec_Push(analyzers, (Analyzer*)Normalizer_new(NULL, true, false));
+    Vec_Push(analyzers, (Analyzer*)SnowStemmer_new(language));
+    Vec_Push(analyzers, (Analyzer*)SnowStop_new(language, NULL));
+</code></pre>
+<p>Also, try removing the SnowballStemmer.</p>
+<pre><code class="language-c">    Vec_Push(analyzers, (Analyzer*)StandardTokenizer_new());
+    Vec_Push(analyzers, (Analyzer*)Normalizer_new(NULL, true, false));
+</code></pre>
+<p>The original choice of a stock English EasyAnalyzer probably still yields the
+best results for this document collection, but you get the idea: sometimes you
+want a different Analyzer.</p>
+<h3>When the best Analyzer is no Analyzer</h3>
+<p>Sometimes you don’t want an Analyzer at all.  That was true for our “url”
+field because we didn’t need it to be searchable, but it’s also true for
+certain types of searchable fields.  For instance, “category” fields are often
+set up to match exactly or not at all, as are fields like “last_name” (because
+you may not want to conflate results for “Humphrey” and “Humphries”).</p>
+<p>To specify that there should be no analysis performed at all, use StringType:</p>
+<pre><code class="language-c">    String     *name = Str_newf(&quot;category&quot;);
+    StringType *type = StringType_new();
+    Schema_Spec_Field(schema, name, (FieldType*)type);
+    DECREF(type);
+    DECREF(name);
+</code></pre>
+<h3>Highlighting up next</h3>
+<p>In our next tutorial chapter, <a href="../../../Lucy/Docs/Tutorial/HighlighterTutorial.html">HighlighterTutorial</a>,
+we’ll add highlighted excerpts from the “content” field to our search results.</p>
+</div>