Posted to commits@lucy.apache.org by nw...@apache.org on 2015/07/11 15:55:52 UTC

[1/5] lucy git commit: Remove link to ClusterSearcher

Repository: lucy
Updated Branches:
  refs/heads/master a1d2e1c6d -> a57374a2b


Remove link to ClusterSearcher


Project: http://git-wip-us.apache.org/repos/asf/lucy/repo
Commit: http://git-wip-us.apache.org/repos/asf/lucy/commit/6966293b
Tree: http://git-wip-us.apache.org/repos/asf/lucy/tree/6966293b
Diff: http://git-wip-us.apache.org/repos/asf/lucy/diff/6966293b

Branch: refs/heads/master
Commit: 6966293ba12c782c1f7e0d48ad5f93ed5cb2985a
Parents: 5618020
Author: Nick Wellnhofer <we...@aevum.de>
Authored: Wed Jul 8 18:54:00 2015 +0200
Committer: Nick Wellnhofer <we...@aevum.de>
Committed: Sat Jul 11 15:03:10 2015 +0200

----------------------------------------------------------------------
 core/Lucy/Docs/Cookbook/CustomQuery.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/lucy/blob/6966293b/core/Lucy/Docs/Cookbook/CustomQuery.md
----------------------------------------------------------------------
diff --git a/core/Lucy/Docs/Cookbook/CustomQuery.md b/core/Lucy/Docs/Cookbook/CustomQuery.md
index d135c8b..e4864b6 100644
--- a/core/Lucy/Docs/Cookbook/CustomQuery.md
+++ b/core/Lucy/Docs/Cookbook/CustomQuery.md
@@ -135,7 +135,7 @@ by objects which subclass [](cfish:lucy.Searcher) -- such as
 A Searcher is associated with a particular collection of documents.   These
 documents may all reside in one index, as with IndexSearcher, or they may be
 spread out across multiple indexes on one or more machines, as with
-[](ClusterSearcher).
+LucyX::Remote::ClusterSearcher.
 
 Searcher objects have access to certain statistical information about the
 collections they represent; for instance, a Searcher can tell you how many


[4/5] lucy git commit: Convert DevGuide and FileLocking docs to Markdown

Posted by nw...@apache.org.
Convert DevGuide and FileLocking docs to Markdown


Project: http://git-wip-us.apache.org/repos/asf/lucy/repo
Commit: http://git-wip-us.apache.org/repos/asf/lucy/commit/c2363da1
Tree: http://git-wip-us.apache.org/repos/asf/lucy/tree/c2363da1
Diff: http://git-wip-us.apache.org/repos/asf/lucy/diff/c2363da1

Branch: refs/heads/master
Commit: c2363da1b9caa85b31bd599419429c9b7ca62e76
Parents: a1d2e1c
Author: Nick Wellnhofer <we...@aevum.de>
Authored: Mon Jul 6 16:39:58 2015 +0200
Committer: Nick Wellnhofer <we...@aevum.de>
Committed: Sat Jul 11 15:03:10 2015 +0200

----------------------------------------------------------------------
 core/Lucy/Docs/DevGuide.cfh              | 59 -------------------
 core/Lucy/Docs/DevGuide.md               | 37 ++++++++++++
 core/Lucy/Docs/FileLocking.cfh           | 83 ---------------------------
 core/Lucy/Docs/FileLocking.md            | 80 ++++++++++++++++++++++++++
 perl/buildlib/Lucy/Build/Binding/Docs.pm | 69 ----------------------
 perl/lib/Lucy/Docs/DevGuide.pm           | 24 --------
 perl/lib/Lucy/Docs/FileLocking.pm        | 24 --------
 7 files changed, 117 insertions(+), 259 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/lucy/blob/c2363da1/core/Lucy/Docs/DevGuide.cfh
----------------------------------------------------------------------
diff --git a/core/Lucy/Docs/DevGuide.cfh b/core/Lucy/Docs/DevGuide.cfh
deleted file mode 100644
index dfab5ff..0000000
--- a/core/Lucy/Docs/DevGuide.cfh
+++ /dev/null
@@ -1,59 +0,0 @@
-/* Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-parcel Lucy;
-
-/** Quick-start guide to hacking on Apache Lucy.
- *
- * The Apache Lucy code base is organized into roughly four layers:
- *
- *    * Charmonizer - compiler and OS configuration probing.
- *    * Clownfish - header files.
- *    * C - implementation files.
- *    * Host - binding language.
- *
- * Charmonizer is a configuration prober which writes a single header file,
- * "charmony.h", describing the build environment and facilitating
- * cross-platform development.  It's similar to Autoconf or Metaconfig, but
- * written in pure C.
- *
- * The ".cfh" files within the Lucy core are Clownfish header files.
- * Clownfish is a purpose-built, declaration-only language which superimposes
- * a single-inheritance object model on top of C which is specifically
- * designed to co-exist happily with variety of "host" languages and to allow
- * limited run-time dynamic subclassing.  For more information see the
- * Clownfish docs, but if there's one thing you should know about Clownfish OO
- * before you start hacking, it's that method calls are differentiated from
- * functions by capitalization:
- *
- *     Indexer_Add_Doc   <-- Method, typically uses dynamic dispatch.
- *     Indexer_add_doc   <-- Function, always a direct invocation.
- *
- * The C files within the Lucy core are where most of Lucy's low-level
- * functionality lies.  They implement the interface defined by the Clownfish
- * header files.
- *
- * The C core is intentionally left incomplete, however; to be usable, it must
- * be bound to a "host" language.  (In this context, even C is considered a
- * "host" which must implement the missing pieces and be "bound" to the core.)
- * Some of the binding code is autogenerated by Clownfish on a spec customized
- * for each language.  Other pieces are hand-coded in either C (using the
- * host's C API) or the host language itself.
- */
-
-inert class Lucy::Docs::DevGuide { }
-
-

http://git-wip-us.apache.org/repos/asf/lucy/blob/c2363da1/core/Lucy/Docs/DevGuide.md
----------------------------------------------------------------------
diff --git a/core/Lucy/Docs/DevGuide.md b/core/Lucy/Docs/DevGuide.md
new file mode 100644
index 0000000..a1b0a8b
--- /dev/null
+++ b/core/Lucy/Docs/DevGuide.md
@@ -0,0 +1,37 @@
+# Quick-start guide to hacking on Apache Lucy.
+
+The Apache Lucy code base is organized into roughly four layers:
+
+* Charmonizer - compiler and OS configuration probing.
+* Clownfish - header files.
+* C - implementation files.
+* Host - binding language.
+
+Charmonizer is a configuration prober which writes a single header file,
+"charmony.h", describing the build environment and facilitating
+cross-platform development.  It's similar to Autoconf or Metaconfig, but
+written in pure C.
+
+The ".cfh" files within the Lucy core are Clownfish header files.
+Clownfish is a purpose-built, declaration-only language which superimposes
+a single-inheritance object model on top of C which is specifically
+designed to co-exist happily with a variety of "host" languages and to allow
+limited run-time dynamic subclassing.  For more information see the
+Clownfish docs, but if there's one thing you should know about Clownfish OO
+before you start hacking, it's that method calls are differentiated from
+functions by capitalization:
+
+    Indexer_Add_Doc   <-- Method, typically uses dynamic dispatch.
+    Indexer_add_doc   <-- Function, always a direct invocation.
+
+The C files within the Lucy core are where most of Lucy's low-level
+functionality lies.  They implement the interface defined by the Clownfish
+header files.
+
+The C core is intentionally left incomplete, however; to be usable, it must
+be bound to a "host" language.  (In this context, even C is considered a
+"host" which must implement the missing pieces and be "bound" to the core.)
+Some of the binding code is autogenerated by Clownfish on a spec customized
+for each language.  Other pieces are hand-coded in either C (using the
+host's C API) or the host language itself.
+

http://git-wip-us.apache.org/repos/asf/lucy/blob/c2363da1/core/Lucy/Docs/FileLocking.cfh
----------------------------------------------------------------------
diff --git a/core/Lucy/Docs/FileLocking.cfh b/core/Lucy/Docs/FileLocking.cfh
deleted file mode 100644
index 7e17bd4..0000000
--- a/core/Lucy/Docs/FileLocking.cfh
+++ /dev/null
@@ -1,83 +0,0 @@
-/* Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-parcel Lucy;
-
-/** Manage indexes on shared volumes.
- *
- * Normally, index locking is an invisible process.  Exclusive write access is
- * controlled via lockfiles within the index directory and problems only arise
- * if multiple processes attempt to acquire the write lock simultaneously;
- * search-time processes do not ordinarily require locking at all.
- *
- * On shared volumes, however, the default locking mechanism fails, and manual
- * intervention becomes necessary.
- *
- * Both read and write applications accessing an index on a shared volume need
- * to identify themselves with a unique `host` id, e.g. hostname or
- * ip address.  Knowing the host id makes it possible to tell which lockfiles
- * belong to other machines and therefore must not be removed when the
- * lockfile's pid number appears not to correspond to an active process.
- *
- * At index-time, the danger is that multiple indexing processes from
- * different machines which fail to specify a unique `host` id can
- * delete each others' lockfiles and then attempt to modify the index at the
- * same time, causing index corruption.  The search-time problem is more
- * complex.
- *
- * Once an index file is no longer listed in the most recent snapshot, Indexer
- * attempts to delete it as part of a post-[](cfish:Indexer.Commit) cleanup routine.  It is
- * possible that at the moment an Indexer is deleting files which it believes
- * no longer needed, a Searcher referencing an earlier snapshot is in fact
- * using them.  The more often that an index is either updated or searched,
- * the more likely it is that this conflict will arise from time to time.
- *
- * Ordinarily, the deletion attempts are not a problem.   On a typical unix
- * volume, the files will be deleted in name only: any process which holds an
- * open filehandle against a given file will continue to have access, and the
- * file won't actually get vaporized until the last filehandle is cleared.
- * Thanks to "delete on last close semantics", an Indexer can't truly delete
- * the file out from underneath an active Searcher.   On Windows, where file
- * deletion fails whenever any process holds an open handle, the situation is
- * different but still workable: Indexer just keeps retrying after each commit
- * until deletion finally succeeds.
- *
- * On NFS, however, the system breaks, because NFS allows files to be deleted
- * out from underneath active processes.  Should this happen, the unlucky read
- * process will crash with a "Stale NFS filehandle" exception.
- *
- * Under normal circumstances, it is neither necessary nor desirable for
- * IndexReaders to secure read locks against an index, but for NFS we have to
- * make an exception.  LockFactory's [](cfish:LockFactory.Make_Shared_Lock) method exists for this
- * reason; supplying an IndexManager instance to IndexReader's constructor
- * activates an internal locking mechanism using [](cfish:LockFactory.Make_Shared_Lock) which
- * prevents concurrent indexing processes from deleting files that are needed
- * by active readers.
- *
- * Since shared locks are implemented using lockfiles located in the index
- * directory (as are exclusive locks), reader applications must have write
- * access for read locking to work.  Stale lock files from crashed processes
- * are ordinarily cleared away the next time the same machine -- as identified
- * by the `host` parameter -- opens another IndexReader. (The
- * classic technique of timing out lock files is not feasible because search
- * processes may lie dormant indefinitely.) However, please be aware that if
- * the last thing a given machine does is crash, lock files belonging to it
- * may persist, preventing deletion of obsolete index data.
- */
-
-inert class Lucy::Docs::FileLocking { }
-
-

http://git-wip-us.apache.org/repos/asf/lucy/blob/c2363da1/core/Lucy/Docs/FileLocking.md
----------------------------------------------------------------------
diff --git a/core/Lucy/Docs/FileLocking.md b/core/Lucy/Docs/FileLocking.md
new file mode 100644
index 0000000..b28eb72
--- /dev/null
+++ b/core/Lucy/Docs/FileLocking.md
@@ -0,0 +1,80 @@
+# Manage indexes on shared volumes.
+
+Normally, index locking is an invisible process.  Exclusive write access is
+controlled via lockfiles within the index directory and problems only arise
+if multiple processes attempt to acquire the write lock simultaneously;
+search-time processes do not ordinarily require locking at all.
+
+On shared volumes, however, the default locking mechanism fails, and manual
+intervention becomes necessary.
+
+Both read and write applications accessing an index on a shared volume need
+to identify themselves with a unique `host` id, e.g. hostname or
+IP address.  Knowing the host id makes it possible to tell which lockfiles
+belong to other machines and therefore must not be removed when the
+lockfile's pid number appears not to correspond to an active process.
+
+At index-time, the danger is that multiple indexing processes from
+different machines which fail to specify a unique `host` id can
+delete each others' lockfiles and then attempt to modify the index at the
+same time, causing index corruption.  The search-time problem is more
+complex.
+
+Once an index file is no longer listed in the most recent snapshot, Indexer
+attempts to delete it as part of a post-[](lucy:Indexer.Commit) cleanup routine.  It is
+possible that at the moment an Indexer is deleting files which it believes
+are no longer needed, a Searcher referencing an earlier snapshot is in fact
+using them.  The more often that an index is either updated or searched,
+the more likely it is that this conflict will arise from time to time.
+
+Ordinarily, the deletion attempts are not a problem.   On a typical unix
+volume, the files will be deleted in name only: any process which holds an
+open filehandle against a given file will continue to have access, and the
+file won't actually get vaporized until the last filehandle is cleared.
+Thanks to "delete on last close semantics", an Indexer can't truly delete
+the file out from underneath an active Searcher.   On Windows, where file
+deletion fails whenever any process holds an open handle, the situation is
+different but still workable: Indexer just keeps retrying after each commit
+until deletion finally succeeds.
+
+On NFS, however, the system breaks, because NFS allows files to be deleted
+out from underneath active processes.  Should this happen, the unlucky read
+process will crash with a "Stale NFS filehandle" exception.
+
+Under normal circumstances, it is neither necessary nor desirable for
+IndexReaders to secure read locks against an index, but for NFS we have to
+make an exception.  LockFactory's [](lucy:LockFactory.Make_Shared_Lock) method exists for this
+reason; supplying an IndexManager instance to IndexReader's constructor
+activates an internal locking mechanism using [](lucy:LockFactory.Make_Shared_Lock) which
+prevents concurrent indexing processes from deleting files that are needed
+by active readers.
+
+~~~ perl
+use Sys::Hostname qw( hostname );
+my $hostname = hostname() or die "Can't get unique hostname";
+my $manager = Lucy::Index::IndexManager->new( host => $hostname );
+
+# Index time:
+my $indexer = Lucy::Index::Indexer->new(
+    index   => '/path/to/index',
+    manager => $manager,
+);
+
+# Search time:
+my $reader = Lucy::Index::IndexReader->open(
+    index   => '/path/to/index',
+    manager => $manager,
+);
+my $searcher = Lucy::Search::IndexSearcher->new( index => $reader );
+~~~
+
+Since shared locks are implemented using lockfiles located in the index
+directory (as are exclusive locks), reader applications must have write
+access for read locking to work.  Stale lock files from crashed processes
+are ordinarily cleared away the next time the same machine -- as identified
+by the `host` parameter -- opens another IndexReader. (The
+classic technique of timing out lock files is not feasible because search
+processes may lie dormant indefinitely.) However, please be aware that if
+the last thing a given machine does is crash, lock files belonging to it
+may persist, preventing deletion of obsolete index data.
+

http://git-wip-us.apache.org/repos/asf/lucy/blob/c2363da1/perl/buildlib/Lucy/Build/Binding/Docs.pm
----------------------------------------------------------------------
diff --git a/perl/buildlib/Lucy/Build/Binding/Docs.pm b/perl/buildlib/Lucy/Build/Binding/Docs.pm
deleted file mode 100644
index 07e5ce3..0000000
--- a/perl/buildlib/Lucy/Build/Binding/Docs.pm
+++ /dev/null
@@ -1,69 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-package Lucy::Build::Binding::Docs;
-use strict;
-use warnings;
-
-our $VERSION = '0.004000';
-$VERSION = eval $VERSION;
-
-sub bind_all {
-    my $class = shift;
-    $class->bind_devguide;
-    $class->bind_filelocking;
-}
-
-sub bind_devguide {
-    my $pod_spec = Clownfish::CFC::Binding::Perl::Pod->new;
-    my $binding  = Clownfish::CFC::Binding::Perl::Class->new(
-        parcel     => "Lucy",
-        class_name => "Lucy::Docs::DevGuide",
-    );
-    $binding->set_pod_spec($pod_spec);
-    Clownfish::CFC::Binding::Perl::Class->register($binding);
-}
-
-sub bind_filelocking {
-    my $pod_spec = Clownfish::CFC::Binding::Perl::Pod->new;
-    my $synopsis = <<'END_SYNOPSIS';
-    use Sys::Hostname qw( hostname );
-    my $hostname = hostname() or die "Can't get unique hostname";
-    my $manager = Lucy::Index::IndexManager->new( host => $hostname );
-
-    # Index time:
-    my $indexer = Lucy::Index::Indexer->new(
-        index   => '/path/to/index',
-        manager => $manager,
-    );
-
-    # Search time:
-    my $reader = Lucy::Index::IndexReader->open(
-        index   => '/path/to/index',
-        manager => $manager,
-    );
-    my $searcher = Lucy::Search::IndexSearcher->new( index => $reader );
-END_SYNOPSIS
-    $pod_spec->set_synopsis($synopsis);
-
-    my $binding = Clownfish::CFC::Binding::Perl::Class->new(
-        parcel     => "Lucy",
-        class_name => "Lucy::Docs::FileLocking",
-    );
-    $binding->set_pod_spec($pod_spec);
-
-    Clownfish::CFC::Binding::Perl::Class->register($binding);
-}
-
-1;

http://git-wip-us.apache.org/repos/asf/lucy/blob/c2363da1/perl/lib/Lucy/Docs/DevGuide.pm
----------------------------------------------------------------------
diff --git a/perl/lib/Lucy/Docs/DevGuide.pm b/perl/lib/Lucy/Docs/DevGuide.pm
deleted file mode 100644
index 4de9a03..0000000
--- a/perl/lib/Lucy/Docs/DevGuide.pm
+++ /dev/null
@@ -1,24 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-use Lucy;
-our $VERSION = '0.004000';
-$VERSION = eval $VERSION;
-
-1;
-
-__END__
-
-

http://git-wip-us.apache.org/repos/asf/lucy/blob/c2363da1/perl/lib/Lucy/Docs/FileLocking.pm
----------------------------------------------------------------------
diff --git a/perl/lib/Lucy/Docs/FileLocking.pm b/perl/lib/Lucy/Docs/FileLocking.pm
deleted file mode 100644
index 4de9a03..0000000
--- a/perl/lib/Lucy/Docs/FileLocking.pm
+++ /dev/null
@@ -1,24 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-use Lucy;
-our $VERSION = '0.004000';
-$VERSION = eval $VERSION;
-
-1;
-
-__END__
-
-


[3/5] lucy git commit: Convert POD to Markdown

Posted by nw...@apache.org.
Convert POD to Markdown


Project: http://git-wip-us.apache.org/repos/asf/lucy/repo
Commit: http://git-wip-us.apache.org/repos/asf/lucy/commit/5618020f
Tree: http://git-wip-us.apache.org/repos/asf/lucy/tree/5618020f
Diff: http://git-wip-us.apache.org/repos/asf/lucy/diff/5618020f

Branch: refs/heads/master
Commit: 5618020ff61ba7dac4b7132b5977ad4119e2c220
Parents: c2363da
Author: Nick Wellnhofer <we...@aevum.de>
Authored: Wed Jul 8 12:57:18 2015 +0200
Committer: Nick Wellnhofer <we...@aevum.de>
Committed: Sat Jul 11 15:03:10 2015 +0200

----------------------------------------------------------------------
 core/Lucy/Docs/Cookbook.md                      |  33 ++
 core/Lucy/Docs/Cookbook/CustomQuery.md          | 321 +++++++++++++++++++
 core/Lucy/Docs/Cookbook/CustomQueryParser.md    | 231 +++++++++++++
 core/Lucy/Docs/Cookbook/FastUpdates.md          | 140 ++++++++
 core/Lucy/Docs/DocIDs.md                        |  28 ++
 core/Lucy/Docs/FileFormat.md                    | 191 +++++++++++
 core/Lucy/Docs/IRTheory.md                      |  44 +++
 core/Lucy/Docs/Tutorial.md                      |  53 +++
 core/Lucy/Docs/Tutorial/AnalysisTutorial.md     |  85 +++++
 core/Lucy/Docs/Tutorial/BeyondSimpleTutorial.md | 125 ++++++++
 core/Lucy/Docs/Tutorial/FieldTypeTutorial.md    |  60 ++++
 core/Lucy/Docs/Tutorial/HighlighterTutorial.md  |  62 ++++
 core/Lucy/Docs/Tutorial/QueryObjectsTutorial.md | 185 +++++++++++
 core/Lucy/Docs/Tutorial/SimpleTutorial.md       | 298 +++++++++++++++++
 perl/lib/Lucy/Docs/Cookbook.pod                 |  61 ----
 perl/lib/Lucy/Docs/Cookbook/CustomQuery.pod     | 320 ------------------
 .../Lucy/Docs/Cookbook/CustomQueryParser.pod    | 236 --------------
 perl/lib/Lucy/Docs/Cookbook/FastUpdates.pod     | 153 ---------
 perl/lib/Lucy/Docs/DocIDs.pod                   |  47 ---
 perl/lib/Lucy/Docs/FileFormat.pod               | 239 --------------
 perl/lib/Lucy/Docs/IRTheory.pod                 |  94 ------
 perl/lib/Lucy/Docs/Tutorial.pod                 |  89 -----
 perl/lib/Lucy/Docs/Tutorial/Analysis.pod        |  94 ------
 perl/lib/Lucy/Docs/Tutorial/BeyondSimple.pod    | 153 ---------
 perl/lib/Lucy/Docs/Tutorial/FieldType.pod       |  74 -----
 perl/lib/Lucy/Docs/Tutorial/Highlighter.pod     |  76 -----
 perl/lib/Lucy/Docs/Tutorial/QueryObjects.pod    | 198 ------------
 perl/lib/Lucy/Docs/Tutorial/Simple.pod          | 298 -----------------
 28 files changed, 1856 insertions(+), 2132 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/core/Lucy/Docs/Cookbook.md
----------------------------------------------------------------------
diff --git a/core/Lucy/Docs/Cookbook.md b/core/Lucy/Docs/Cookbook.md
new file mode 100644
index 0000000..ec6994f
--- /dev/null
+++ b/core/Lucy/Docs/Cookbook.md
@@ -0,0 +1,33 @@
+# Apache Lucy recipes
+
+The Cookbook provides thematic documentation covering some of Apache Lucy's
+more sophisticated features.  For a step-by-step introduction to Lucy,
+see [](cfish:Tutorial).
+
+## Chapters
+
+* [](cfish:FastUpdates) - While index updates are fast on
+  average, worst-case update performance may be significantly slower. To make
+  index updates consistently quick, we must manually intervene to control the
+  process of index segment consolidation.
+
+* [](cfish:CustomQuery) - Explore Lucy's support for
+  custom query types by creating a "PrefixQuery" class to handle trailing
+  wildcards.
+
+* [](cfish:CustomQueryParser) - Define your own custom
+  search query syntax using [](cfish:lucy.QueryParser) and
+  Parse::RecDescent.
+
+## Materials
+
+Some of the recipes in the Cookbook reference the completed
+[](cfish:Tutorial) application.  These materials can be
+found in the `sample` directory at the root of the Lucy distribution:
+
+~~~ perl
+sample/indexer.pl        # indexing app
+sample/search.cgi        # search app
+sample/us_constitution   # corpus
+~~~
+

http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/core/Lucy/Docs/Cookbook/CustomQuery.md
----------------------------------------------------------------------
diff --git a/core/Lucy/Docs/Cookbook/CustomQuery.md b/core/Lucy/Docs/Cookbook/CustomQuery.md
new file mode 100644
index 0000000..d135c8b
--- /dev/null
+++ b/core/Lucy/Docs/Cookbook/CustomQuery.md
@@ -0,0 +1,321 @@
+# Sample subclass of Query
+
+Explore Apache Lucy's support for custom query types by creating a
+"PrefixQuery" class to handle trailing wildcards.
+
+~~~ perl
+my $prefix_query = PrefixQuery->new(
+    field        => 'content',
+    query_string => 'foo*',
+);
+my $hits = $searcher->hits( query => $prefix_query );
+...
+~~~
+
+## Query, Compiler, and Matcher 
+
+To add support for a new query type, we need three classes: a Query, a
+Compiler, and a Matcher.  
+
+* PrefixQuery - a subclass of [](cfish:lucy.Query), and the only class
+  that client code will deal with directly.
+
+* PrefixCompiler - a subclass of [](cfish:lucy.Compiler), whose primary 
+  role is to compile a PrefixQuery to a PrefixMatcher.
+
+* PrefixMatcher - a subclass of [](cfish:lucy.Matcher), which does the
+  heavy lifting: it applies the query to individual documents and assigns a
+  score to each match.
+
+The PrefixQuery class on its own isn't enough because a Query object's role is
+limited to expressing an abstract specification for the search.  A Query is
+basically nothing but metadata; execution is left to the Query's companion
+Compiler and Matcher.
+
+Here's a simplified sketch illustrating how a Searcher's hits() method ties
+together the three classes.
+
+~~~ perl
+sub hits {
+    my ( $self, $query ) = @_;
+    my $compiler = $query->make_compiler(
+        searcher => $self,
+        boost    => $query->get_boost,
+    );
+    my $matcher = $compiler->make_matcher(
+        reader     => $self->get_reader,
+        need_score => 1,
+    );
+    my @hits = $matcher->capture_hits;
+    return \@hits;
+}
+~~~
+
+### PrefixQuery
+
+Our PrefixQuery class will have two attributes: a query string and a field
+name.
+
+~~~ perl
+package PrefixQuery;
+use base qw( Lucy::Search::Query );
+use Carp;
+use Scalar::Util qw( blessed );
+
+# Inside-out member vars and hand-rolled accessors.
+my %query_string;
+my %field;
+sub get_query_string { my $self = shift; return $query_string{$$self} }
+sub get_field        { my $self = shift; return $field{$$self} }
+~~~
+
+PrefixQuery's constructor collects and validates the attributes.
+
+~~~ perl
+sub new {
+    my ( $class, %args ) = @_;
+    my $query_string = delete $args{query_string};
+    my $field        = delete $args{field};
+    my $self         = $class->SUPER::new(%args);
+    confess("'query_string' param is required")
+        unless defined $query_string;
+    confess("Invalid query_string: '$query_string'")
+        unless $query_string =~ /\*\s*$/;
+    confess("'field' param is required")
+        unless defined $field;
+    $query_string{$$self} = $query_string;
+    $field{$$self}        = $field;
+    return $self;
+}
+~~~
+
+Since this is an inside-out class, we'll need a destructor:
+
+~~~ perl
+sub DESTROY {
+    my $self = shift;
+    delete $query_string{$$self};
+    delete $field{$$self};
+    $self->SUPER::DESTROY;
+}
+~~~
+
+The equals() method determines whether two Queries are logically equivalent:
+
+~~~ perl
+sub equals {
+    my ( $self, $other ) = @_;
+    return 0 unless blessed($other);
+    return 0 unless $other->isa("PrefixQuery");
+    return 0 unless $field{$$self} eq $field{$$other};
+    return 0 unless $query_string{$$self} eq $query_string{$$other};
+    return 1;
+}
+~~~
+
+The last thing we'll need is a make_compiler() factory method which kicks out
+a subclass of [](cfish:lucy.Compiler).
+
+~~~ perl
+sub make_compiler {
+    my ( $self, %args ) = @_;
+    my $subordinate = delete $args{subordinate};
+    my $compiler = PrefixCompiler->new( %args, parent => $self );
+    $compiler->normalize unless $subordinate;
+    return $compiler;
+}
+~~~
+
+### PrefixCompiler
+
+PrefixQuery's make_compiler() method will be called internally at search-time
+by objects which subclass [](cfish:lucy.Searcher) -- such as
+[IndexSearchers](cfish:lucy.IndexSearcher).
+
+A Searcher is associated with a particular collection of documents.   These
+documents may all reside in one index, as with IndexSearcher, or they may be
+spread out across multiple indexes on one or more machines, as with
+[](ClusterSearcher).
+
+Searcher objects have access to certain statistical information about the
+collections they represent; for instance, a Searcher can tell you how many
+documents are in the collection...
+
+~~~ perl
+my $maximum_number_of_docs_in_collection = $searcher->doc_max;
+~~~
+
+... or how many documents a specific term appears in:
+
+~~~ perl
+my $term_appears_in_this_many_docs = $searcher->doc_freq(
+    field => 'content',
+    term  => 'foo',
+);
+~~~
+
+Such information can be used by sophisticated Compiler implementations to
+assign more or less heft to individual queries or sub-queries.  However, we're
+not going to bother with weighting for this demo; we'll just assign a fixed
+score of 1.0 to each matching document.
+
+We don't need to write a constructor, as it will suffice to inherit new() from
+Lucy::Search::Compiler.  The only method we need to implement for
+PrefixCompiler is make_matcher().
+
+~~~ perl
+package PrefixCompiler;
+use base qw( Lucy::Search::Compiler );
+
+sub make_matcher {
+    my ( $self, %args ) = @_;
+    my $seg_reader = $args{reader};
+
+    # Retrieve low-level components LexiconReader and PostingListReader.
+    my $lex_reader
+        = $seg_reader->obtain("Lucy::Index::LexiconReader");
+    my $plist_reader
+        = $seg_reader->obtain("Lucy::Index::PostingListReader");
+    
+    # Acquire a Lexicon and seek it to our query string.
+    my $substring = $self->get_parent->get_query_string;
+    $substring =~ s/\*\s*$//;
+    my $field = $self->get_parent->get_field;
+    my $lexicon = $lex_reader->lexicon( field => $field );
+    return unless $lexicon;
+    $lexicon->seek($substring);
+    
+    # Accumulate PostingLists for each matching term.
+    my @posting_lists;
+    while ( defined( my $term = $lexicon->get_term ) ) {
+        last unless $term =~ /^\Q$substring/;
+        my $posting_list = $plist_reader->posting_list(
+            field => $field,
+            term  => $term,
+        );
+        if ($posting_list) {
+            push @posting_lists, $posting_list;
+        }
+        last unless $lexicon->next;
+    }
+    return unless @posting_lists;
+    
+    return PrefixMatcher->new( posting_lists => \@posting_lists );
+}
+~~~
+
+PrefixCompiler gets access to a [](cfish:lucy.SegReader)
+object when make_matcher() gets called.  From the SegReader and its
+sub-components [](cfish:lucy.LexiconReader) and
+[](cfish:lucy.PostingListReader), we acquire a
+[](cfish:lucy.Lexicon), scan through the Lexicon's unique
+terms, and acquire a [](cfish:lucy.PostingList) for each
+term that matches our prefix.
+
+Each of these PostingList objects represents a set of documents which match
+the query.
+
+### PrefixMatcher
+
+The Matcher subclass is the most involved.  
+
+~~~ perl
+package PrefixMatcher;
+use base qw( Lucy::Search::Matcher );
+
+# Inside-out member vars.
+my %doc_ids;
+my %tick;
+
+sub new {
+    my ( $class, %args ) = @_;
+    my $posting_lists = delete $args{posting_lists};
+    my $self          = $class->SUPER::new(%args);
+    
+    # Cheesy but simple way of interleaving PostingList doc sets.
+    my %all_doc_ids;
+    for my $posting_list (@$posting_lists) {
+        while ( my $doc_id = $posting_list->next ) {
+            $all_doc_ids{$doc_id} = undef;
+        }
+    }
+    my @doc_ids = sort { $a <=> $b } keys %all_doc_ids;
+    $doc_ids{$$self} = \@doc_ids;
+    
+    # Track our position within the array of doc ids.
+    $tick{$$self} = -1;
+    
+    return $self;
+}
+
+sub DESTROY {
+    my $self = shift;
+    delete $doc_ids{$$self};
+    delete $tick{$$self};
+    $self->SUPER::DESTROY;
+}
+~~~
+
+The doc ids must be in order, or some will be ignored; hence the `sort`
+above.
+
+In addition to the constructor and destructor, there are three methods that
+must be overridden.
+
+next() advances the Matcher to the next valid matching doc.  
+
+~~~ perl
+sub next {
+    my $self    = shift;
+    my $doc_ids = $doc_ids{$$self};
+    my $tick    = ++$tick{$$self};
+    return 0 if $tick >= scalar @$doc_ids;
+    return $doc_ids->[$tick];
+}
+~~~
+
+get_doc_id() returns the current document id, or 0 if the Matcher is
+exhausted.  ([Document numbers](cfish:DocIDs) start at 1, so 0 is
+a sentinel.)
+
+~~~ perl
+sub get_doc_id {
+    my $self    = shift;
+    my $tick    = $tick{$$self};
+    my $doc_ids = $doc_ids{$$self};
+    return $tick < scalar @$doc_ids ? $doc_ids->[$tick] : 0;
+}
+~~~
+
+score() conveys the relevance score of the current match.  We'll just return a
+fixed score of 1.0:
+
+~~~ perl
+sub score { 1.0 }
+~~~
+
+## Usage 
+
+To get a basic feel for PrefixQuery, insert the FlatQueryParser module
+described in [](cfish:CustomQueryParser) (which supports
+PrefixQuery) into the search.cgi sample app.
+
+~~~ perl
+my $parser = FlatQueryParser->new( schema => $searcher->get_schema );
+my $query  = $parser->parse($q);
+~~~
+
+If you're planning on using PrefixQuery in earnest, though, you may want to
+change up analyzers to avoid stemming, because stemming -- another approach to
+prefix conflation -- is not perfectly compatible with prefix searches.
+
+~~~ perl
+# Polyanalyzer with no SnowballStemmer.
+my $analyzer = Lucy::Analysis::PolyAnalyzer->new(
+    analyzers => [
+        Lucy::Analysis::StandardTokenizer->new,
+        Lucy::Analysis::Normalizer->new,
+    ],
+);
+~~~
+

http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/core/Lucy/Docs/Cookbook/CustomQueryParser.md
----------------------------------------------------------------------
diff --git a/core/Lucy/Docs/Cookbook/CustomQueryParser.md b/core/Lucy/Docs/Cookbook/CustomQueryParser.md
new file mode 100644
index 0000000..39b1167
--- /dev/null
+++ b/core/Lucy/Docs/Cookbook/CustomQueryParser.md
@@ -0,0 +1,231 @@
+# Sample subclass of QueryParser.
+
+Implement a custom search query language using a subclass of
+[](cfish:lucy.QueryParser).
+
+## The language
+
+At first, our query language will support only simple term queries and phrases
+delimited by double quotes.  For simplicity's sake, it will not support
+parenthetical groupings, boolean operators, or prepended plus/minus.  The
+results for all subqueries will be unioned together -- i.e. joined using an OR
+-- which is usually the best approach for small-to-medium-sized document
+collections.
+
+Later, we'll add support for trailing wildcards.
+
+## Single-field parser
+
+Our initial parser implementation will generate queries against a single fixed
+field, "content", and it will analyze text using a fixed choice of English
+EasyAnalyzer.  We won't subclass Lucy::Search::QueryParser just yet.
+
+~~~ perl
+package FlatQueryParser;
+use Lucy::Search::TermQuery;
+use Lucy::Search::PhraseQuery;
+use Lucy::Search::ORQuery;
+use Carp;
+
+sub new { 
+    my $analyzer = Lucy::Analysis::EasyAnalyzer->new(
+        language => 'en',
+    );
+    return bless { 
+        field    => 'content',
+        analyzer => $analyzer,
+    }, __PACKAGE__;
+}
+~~~
+
+Some private helper subs for creating TermQuery and PhraseQuery objects will
+help keep the size of our main parse() subroutine down:
+
+~~~ perl
+sub _make_term_query {
+    my ( $self, $term ) = @_;
+    return Lucy::Search::TermQuery->new(
+        field => $self->{field},
+        term  => $term,
+    );
+}
+
+sub _make_phrase_query {
+    my ( $self, $terms ) = @_;
+    return Lucy::Search::PhraseQuery->new(
+        field => $self->{field},
+        terms => $terms,
+    );
+}
+~~~
+
+Our private \_tokenize() method treats double-quote delimited material as a
+single token and splits on whitespace everywhere else.
+
+~~~ perl
+sub _tokenize {
+    my ( $self, $query_string ) = @_;
+    my @tokens;
+    while ( length $query_string ) {
+        if ( $query_string =~ s/^\s+// ) {
+            next;    # skip whitespace
+        }
+        elsif ( $query_string =~ s/^("[^"]*(?:"|$))// ) {
+            push @tokens, $1;    # double-quoted phrase
+        }
+        else {
+            $query_string =~ s/(\S+)//;
+            push @tokens, $1;    # single word
+        }
+    }
+    return \@tokens;
+}
+~~~
+
+The main parsing routine creates an array of tokens by calling \_tokenize(),
+runs the tokens through the EasyAnalyzer, creates TermQuery or
+PhraseQuery objects according to how many tokens emerge from the
+EasyAnalyzer's split() method, and adds each of the sub-queries to the primary
+ORQuery.
+
+~~~ perl
+sub parse {
+    my ( $self, $query_string ) = @_;
+    my $tokens   = $self->_tokenize($query_string);
+    my $analyzer = $self->{analyzer};
+    my $or_query = Lucy::Search::ORQuery->new;
+
+    for my $token (@$tokens) {
+        if ( $token =~ s/^"// ) {
+            $token =~ s/"$//;
+            my $terms = $analyzer->split($token);
+            my $query = $self->_make_phrase_query($terms);
+            $or_query->add_child($query);
+        }
+        else {
+            my $terms = $analyzer->split($token);
+            if ( @$terms == 1 ) {
+                my $query = $self->_make_term_query( $terms->[0] );
+                $or_query->add_child($query);
+            }
+            elsif ( @$terms > 1 ) {
+                my $query = $self->_make_phrase_query($terms);
+                $or_query->add_child($query);
+            }
+        }
+    }
+
+    return $or_query;
+}
+~~~
+
+## Multi-field parser
+
+Most often, the end user will want their search query to match not only a
+single 'content' field, but also 'title' and so on.  To make that happen, we
+have to turn queries such as this...
+
+    foo AND NOT bar
+
+... into the logical equivalent of this:
+
+    (title:foo OR content:foo) AND NOT (title:bar OR content:bar)
+
+Rather than continue with our own from-scratch parser class and write the
+routines to accomplish that expansion, we're now going to subclass
+Lucy::Search::QueryParser and take advantage of some of its existing
+methods.
+
+Our first parser implementation had the "content" field name and the choice of
+English EasyAnalyzer hard-coded for simplicity, but we don't need to do that
+once we subclass Lucy::Search::QueryParser.  QueryParser's constructor --
+which we will inherit, allowing us to eliminate our own constructor --
+requires a Schema which conveys field and Analyzer information, so we can
+just defer to that.
+
+~~~ perl
+package FlatQueryParser;
+use base qw( Lucy::Search::QueryParser );
+use Lucy::Search::TermQuery;
+use Lucy::Search::PhraseQuery;
+use Lucy::Search::ORQuery;
+use PrefixQuery;
+use Carp;
+
+# Inherit new()
+~~~
+
+We're also going to jettison our \_make_term_query() and \_make_phrase_query()
+helper subs and chop our parse() subroutine way down.  Our revised parse()
+routine will generate Lucy::Search::LeafQuery objects instead of TermQueries
+and PhraseQueries:
+
+~~~ perl
+sub parse {
+    my ( $self, $query_string ) = @_;
+    my $tokens = $self->_tokenize($query_string);
+    my $or_query = Lucy::Search::ORQuery->new;
+    for my $token (@$tokens) {
+        my $leaf_query = Lucy::Search::LeafQuery->new( text => $token );
+        $or_query->add_child($leaf_query);
+    }
+    return $self->expand($or_query);
+}
+~~~
+
+The magic happens in QueryParser's expand() method, which walks the ORQuery
+object we supply to it looking for LeafQuery objects, and calls expand_leaf()
+for each one it finds.  expand_leaf() performs field-specific analysis,
+decides whether each query should be a TermQuery or a PhraseQuery, and if
+multiple fields are required, creates an ORQuery which multiplies out e.g. `foo`
+into `(title:foo OR content:foo)`.
+
+## Extending the query language
+
+To add support for trailing wildcards to our query language, we need to
+override expand_leaf() to accommodate PrefixQuery, while deferring to the
+parent class implementation on TermQuery and PhraseQuery.
+
+~~~ perl
+sub expand_leaf {
+    my ( $self, $leaf_query ) = @_;
+    my $text = $leaf_query->get_text;
+    if ( $text =~ /\*$/ ) {
+        my $or_query = Lucy::Search::ORQuery->new;
+        for my $field ( @{ $self->get_fields } ) {
+            my $prefix_query = PrefixQuery->new(
+                field        => $field,
+                query_string => $text,
+            );
+            $or_query->add_child($prefix_query);
+        }
+        return $or_query;
+    }
+    else {
+        return $self->SUPER::expand_leaf($leaf_query);
+    }
+}
+~~~
+
+Ordinarily, those asterisks would have been stripped when running tokens
+through the EasyAnalyzer -- query strings containing "foo\*" would produce
+TermQueries for the term "foo".  Our override intercepts tokens with trailing
+asterisks and processes them as PrefixQueries before `SUPER::expand_leaf` can
+discard them, so that a search for "foo\*" can match "food", "foosball", and so
+on.
+
+## Usage
+
+Insert our custom parser into the search.cgi sample app to get a feel for how
+it behaves:
+
+~~~ perl
+my $parser = FlatQueryParser->new( schema => $searcher->get_schema );
+my $query  = $parser->parse( decode( 'UTF-8', $cgi->param('q') || '' ) );
+my $hits   = $searcher->hits(
+    query      => $query,
+    offset     => $offset,
+    num_wanted => $page_size,
+);
+...
+~~~
+

http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/core/Lucy/Docs/Cookbook/FastUpdates.md
----------------------------------------------------------------------
diff --git a/core/Lucy/Docs/Cookbook/FastUpdates.md b/core/Lucy/Docs/Cookbook/FastUpdates.md
new file mode 100644
index 0000000..511310a
--- /dev/null
+++ b/core/Lucy/Docs/Cookbook/FastUpdates.md
@@ -0,0 +1,140 @@
+# Near real-time index updates
+
+While index updates are fast on average, worst-case update performance may be
+significantly slower.  To make index updates consistently quick, we must
+manually intervene to control the process of index segment consolidation.
+
+## The problem
+
+Ordinarily, modifying an index is cheap. New data is added to new segments,
+and the time to write a new segment scales more or less linearly with the
+number of documents added during the indexing session.  
+
+Deletions are also cheap most of the time, because we don't remove documents
+immediately but instead mark them as deleted, and adding the deletion mark is
+cheap.
+
+However, as new segments are added and the deletion rate for existing segments
+increases, search-time performance slowly begins to degrade.  At some point,
+it becomes necessary to consolidate existing segments, rewriting their data
+into a new segment.  
+
+If the recycled segments are small, the time it takes to rewrite them may not
+be significant.  Every once in a while, though, a large amount of data must be
+rewritten.
+
+## Procrastinating and playing catch-up
+
+The simplest way to force fast index updates is to avoid rewriting anything.
+
+Indexer relies upon [](cfish:lucy.IndexManager)'s
+recycle() method to tell it which segments should be consolidated.  If we
+subclass IndexManager and override recycle() so that it always returns an
+empty array, we get consistently quick performance:
+
+~~~ perl
+package NoMergeManager;
+use base qw( Lucy::Index::IndexManager );
+sub recycle { [] }
+
+package main;
+my $indexer = Lucy::Index::Indexer->new(
+    index => '/path/to/index',
+    manager => NoMergeManager->new,
+);
+...
+$indexer->commit;
+~~~
+
+However, we can't procrastinate forever.  Eventually, we'll have to run an
+ordinary, uncontrolled indexing session, potentially triggering a large
+rewrite of lots of small and/or degraded segments:
+
+~~~ perl
+my $indexer = Lucy::Index::Indexer->new( 
+    index => '/path/to/index', 
+    # manager => NoMergeManager->new,
+);
+...
+$indexer->commit;
+~~~
+
+## Acceptable worst-case update time, slower degradation
+
+Never merging anything at all in the main indexing process is probably
+overkill.  Small segments are relatively cheap to merge; we just need to guard
+against the big rewrites.  
+
+Setting a ceiling on the number of documents in the segments to be recycled
+allows us to avoid a mass proliferation of tiny, single-document segments,
+while still offering decent worst-case update speed:
+
+~~~ perl
+package LightMergeManager;
+use base qw( Lucy::Index::IndexManager );
+
+sub recycle {
+    my $self = shift;
+    my $seg_readers = $self->SUPER::recycle(@_);
+    @$seg_readers = grep { $_->doc_max < 10 } @$seg_readers;
+    return $seg_readers;
+}
+~~~
+
+However, we still have to consolidate every once in a while, and while that
+happens content updates will be locked out.
+
+## Background merging
+
+If it's not acceptable to lock out updates while the index consolidation
+process runs, the alternative is to move the consolidation process out of
+band, using Lucy::Index::BackgroundMerger.  
+
+It's never safe to have more than one Indexer attempting to modify the content
+of an index at the same time, but a BackgroundMerger and an Indexer can
+operate simultaneously:
+
+~~~ perl
+# Indexing process.
+use Scalar::Util qw( blessed );
+my $retries = 0;
+while (1) {
+    eval {
+        my $indexer = Lucy::Index::Indexer->new(
+            index   => '/path/to/index',
+            manager => LightMergeManager->new,
+        );
+        $indexer->add_doc($doc);
+        $indexer->commit;
+    };
+    last unless $@;
+    if ( blessed($@) and $@->isa("Lucy::Store::LockErr") ) {
+        # Catch LockErr.
+        warn "Couldn't get lock ($retries retries)";
+        $retries++;
+    }
+    else {
+        die "Write failed: $@";
+    }
+}
+
+# Background merge process.
+my $manager = Lucy::Index::IndexManager->new;
+$manager->set_write_lock_timeout(60_000);
+my $bg_merger = Lucy::Index::BackgroundMerger->new(
+    index   => '/path/to/index',
+    manager => $manager,
+);
+$bg_merger->commit;
+~~~
+
+The exception handling code becomes useful once you have more than one index
+modification process happening simultaneously.  By default, Indexer tries
+several times to acquire a write lock over the span of one second, then holds
+it until commit() completes.  BackgroundMerger handles most of its work
+without the write lock, but it does need it briefly once at the beginning and
+once again near the end.  Under normal loads, the internal retry logic will
+resolve conflicts, but if it's not acceptable to miss an insert, you probably
+want to catch LockErr exceptions thrown by Indexer.  In contrast, a LockErr
+from BackgroundMerger probably just needs to be logged.
+

http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/core/Lucy/Docs/DocIDs.md
----------------------------------------------------------------------
diff --git a/core/Lucy/Docs/DocIDs.md b/core/Lucy/Docs/DocIDs.md
new file mode 100644
index 0000000..af696b2
--- /dev/null
+++ b/core/Lucy/Docs/DocIDs.md
@@ -0,0 +1,28 @@
+# Characteristics of Apache Lucy document ids.
+
+## Document ids are signed 32-bit integers
+
+Document ids in Apache Lucy start at 1.  Because 0 is never a valid doc id, we
+can use it as a sentinel value:
+
+~~~ perl
+while ( my $doc_id = $posting_list->next ) {
+    ...
+}
+~~~
+
+## Document ids are ephemeral
+
+The document ids used by Lucy are associated with a single index
+snapshot.  The moment an index is updated, the mapping of document ids to
+documents is subject to change.
+
+Since IndexReader objects represent a point-in-time view of an index, document
+ids are guaranteed to remain static for the life of the reader.  However,
+because they are not permanent, Lucy document ids cannot be used as
+foreign keys to locate records in external data sources.  If you truly need a
+primary key field, you must define it and populate it yourself.
+
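+As a minimal sketch of that do-it-yourself approach -- the field name
+`primary_key` is hypothetical, not a Lucy convention -- you can define your
+own identifier field in the Schema and read it back from hits rather than
+relying on doc ids:
+
+~~~ perl
+# Sketch: store an application-level key in its own field.
+my $schema = Lucy::Plan::Schema->new;
+my $type   = Lucy::Plan::StringType->new;
+$schema->spec_field( name => 'primary_key', type => $type );
+
+# At search time, read the stored field back instead of using the doc id.
+while ( my $hit = $hits->next ) {
+    my $key = $hit->{primary_key};    # stable across index updates
+    ...
+}
+~~~
+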
+Furthermore, the order of document ids does not tell you anything about the
+sequence in which documents were added to the index.
+

http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/core/Lucy/Docs/FileFormat.md
----------------------------------------------------------------------
diff --git a/core/Lucy/Docs/FileFormat.md b/core/Lucy/Docs/FileFormat.md
new file mode 100644
index 0000000..c5f606c
--- /dev/null
+++ b/core/Lucy/Docs/FileFormat.md
@@ -0,0 +1,191 @@
+# Overview of index file format
+
+It is not necessary to understand the current implementation details of the
+index file format in order to use Apache Lucy effectively, but it may be
+helpful if you are interested in tweaking for high performance, exotic usage,
+or debugging and development.  
+
+On a file system, an index is a directory.  The files inside have a
+hierarchical relationship: an index is made up of "segments", each of which is
+an independent inverted index with its own subdirectory; each segment is made
+up of several component parts.
+
+    [index]--|
+             |--snapshot_XXX.json
+             |--schema_XXX.json
+             |--write.lock
+             |
+             |--seg_1--|
+             |         |--segmeta.json
+             |         |--cfmeta.json
+             |         |--cf.dat-------|
+             |                         |--[lexicon]
+             |                         |--[postings]
+             |                         |--[documents]
+             |                         |--[highlight]
+             |                         |--[deletions]
+             |
+             |--seg_2--|
+             |         |--segmeta.json
+             |         |--cfmeta.json
+             |         |--cf.dat-------|
+             |                         |--[lexicon]
+             |                         |--[postings]
+             |                         |--[documents]
+             |                         |--[highlight]
+             |                         |--[deletions]
+             |
+             |--[...]--| 
+
+## Write-once philosophy
+
+All segment directory names consist of the string "seg\_" followed by a number
+in base 36: seg_1, seg_5m, seg_p9s2 and so on, with higher numbers indicating
+more recent segments.  Once a segment is finished and committed, its name is
+never re-used and its files are never modified.
+
+Old segments become obsolete and can be removed when their data has been
+consolidated into new segments during the process of segment merging and
+optimization.  A fully-optimized index has only one segment.
+
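+As an illustration of the naming convention -- this helper is not part of
+Lucy's API -- segment directory names can be ordered by decoding their
+base-36 suffix:
+
+~~~ perl
+# Illustrative only: decode "seg_XXX" names to find the newest segment.
+sub seg_number {
+    my ($name) = @_;
+    my ($b36) = $name =~ /^seg_([0-9a-z]+)$/ or return -1;
+    my $n = 0;
+    $n = $n * 36 + index( '0123456789abcdefghijklmnopqrstuvwxyz', $_ )
+        for split //, $b36;
+    return $n;
+}
+
+my @segs = sort { seg_number($a) <=> seg_number($b) }
+    qw( seg_p9s2 seg_1 seg_5m );
+# Yields seg_1, seg_5m, seg_p9s2 -- the highest number is the most recent.
+~~~
+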
+## Top-level entries
+
+There are a handful of "top-level" files and directories which belong to the
+entire index rather than to a particular segment.
+
+### snapshot_XXX.json
+
+A "snapshot" file, e.g. `snapshot_m7p.json`, is list of index files and
+directories.  Because index files, once written, are never modified, the list
+of entries in a snapshot defines a point-in-time view of the data in an index.
+
+Like segment directories, snapshot files also utilize the
+unique-base-36-number naming convention; the higher the number, the more
+recent the file.  The appearance of a new snapshot file within the index
+directory constitutes an index update.  While a new segment is being written,
+new files may be added to the index directory, but until a new snapshot file
+gets written, a Searcher opening the index for reading won't know about them.
+
+### schema_XXX.json
+
+The schema file is a Schema object describing the index's format, serialized
+as JSON.  It, too, is versioned, and a given snapshot file will reference one
+and only one schema file.
+
+### locks 
+
+By default, only one indexing process may safely modify the index at any given
+time.  Processes reserve an index by laying claim to the `write.lock` file
+within the `locks/` directory.  A smattering of other lock files may be used
+from time to time, as well.
+
+## A segment's component parts
+
+By default, each segment has up to five logical components: lexicon, postings,
+document storage, highlight data, and deletions.  Binary data from these
+components gets stored in virtual files within the "cf.dat" compound file;
+metadata is stored in a shared "segmeta.json" file.
+
+### segmeta.json
+
+The segmeta.json file is a central repository for segment metadata.  In
+addition to information such as document counts and field numbers, it also
+warehouses arbitrary metadata on behalf of individual index components.
+
+### Lexicon 
+
+Each indexed field gets its own lexicon in each segment.  The exact files
+involved depend on the field's type, but generally speaking there will be two
+parts.  First, there's a primary `lexicon-XXX.dat` file which houses a
+complete term list associating terms with corpus frequency statistics,
+postings file locations, etc.  Second, one or more "lexicon index" files may
+be present which contain periodic samples from the primary lexicon file to
+facilitate fast lookups.
+
+### Postings
+
+"Posting" is a technical term from the field of 
+[information retrieval](cfish:IRTheory), defined as a single
+instance of a one term indexing one document.  If you are looking at the index
+in the back of a book, and you see that "freedom" is referenced on pages 8,
+86, and 240, that would be three postings, which taken together form a
+"posting list".  The same terminology applies to an index in electronic form.
+
+Each segment has one postings file per indexed field.  When a search is
+performed for a single term, first that term is looked up in the lexicon.  If
+the term exists in the segment, the record in the lexicon will contain
+information about which postings file to look at and where to look.
+
+The first thing any posting record tells you is a document id.  By iterating
+over all the postings associated with a term, you can find all the documents
+that match that term, a process which is analogous to looking up page numbers
+in a book's index.  However, each posting record typically contains other
+information in addition to document id, e.g. the positions at which the term
+occurs within the field.
+
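+The lookup flow can be sketched with the same reader API (again assuming a
+SegReader, as in the Cookbook's custom-Matcher recipe):
+
+~~~ perl
+my $plist_reader = $seg_reader->obtain("Lucy::Index::PostingListReader");
+my $posting_list = $plist_reader->posting_list(
+    field => 'content',
+    term  => 'freedom',
+);
+if ($posting_list) {
+    while ( my $doc_id = $posting_list->next ) {
+        print "doc $doc_id contains 'freedom'\n";
+    }
+}
+~~~
+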
+### Documents
+
+The document storage section is a simple database, organized into two files:
+
+* __documents.dat__ - Serialized documents.
+
+* __documents.ix__ - Document storage index, a solid array of 64-bit integers
+  where each integer location corresponds to a document id, and the value at
+  that location points at a file position in the documents.dat file, as
+  sketched below.
+
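+A hypothetical lookup against a standalone documents.ix file (in a
+finished index the data actually lives inside "cf.dat" -- see Compound
+Files below) might look like this, assuming entries are indexed directly
+by doc id and stored big-endian:
+
+~~~ perl
+my $doc_id = 42;
+open my $ix, '<:raw', 'documents.ix' or die $!;
+seek $ix, $doc_id * 8, 0;              # one 64-bit integer per doc id
+read $ix, my $packed, 8;
+my $position = unpack 'Q>', $packed;   # offset into documents.dat
+~~~
+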
+### Highlight data
+
+The files which store data used for excerpting and highlighting are organized
+similarly to the files used to store documents.
+
+* __highlight.dat__ - Chunks of serialized highlight data, one per doc id.
+
+* __highlight.ix__ - Highlight data index -- as with the `documents.ix` file, a
+  solid array of 64-bit file pointers.
+
+### Deletions
+
+When a document is "deleted" from a segment, it is not actually purged right
+away; it is merely marked as "deleted" via a deletions file.  Deletions files
+contain bit vectors with one bit for each document in the segment; if bit
+\#254 is set then document 254 is deleted, and if that document turns up in a
+search it will be masked out.
+
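+The bit test itself is simple; here's a toy model of the deletions vector
+in Perl:
+
+~~~ perl
+my $bits = '';
+vec( $bits, 254, 1 ) = 1;                          # mark doc 254 deleted
+print "doc 254 is deleted\n" if vec( $bits, 254, 1 );
+~~~
+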
+It is only when a segment's contents are rewritten to a new segment during the
+segment-merging process that deleted documents truly go away.
+
+## Compound Files
+
+If you peer inside an index directory, you won't actually find any files named
+"documents.dat", "highlight.ix", etc. unless there is an indexing process
+underway.  What you will find instead is one "cf.dat" and one "cfmeta.json"
+file per segment.
+
+To minimize the need for file descriptors at search-time, all per-segment
+binary data files are concatenated together in "cf.dat" at the close of each
+indexing session.  Information about where each file begins and ends is stored
+in `cfmeta.json`.  When the segment is opened for reading, a single file
+descriptor per "cf.dat" file can be shared among several readers.
+
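+As an illustration -- with hypothetical key names, since the real JSON
+layout may differ -- extracting one virtual file could look like this:
+
+~~~ perl
+use JSON qw( decode_json );
+
+open my $fh, '<', 'seg_1/cfmeta.json' or die $!;
+my $meta = decode_json( do { local $/; <$fh> } );
+my $entry = $meta->{files}{'documents.ix'};    # hypothetical structure
+open my $cf, '<:raw', 'seg_1/cf.dat' or die $!;
+seek $cf, $entry->{offset}, 0;
+read $cf, my $contents, $entry->{length};
+~~~
+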
+## A Typical Search
+
+Here's a simplified narrative, dramatizing how a search for "freedom" against
+a given segment plays out:
+
+1. The searcher asks the relevant Lexicon Index, "Do you know anything about
+   'freedom'?"  Lexicon Index replies, "Can't say for sure, but if the main
+   Lexicon file does, 'freedom' is probably somewhere around byte 21008".  
+
+2. The main Lexicon tells the searcher "One moment, let me scan our records...
+   Yes, we have 2 documents which contain 'freedom'.  You'll find them in
+   seg_6/postings-4.dat starting at byte 66991."
+
+3. The Postings file says "Yep, we have 'freedom', all right!  Document id 40
+   has 1 'freedom', and document 44 has 8.  If you need to know more, like if any
+   'freedom' is part of the phrase 'freedom of speech', ask me about positions!"
+
+4. If the searcher is only looking for 'freedom' in isolation, that's where it
+   stops.  It now knows enough to assign the documents scores against "freedom",
+   with the 8-freedom document likely ranking higher than the single-freedom
+   document.
+

http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/core/Lucy/Docs/IRTheory.md
----------------------------------------------------------------------
diff --git a/core/Lucy/Docs/IRTheory.md b/core/Lucy/Docs/IRTheory.md
new file mode 100644
index 0000000..a9af4ed
--- /dev/null
+++ b/core/Lucy/Docs/IRTheory.md
@@ -0,0 +1,44 @@
+# Crash course in information retrieval
+
+Just enough Information Retrieval theory to find your way around Apache Lucy.
+
+## Terminology
+
+Lucy uses some terminology from the field of information retrieval which
+may be unfamiliar to many users.  "Document" and "term" mean pretty much what
+you'd expect them to, but others such as "posting" and "inverted index" need a
+formal introduction:
+
+* _document_ - An atomic unit of retrieval.
+* _term_ - An attribute which describes a document.
+* _posting_ - One term indexing one document.
+* _term list_ - The complete list of terms which describe a document.
+* _posting list_ - The complete list of documents which a term indexes.
+* _inverted index_ - A data structure which maps from terms to documents.
+
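+A toy model ties these definitions together:
+
+~~~ perl
+# An inverted index maps terms to posting lists of doc ids.
+my %inverted_index = (
+    freedom => [ 40, 44 ],    # two postings for 'freedom'
+    speech  => [ 44 ],
+);
+my $posting_list = $inverted_index{freedom};    # docs matching 'freedom'
+~~~
+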
+Since Lucy is a practical implementation of IR theory, it loads these
+abstract, distilled definitions down with useful traits.  For instance, a
+"posting" in its most rarefied form is simply a term-document pairing; in
+Lucy, the class [](cfish:lucy.MatchPosting) fills this
+role.  However, by associating additional information with a posting like the
+number of times the term occurs in the document, we can turn it into a
+[](cfish:lucy.ScorePosting), making it possible
+to rank documents by relevance rather than just list documents which happen to
+match in no particular order.
+
+## TF/IDF ranking algorithm
+
+Lucy uses a variant of the well-established "Term Frequency / Inverse
+Document Frequency" weighting scheme.  A thorough treatment of TF/IDF is too
+ambitious for our present purposes, but in a nutshell, it means that...
+
+* in a search for `skate park`, documents which score well for the
+  comparatively rare term `skate` will rank higher than documents which score
+  well for the more common term `park`.
+
+* a 10-word text which has one occurrence each of both `skate` and `park` will
+  rank higher than a 1000-word text which also contains one occurrence of each.
+
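+The first effect can be seen numerically in the classic formulation --
+shown here for illustration only, since Lucy's variant differs in detail:
+
+~~~ perl
+sub tf_idf {
+    my ( $term_freq, $doc_freq, $num_docs ) = @_;
+    return 0 unless $term_freq && $doc_freq;
+    return $term_freq * log( $num_docs / $doc_freq );
+}
+
+# In a 1000-doc collection, one 'skate' (rare) outweighs one 'park':
+print tf_idf( 1, 10,  1000 ), "\n";    # ~4.61
+print tf_idf( 1, 500, 1000 ), "\n";    # ~0.69
+~~~
+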
+A web search for "tf idf" will turn up many excellent explanations of the
+algorithm.
+

http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/core/Lucy/Docs/Tutorial.md
----------------------------------------------------------------------
diff --git a/core/Lucy/Docs/Tutorial.md b/core/Lucy/Docs/Tutorial.md
new file mode 100644
index 0000000..57c66b2
--- /dev/null
+++ b/core/Lucy/Docs/Tutorial.md
@@ -0,0 +1,53 @@
+# Step-by-step introduction to Apache Lucy.
+
+Explore Apache Lucy's basic functionality by starting with a minimalist CGI
+search app based on Lucy::Simple and transforming it, step by step,
+into an "advanced search" interface utilizing more flexible core modules like
+[](cfish:lucy.Indexer) and [](cfish:lucy.IndexSearcher).
+
+## Chapters
+
+* [](cfish:SimpleTutorial) - Build a bare-bones search app using
+  Lucy::Simple.
+
+* [](cfish:BeyondSimpleTutorial) - Rebuild the app using core
+  classes like [](cfish:lucy.Indexer) and
+  [](cfish:lucy.IndexSearcher) in place of Lucy::Simple.
+
+* [](cfish:FieldTypeTutorial) - Experiment with different field
+  characteristics using subclasses of [](cfish:lucy.FieldType).
+
+* [](cfish:AnalysisTutorial) - Examine how the choice of
+  [](cfish:lucy.Analyzer) subclass affects search results.
+
+* [](cfish:HighlighterTutorial) - Augment search results with
+  highlighted excerpts.
+
+* [](cfish:QueryObjectsTutorial) - Unlock advanced search features
+  by using Query objects instead of query strings.
+
+## Source materials
+
+The source material used by the tutorial app -- a multi-text-file presentation
+of the United States constitution -- can be found in the `sample` directory
+at the root of the Lucy distribution, along with finished indexing and search
+apps.
+
+~~~ perl
+sample/indexer.pl        # indexing app
+sample/search.cgi        # search app
+sample/us_constitution   # corpus
+~~~
+
+## Conventions
+
+The user is expected to be familiar with OO Perl and basic CGI programming.
+
+The code in this tutorial assumes a Unix-flavored operating system and the
+Apache webserver, but will work with minor modifications on other setups.
+
+## See also
+
+More advanced and esoteric subjects are covered in [](cfish:Cookbook).
+
+

http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/core/Lucy/Docs/Tutorial/AnalysisTutorial.md
----------------------------------------------------------------------
diff --git a/core/Lucy/Docs/Tutorial/AnalysisTutorial.md b/core/Lucy/Docs/Tutorial/AnalysisTutorial.md
new file mode 100644
index 0000000..a55dd09
--- /dev/null
+++ b/core/Lucy/Docs/Tutorial/AnalysisTutorial.md
@@ -0,0 +1,85 @@
+# How to choose and use Analyzers.
+
+Try swapping out the EasyAnalyzer in our Schema for a StandardTokenizer:
+
+~~~ perl
+my $tokenizer = Lucy::Analysis::StandardTokenizer->new;
+my $type = Lucy::Plan::FullTextType->new(
+    analyzer => $tokenizer,
+);
+~~~
+
+Search for `senate`, `Senate`, and `Senator` before and after making the
+change and re-indexing.
+
+Under EasyAnalyzer, the results are identical for all three searches, but
+under StandardTokenizer, searches are case-sensitive, and the result sets for
+`Senate` and `Senator` are distinct.
+
+## EasyAnalyzer
+
+What's happening is that EasyAnalyzer is performing more aggressive processing
+than StandardTokenizer.  In addition to tokenizing, it's also converting all
+text to lower case so that searches are case-insensitive, and using a
+"stemming" algorithm to reduce related words to a common stem (`senat`, in
+this case).
+
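+You can watch this happen by feeding text to the analyzer's split()
+method (output shown for illustration; the exact tokens depend on the
+stemmer's rules):
+
+~~~ perl
+my $easyanalyzer = Lucy::Analysis::EasyAnalyzer->new( language => 'en' );
+my $tokens = $easyanalyzer->split('The Senators');
+# $tokens is something like [ 'the', 'senat' ] -- lowercased and stemmed
+~~~
+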
+EasyAnalyzer is actually multiple Analyzers wrapped up in a single package.
+In this case, it's three-in-one, since specifying an EasyAnalyzer with
+`language => 'en'` is equivalent to this snippet:
+
+~~~ perl
+my $tokenizer    = Lucy::Analysis::StandardTokenizer->new;
+my $normalizer   = Lucy::Analysis::Normalizer->new;
+my $stemmer      = Lucy::Analysis::SnowballStemmer->new( language => 'en' );
+my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
+    analyzers => [ $tokenizer, $normalizer, $stemmer ],
+);
+~~~
+
+You can add or subtract Analyzers from there if you like.  Try adding a fourth
+Analyzer, a SnowballStopFilter for suppressing "stopwords" like "the", "if",
+and "maybe".
+
+~~~ perl
+my $stopfilter = Lucy::Analysis::SnowballStopFilter->new( 
+    language => 'en',
+);
+my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
+    analyzers => [ $tokenizer, $normalizer, $stopfilter, $stemmer ],
+);
+~~~
+
+Also, try removing the SnowballStemmer.
+
+~~~ perl
+my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
+    analyzers => [ $tokenizer, $normalizer ],
+);
+~~~
+
+The original choice of a stock English EasyAnalyzer probably still yields the
+best results for this document collection, but you get the idea: sometimes you
+want a different Analyzer.
+
+## When the best Analyzer is no Analyzer
+
+Sometimes you don't want an Analyzer at all.  That was true for our "url"
+field because we didn't need it to be searchable, but it's also true for
+certain types of searchable fields.  For instance, "category" fields are often
+set up to match exactly or not at all, as are fields like "last_name" (because
+you may not want to conflate results for "Humphrey" and "Humphries").
+
+To specify that there should be no analysis performed at all, use StringType:
+
+~~~ perl
+my $type = Lucy::Plan::StringType->new;
+$schema->spec_field( name => 'category', type => $type );
+~~~
+
+## Highlighting up next
+
+In our next tutorial chapter, [](cfish:HighlighterTutorial),
+we'll add highlighted excerpts from the "content" field to our search results.
+
+

http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/core/Lucy/Docs/Tutorial/BeyondSimpleTutorial.md
----------------------------------------------------------------------
diff --git a/core/Lucy/Docs/Tutorial/BeyondSimpleTutorial.md b/core/Lucy/Docs/Tutorial/BeyondSimpleTutorial.md
new file mode 100644
index 0000000..00c8e71
--- /dev/null
+++ b/core/Lucy/Docs/Tutorial/BeyondSimpleTutorial.md
@@ -0,0 +1,125 @@
+# A more flexible app structure.
+
+## Goal
+
+In this tutorial chapter, we'll refactor the apps we built in
+[](cfish:SimpleTutorial) so that they look exactly the same from
+the end user's point of view, but offer the developer greater possibilities for
+expansion.  
+
+To achieve this, we'll ditch Lucy::Simple and replace it with the
+classes that it uses internally:
+
+* [](cfish:lucy.Schema) - Plan out your index.
+* [](cfish:lucy.FullTextType) - Field type for full text search.
+* [](cfish:lucy.EasyAnalyzer) - A one-size-fits-all parser/tokenizer.
+* [](cfish:lucy.Indexer) - Manipulate index content.
+* [](cfish:lucy.IndexSearcher) - Search an index.
+* [](cfish:lucy.Hits) - Iterate over hits returned by a Searcher.
+
+## Adaptations to indexer.pl
+
+After we load our modules...
+
+~~~ perl
+use Lucy::Plan::Schema;
+use Lucy::Plan::FullTextType;
+use Lucy::Analysis::EasyAnalyzer;
+use Lucy::Index::Indexer;
+~~~
+
+... the first item we're going to need is a [](cfish:lucy.Schema).
+
+The primary job of a Schema is to specify what fields are available and how
+they're defined.  We'll start off with three fields: title, content and url.
+
+~~~ perl
+# Create Schema.
+my $schema = Lucy::Plan::Schema->new;
+my $easyanalyzer = Lucy::Analysis::EasyAnalyzer->new(
+    language => 'en',
+);
+my $type = Lucy::Plan::FullTextType->new(
+    analyzer => $easyanalyzer,
+);
+$schema->spec_field( name => 'title',   type => $type );
+$schema->spec_field( name => 'content', type => $type );
+$schema->spec_field( name => 'url',     type => $type );
+~~~
+
+All of the fields are spec'd out using the "FullTextType" FieldType,
+indicating that they will be searchable as "full text" -- which means that
+they can be searched for individual words.  The "analyzer", which is unique to
+FullTextType fields, is what breaks up the text into searchable tokens.
+
+Next, we'll swap our Lucy::Simple object out for a Lucy::Index::Indexer.
+The substitution will be straightforward because Simple has merely been
+serving as a thin wrapper around an inner Indexer, and we'll just be peeling
+away the wrapper.
+
+First, replace the constructor:
+
+~~~ perl
+# Create Indexer.
+my $indexer = Lucy::Index::Indexer->new(
+    index    => $path_to_index,
+    schema   => $schema,
+    create   => 1,
+    truncate => 1,
+);
+~~~
+
+Next, have the `$indexer` object `add_doc` where we were having the
+`$lucy` object `add_doc` before:
+
+~~~ perl
+foreach my $filename (@filenames) {
+    my $doc = parse_file($filename);
+    $indexer->add_doc($doc);
+}
+~~~
+
+There's only one extra step required: at the end of the app, you must call
+commit() explicitly to close the indexing session and commit your changes.
+(Lucy::Simple hides this detail, calling commit() implicitly when it needs to).
+
+~~~ perl
+$indexer->commit;
+~~~
+
+## Adaptations to search.cgi
+
+In our search app as in our indexing app, Lucy::Simple has served as a
+thin wrapper -- this time around [](cfish:lucy.IndexSearcher) and
+[](cfish:lucy.Hits).  Swapping out Simple for these two classes is
+also straightforward:
+
+~~~ perl
+use Lucy::Search::IndexSearcher;
+
+my $searcher = Lucy::Search::IndexSearcher->new( 
+    index => $path_to_index,
+);
+my $hits = $searcher->hits(    # returns a Hits object, not a hit count
+    query      => $q,
+    offset     => $offset,
+    num_wanted => $page_size,
+);
+my $hit_count = $hits->total_hits;  # get the hit count here
+
+...
+
+while ( my $hit = $hits->next ) {
+    ...
+}
+~~~
+
+## Hooray!
+
+Congratulations!  Your apps do the same thing as before... but now they'll be
+easier to customize.  
+
+In our next chapter, [](cfish:FieldTypeTutorial), we'll explore
+how to assign different behaviors to different fields.
+
+

http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/core/Lucy/Docs/Tutorial/FieldTypeTutorial.md
----------------------------------------------------------------------
diff --git a/core/Lucy/Docs/Tutorial/FieldTypeTutorial.md b/core/Lucy/Docs/Tutorial/FieldTypeTutorial.md
new file mode 100644
index 0000000..fe6885a
--- /dev/null
+++ b/core/Lucy/Docs/Tutorial/FieldTypeTutorial.md
@@ -0,0 +1,60 @@
+# Specify per-field properties and behaviors.
+
+The Schema we used in the last chapter specifies three fields: 
+
+~~~ perl
+my $type = Lucy::Plan::FullTextType->new(
+    analyzer => $polyanalyzer,
+);
+$schema->spec_field( name => 'title',   type => $type );
+$schema->spec_field( name => 'content', type => $type );
+$schema->spec_field( name => 'url',     type => $type );
+~~~
+
+Since they are all defined as "full text" fields, they are all searchable --
+including the `url` field, a dubious choice.  Some URLs contain meaningful
+information, but these don't, really:
+
+    http://example.com/us_constitution/amend1.txt
+
+We may as well not bother indexing the URL content.  To achieve that we need
+to assign the `url` field to a different FieldType.  
+
+## StringType
+
+Instead of FullTextType, we'll use a
+[](cfish:lucy.StringType), which doesn't use an
+Analyzer to break up text into individual tokens.  Furthermore, we'll mark
+this StringType as unindexed, so that its content won't be searchable at all.
+
+~~~ perl
+my $url_type = Lucy::Plan::StringType->new( indexed => 0 );
+$schema->spec_field( name => 'url', type => $url_type );
+~~~
+
+To observe the change in behavior, try searching for `us_constitution` both
+before and after changing the Schema and re-indexing.
+
+## Toggling 'stored'
+
+For a taste of other FieldType possibilities, try turning off `stored` for
+one or more fields.
+
+~~~ perl
+my $content_type = Lucy::Plan::FullTextType->new(
+    analyzer => $polyanalyzer,
+    stored   => 0,
+);
+~~~
+
+Turning off `stored` for either `title` or `url` mangles our results page,
+but since we're not displaying `content`, turning it off for `content` has
+no effect -- except on index size.
+
+## Analyzers up next
+
+Analyzers play a crucial role in the behavior of FullTextType fields.  In our
+next tutorial chapter, [](cfish:AnalysisTutorial), we'll see how
+changing up the Analyzer changes search results.
+
+

http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/core/Lucy/Docs/Tutorial/HighlighterTutorial.md
----------------------------------------------------------------------
diff --git a/core/Lucy/Docs/Tutorial/HighlighterTutorial.md b/core/Lucy/Docs/Tutorial/HighlighterTutorial.md
new file mode 100644
index 0000000..857ee01
--- /dev/null
+++ b/core/Lucy/Docs/Tutorial/HighlighterTutorial.md
@@ -0,0 +1,62 @@
+# Augment search results with highlighted excerpts.
+
+Adding relevant excerpts with highlighted search terms to your search results
+display makes it much easier for end users to scan the page and assess which
+hits look promising, dramatically improving their search experience.
+
+## Adaptations to indexer.pl
+
+[](cfish:lucy.Highlighter) uses information generated at index
+time.  To save resources, highlighting is disabled by default and must be
+turned on for individual fields.
+
+~~~ perl
+my $highlightable = Lucy::Plan::FullTextType->new(
+    analyzer      => $polyanalyzer,
+    highlightable => 1,
+);
+$schema->spec_field( name => 'content', type => $highlightable );
+~~~
+
+## Adaptations to search.cgi
+
+To add highlighting and excerpting to the search.cgi sample app, create a
+`$highlighter` object outside the hits iterating loop...
+
+~~~ perl
+my $highlighter = Lucy::Highlight::Highlighter->new(
+    searcher => $searcher,
+    query    => $q,
+    field    => 'content'
+);
+~~~
+
+... then modify the loop and the per-hit display to generate and include the
+excerpt.
+
+~~~ perl
+# Create result list.
+my $report = '';
+while ( my $hit = $hits->next ) {
+    my $score   = sprintf( "%0.3f", $hit->get_score );
+    my $excerpt = $highlighter->create_excerpt($hit);
+    $report .= qq|
+        <p>
+          <a href="$hit->{url}"><strong>$hit->{title}</strong></a>
+          <em>$score</em>
+          <br />
+          $excerpt
+          <br />
+          <span class="excerptURL">$hit->{url}</span>
+        </p>
+    |;
+}
+~~~
+
+## Next chapter: Query objects
+
+Our next tutorial chapter, [](cfish:QueryObjectsTutorial),
+illustrates how to build an "advanced search" interface using
+[](cfish:lucy.Query) objects instead of query strings.
+
+

http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/core/Lucy/Docs/Tutorial/QueryObjectsTutorial.md
----------------------------------------------------------------------
diff --git a/core/Lucy/Docs/Tutorial/QueryObjectsTutorial.md b/core/Lucy/Docs/Tutorial/QueryObjectsTutorial.md
new file mode 100644
index 0000000..53d4cea
--- /dev/null
+++ b/core/Lucy/Docs/Tutorial/QueryObjectsTutorial.md
@@ -0,0 +1,185 @@
+# Use Query objects instead of query strings.
+
+Until now, our search app has had only a single search box.  In this tutorial
+chapter, we'll move towards an "advanced search" interface, by adding a
+"category" drop-down menu.  Three new classes will be required:
+
+* [](cfish:lucy.QueryParser) - Turn a query string into a
+  [](cfish:lucy.Query) object.
+
+* [](cfish:lucy.TermQuery) - Query for a specific term within
+  a specific field.
+
+* [](cfish:lucy.ANDQuery) - "AND" together multiple Query
+  objects to produce an intersected result set.
+
+## Adaptations to indexer.pl
+
+Our new "category" field will be a StringType field rather than a FullTextType
+field, because we will only be looking for exact matches.  It needs to be
+indexed, but since we won't display its value, it doesn't need to be stored.
+
+~~~ perl
+my $cat_type = Lucy::Plan::StringType->new( stored => 0 );
+$schema->spec_field( name => 'category', type => $cat_type );
+~~~
+
+There will be three possible values: "article", "amendment", and "preamble",
+which we'll hack out of the source file's name during our `parse_file`
+subroutine:
+
+~~~ perl
+my $category
+    = $filename =~ /art/      ? 'article'
+    : $filename =~ /amend/    ? 'amendment'
+    : $filename =~ /preamble/ ? 'preamble'
+    :                           die "Can't derive category for $filename";
+return {
+    title    => $title,
+    content  => $bodytext,
+    url      => "/us_constitution/$filename",
+    category => $category,
+};
+~~~
+
+## Adaptations to search.cgi
+
+The "category" constraint will be added to our search interface using an HTML
+"select" element (this routine will need to be integrated into the HTML
+generation section of search.cgi):
+
+~~~ perl
+# Build up the HTML "select" object for the "category" field.
+sub generate_category_select {
+    my $cat = shift;
+    my $select = qq|
+      <select name="category">
+        <option value="">All Sections</option>
+        <option value="article">Articles</option>
+        <option value="amendment">Amendments</option>
+      </select>|;
+    if ($cat) {
+        $select =~ s/"$cat"/"$cat" selected/;
+    }
+    return $select;
+}
+~~~
+
+We'll start off by loading our new modules and extracting our new CGI
+parameter.
+
+~~~ perl
+use Lucy::Search::QueryParser;
+use Lucy::Search::TermQuery;
+use Lucy::Search::ANDQuery;
+
+... 
+
+my $category = decode( "UTF-8", $cgi->param('category') || '' );
+~~~
+
+QueryParser's constructor requires a "schema" argument.  We can get that from
+our IndexSearcher:
+
+~~~ perl
+# Create an IndexSearcher and a QueryParser.
+my $searcher = Lucy::Search::IndexSearcher->new( 
+    index => $path_to_index, 
+);
+my $qparser  = Lucy::Search::QueryParser->new( 
+    schema => $searcher->get_schema,
+);
+~~~
+
+Previously, we have been handing raw query strings to IndexSearcher.  Behind
+the scenes, IndexSearcher has been using a QueryParser to turn those query
+strings into Query objects.  Now, we will bring QueryParser into the
+foreground and parse the strings explicitly.
+
+~~~ perl
+my $query = $qparser->parse($q);
+~~~
+
+If the user has specified a category, we'll use an ANDQuery to join our parsed
+query together with a TermQuery representing the category.
+
+~~~ perl
+if ($category) {
+    my $category_query = Lucy::Search::TermQuery->new(
+        field => 'category', 
+        term  => $category,
+    );
+    $query = Lucy::Search::ANDQuery->new(
+        children => [ $query, $category_query ]
+    );
+}
+~~~
+
+Now when we execute the query...
+
+~~~ perl
+# Execute the Query and get a Hits object.
+my $hits = $searcher->hits(
+    query      => $query,
+    offset     => $offset,
+    num_wanted => $page_size,
+);
+~~~
+
+... we'll get a result set which is the intersection of the parsed query and
+the category query.
+
+## Using TermQuery with full text fields
+
+When querying full text fields, the easiest way is to create query objects
+using QueryParser. But sometimes you want to create TermQuery for a single
+term in a FullTextType field directly. In this case, we have to run the
+search term through the field's analyzer to make sure it gets normalized in
+the same way as the field's content.
+
+~~~ perl
+sub make_term_query {
+    my ($field, $term) = @_;
+
+    my $token;
+    my $type = $schema->fetch_type($field);
+
+    if ( $type->isa('Lucy::Plan::FullTextType') ) {
+        # Run the term through the full text analysis chain.
+        my $analyzer = $type->get_analyzer;
+        my $tokens   = $analyzer->split($term);
+
+        if ( @$tokens != 1 ) {
+            # If the term expands to more than one token, or no
+            # tokens at all, it will never match a token in the
+            # full text field.
+            return Lucy::Search::NoMatchQuery->new;
+        }
+
+        $token = $tokens->[0];
+    }
+    else {
+        # Exact match for other types.
+        $token = $term;
+    }
+
+    return Lucy::Search::TermQuery->new(
+        field => $field,
+        term  => $token,
+    );
+}
+~~~
+
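+For example, a hypothetical caller -- assuming `$schema` holds the value
+of `$searcher->get_schema`:
+
+~~~ perl
+my $schema     = $searcher->get_schema;
+my $term_query = make_term_query( 'content', 'Senators' );
+my $hits       = $searcher->hits( query => $term_query );
+~~~
+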
+## Congratulations!
+
+You've made it to the end of the tutorial.
+
+## See Also
+
+For additional thematic documentation, see the Apache Lucy
+[](cfish:Cookbook).
+
+ANDQuery has a companion class, [](cfish:lucy.ORQuery), and a
+close relative, [](cfish:lucy.RequiredOptionalQuery).
+
+

http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/core/Lucy/Docs/Tutorial/SimpleTutorial.md
----------------------------------------------------------------------
diff --git a/core/Lucy/Docs/Tutorial/SimpleTutorial.md b/core/Lucy/Docs/Tutorial/SimpleTutorial.md
new file mode 100644
index 0000000..83883e7
--- /dev/null
+++ b/core/Lucy/Docs/Tutorial/SimpleTutorial.md
@@ -0,0 +1,298 @@
+# Bare-bones search app.
+
+## Setup
+
+Copy the text presentation of the US Constitution from the `sample` directory
+of the Apache Lucy distribution to the base level of your web server's
+`htdocs` directory.
+
+    $ cp -R sample/us_constitution /usr/local/apache2/htdocs/
+
+## Indexing: indexer.pl
+
+Our first task will be to create an application called `indexer.pl` which
+builds a searchable "inverted index" from a collection of documents.  
+
+After we specify some configuration variables and load all necessary
+modules...
+
+~~~ perl
+#!/usr/local/bin/perl
+use strict;
+use warnings;
+
+# (Change configuration variables as needed.)
+my $path_to_index = '/path/to/index';
+my $uscon_source  = '/usr/local/apache2/htdocs/us_constitution';
+
+use Lucy::Simple;
+use File::Spec::Functions qw( catfile );
+~~~
+
+... we'll start by creating a Lucy::Simple object, telling it where we'd
+like the index to be located and the language of the source material.
+
+~~~ perl
+my $lucy = Lucy::Simple->new(
+    path     => $path_to_index,
+    language => 'en',
+);
+~~~
+
+Next, we'll add a subroutine which parses our sample documents.
+
+~~~ perl
+# Parse a file from our US Constitution collection and return a hashref with
+# the fields title, body, and url.
+sub parse_file {
+    my $filename = shift;
+    my $filepath = catfile( $uscon_source, $filename );
+    open( my $fh, '<', $filepath ) or die "Can't open '$filepath': $!";
+    my $text = do { local $/; <$fh> };    # slurp file content
+    $text =~ /\A(.+?)^\s+(.*)/ms
+        or die "Can't extract title/bodytext from '$filepath'";
+    my $title    = $1;
+    my $bodytext = $2;
+    return {
+        title    => $title,
+        content  => $bodytext,
+        url      => "/us_constitution/$filename",
+    };
+}
+~~~
+
+Add some elementary directory reading code...
+
+~~~ perl
+# Collect names of source files.
+opendir( my $dh, $uscon_source )
+    or die "Couldn't opendir '$uscon_source': $!";
+my @filenames = grep { $_ =~ /\.txt/ } readdir $dh;
+~~~
+
+... and now we're ready for the meat of indexer.pl -- which occupies exactly
+one line of code.
+
+~~~ perl
+foreach my $filename (@filenames) {
+    my $doc = parse_file($filename);
+    $lucy->add_doc($doc);  # ta-da!
+}
+~~~
+
+## Search: search.cgi
+
+As with our indexing app, the bulk of the code in our search script won't be
+Lucy-specific.  
+
+The beginning is dedicated to CGI processing and configuration.
+
+~~~ perl
+#!/usr/local/bin/perl -T
+use strict;
+use warnings;
+
+# (Change configuration variables as needed.)
+my $path_to_index = '/path/to/index';
+
+use CGI;
+use List::Util qw( max min );
+use POSIX qw( ceil );
+use Encode qw( decode );
+use Lucy::Simple;
+
+my $cgi       = CGI->new;
+my $q         = decode( "UTF-8", $cgi->param('q') || '' );
+my $offset    = decode( "UTF-8", $cgi->param('offset') || 0 );
+my $page_size = 10;
+~~~
+
+Once that's out of the way, we create our Lucy::Simple object and feed
+it a query string.
+
+~~~ perl
+my $lucy = Lucy::Simple->new(
+    path     => $path_to_index,
+    language => 'en',
+);
+my $hit_count = $lucy->search(
+    query      => $q,
+    offset     => $offset,
+    num_wanted => $page_size,
+);
+~~~
+
+The value returned by search() is the total number of documents in the
+collection which matched the query.  We'll show this hit count to the user,
+and also use it in conjunction with the parameters `offset` and `num_wanted`
+to break up results into "pages" of manageable size.
+
+Calling search() on our Simple object turns it into an iterator. Invoking
+next() now returns hits one at a time as [](cfish:lucy.HitDoc)
+objects, starting with the most relevant.
+
+~~~ perl
+# Create result list.
+my $report = '';
+while ( my $hit = $lucy->next ) {
+    my $score = sprintf( "%0.3f", $hit->get_score );
+    $report .= qq|
+        <p>
+          <a href="$hit->{url}"><strong>$hit->{title}</strong></a>
+          <em>$score</em>
+          <br>
+          <span class="excerptURL">$hit->{url}</span>
+        </p>
+        |;
+}
+~~~
+
+The rest of the script is just text wrangling. 
+
+~~~ perl
+#---------------------------------------------------------------#
+# No tutorial material below this point - just html generation. #
+#---------------------------------------------------------------#
+
+# Generate paging links and hit count, print and exit.
+my $paging_links = generate_paging_info( $q, $hit_count );
+blast_out_content( $q, $report, $paging_links );
+
+# Create html fragment with links for paging through results n-at-a-time.
+sub generate_paging_info {
+    my ( $query_string, $total_hits ) = @_;
+    my $escaped_q = CGI::escapeHTML($query_string);
+    my $paging_info;
+    if ( !length $query_string ) {
+        # No query?  No display.
+        $paging_info = '';
+    }
+    elsif ( $total_hits == 0 ) {
+        # Alert the user that their search failed.
+        $paging_info
+            = qq|<p>No matches for <strong>$escaped_q</strong></p>|;
+    }
+    else {
+        # Calculate the nums for the first and last hit to display.
+        my $last_result = min( ( $offset + $page_size ), $total_hits );
+        my $first_result = min( ( $offset + 1 ), $last_result );
+
+        # Display the result nums, start paging info.
+        $paging_info = qq|
+            <p>
+                Results <strong>$first_result-$last_result</strong> 
+                of <strong>$total_hits</strong> 
+                for <strong>$escaped_q</strong>.
+            </p>
+            <p>
+                Results Page:
+            |;
+
+        # Calculate first and last hits pages to display / link to.
+        my $current_page = int( $first_result / $page_size ) + 1;
+        my $last_page    = ceil( $total_hits / $page_size );
+        my $first_page   = max( 1, ( $current_page - 9 ) );
+        $last_page = min( $last_page, ( $current_page + 10 ) );
+
+        # Create a url for use in paging links.
+        my $href = $cgi->url( -relative => 1 );
+        $href .= "?q=" . CGI::escape($query_string);
+        $href .= ";offset=" . CGI::escape($offset);
+
+        # Generate the "Prev" link.
+        if ( $current_page > 1 ) {
+            my $new_offset = ( $current_page - 2 ) * $page_size;
+            $href =~ s/(?<=offset=)\d+/$new_offset/;
+            $paging_info .= qq|<a href="$href">&lt;= Prev</a>\n|;
+        }
+
+        # Generate paging links.
+        for my $page_num ( $first_page .. $last_page ) {
+            if ( $page_num == $current_page ) {
+                $paging_info .= qq|$page_num \n|;
+            }
+            else {
+                my $new_offset = ( $page_num - 1 ) * $page_size;
+                $href =~ s/(?<=offset=)\d+/$new_offset/;
+                $paging_info .= qq|<a href="$href">$page_num</a>\n|;
+            }
+        }
+
+        # Generate the "Next" link.
+        if ( $current_page != $last_page ) {
+            my $new_offset = $current_page * $page_size;
+            $href =~ s/(?<=offset=)\d+/$new_offset/;
+            $paging_info .= qq|<a href="$href">Next =&gt;</a>\n|;
+        }
+
+        # Close tag.
+        $paging_info .= "</p>\n";
+    }
+
+    return $paging_info;
+}
+
+# Print content to output.
+sub blast_out_content {
+    my ( $query_string, $hit_list, $paging_info ) = @_;
+    my $escaped_q = CGI::escapeHTML($query_string);
+    binmode( STDOUT, ":encoding(UTF-8)" );
+    print qq|Content-type: text/html; charset=UTF-8\n\n|;
+    print qq|
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
+    "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+  <meta http-equiv="Content-type" 
+    content="text/html;charset=UTF-8">
+  <link rel="stylesheet" type="text/css" 
+    href="/us_constitution/uscon.css">
+  <title>Lucy: $escaped_q</title>
+</head>
+
+<body>
+
+  <div id="navigation">
+    <form id="usconSearch" action="">
+      <strong>
+        Search the 
+        <a href="/us_constitution/index.html">US Constitution</a>:
+      </strong>
+      <input type="text" name="q" id="q" value="$escaped_q">
+      <input type="submit" value="=&gt;">
+    </form>
+  </div><!--navigation-->
+
+  <div id="bodytext">
+
+  $hit_list
+
+  $paging_info
+
+    <p style="font-size: smaller; color: #666">
+      <em>
+        Powered by <a href="http://lucy.apache.org/"
+        >Apache Lucy<small><sup>TM</sup></small></a>
+      </em>
+    </p>
+  </div><!--bodytext-->
+
+</body>
+
+</html>
+|;
+}
+~~~
+
+## OK... now what?
+
+Lucy::Simple is perfectly adequate for some tasks, but it's not very flexible.
+Many people find that it doesn't do at least one or two things they can't live
+without.
+
+In our next tutorial chapter,
+[](cfish:BeyondSimpleTutorial), we'll rewrite our
+indexing and search scripts using the classes that Lucy::Simple hides
+from view, opening up the possibilities for expansion; then, we'll spend the
+rest of the tutorial chapters exploring these possibilities.
+

http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/perl/lib/Lucy/Docs/Cookbook.pod
----------------------------------------------------------------------
diff --git a/perl/lib/Lucy/Docs/Cookbook.pod b/perl/lib/Lucy/Docs/Cookbook.pod
deleted file mode 100644
index 6726db9..0000000
--- a/perl/lib/Lucy/Docs/Cookbook.pod
+++ /dev/null
@@ -1,61 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-=head1 NAME
-
-Lucy::Docs::Cookbook - Apache Lucy recipes.
-
-=head1 DESCRIPTION
-
-The Cookbook provides thematic documentation covering some of Apache Lucy's
-more sophisticated features.  For a step-by-step introduction to Lucy,
-see L<Lucy::Docs::Tutorial>.
-
-=head2 Chapters
-
-=over
-
-=item *
-
-L<Lucy::Docs::Cookbook::FastUpdates> - While index updates are fast on
-average, worst-case update performance may be significantly slower. To make
-index updates consistently quick, we must manually intervene to control the
-process of index segment consolidation.
-
-=item *
-
-L<Lucy::Docs::Cookbook::CustomQuery> - Explore Lucy's support for
-custom query types by creating a "PrefixQuery" class to handle trailing
-wildcards.
-
-=item *
-
-L<Lucy::Docs::Cookbook::CustomQueryParser> - Define your own custom
-search query syntax using Lucy::Search::QueryParser and
-L<Parse::RecDescent>.
-
-=back
-
-=head2 Materials
-
-Some of the recipes in the Cookbook reference the completed
-L<Tutorial|Lucy::Docs::Tutorial> application.  These materials can be
-found in the C<sample> directory at the root of the Lucy distribution:
-
-    sample/indexer.pl        # indexing app
-    sample/search.cgi        # search app
-    sample/us_constitution   # corpus
-
-

http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/perl/lib/Lucy/Docs/Cookbook/CustomQuery.pod
----------------------------------------------------------------------
diff --git a/perl/lib/Lucy/Docs/Cookbook/CustomQuery.pod b/perl/lib/Lucy/Docs/Cookbook/CustomQuery.pod
deleted file mode 100644
index 2c78bf1..0000000
--- a/perl/lib/Lucy/Docs/Cookbook/CustomQuery.pod
+++ /dev/null
@@ -1,320 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-=head1 NAME
-
-Lucy::Docs::Cookbook::CustomQuery - Sample subclass of Query.
-
-=head1 ABSTRACT
-
-Explore Apache Lucy's support for custom query types by creating a
-"PrefixQuery" class to handle trailing wildcards.
-
-    my $prefix_query = PrefixQuery->new(
-        field        => 'content',
-        query_string => 'foo*',
-    );
-    my $hits = $searcher->hits( query => $prefix_query );
-    ...
-
-=head1 Query, Compiler, and Matcher 
-
-To add support for a new query type, we need three classes: a Query, a
-Compiler, and a Matcher.  
-
-=over
-
-=item *
-
-PrefixQuery - a subclass of L<Lucy::Search::Query>, and the only class
-that client code will deal with directly.
-
-=item *
-
-PrefixCompiler - a subclass of L<Lucy::Search::Compiler>, whose primary 
-role is to compile a PrefixQuery to a PrefixMatcher.
-
-=item *
-
-PrefixMatcher - a subclass of L<Lucy::Search::Matcher>, which does the
-heavy lifting: it applies the query to individual documents and assigns a
-score to each match.
-
-=back
-
-The PrefixQuery class on its own isn't enough because a Query object's role is
-limited to expressing an abstract specification for the search.  A Query is
-basically nothing but metadata; execution is left to the Query's companion
-Compiler and Matcher.
-
-Here's a simplified sketch illustrating how a Searcher's hits() method ties
-together the three classes.
-
-    sub hits {
-        my ( $self, $query ) = @_;
-        my $compiler = $query->make_compiler(
-            searcher => $self,
-            boost    => $query->get_boost,
-        );
-        my $matcher = $compiler->make_matcher(
-            reader     => $self->get_reader,
-            need_score => 1,
-        );
-        my @hits = $matcher->capture_hits;
-        return \@hits;
-    }
-
-=head2 PrefixQuery
-
-Our PrefixQuery class will have two attributes: a query string and a field
-name.
-
-    package PrefixQuery;
-    use base qw( Lucy::Search::Query );
-    use Carp;
-    use Scalar::Util qw( blessed );
-    
-    # Inside-out member vars and hand-rolled accessors.
-    my %query_string;
-    my %field;
-    sub get_query_string { my $self = shift; return $query_string{$$self} }
-    sub get_field        { my $self = shift; return $field{$$self} }
-
-PrefixQuery's constructor collects and validates the attributes.
-
-    sub new {
-        my ( $class, %args ) = @_;
-        my $query_string = delete $args{query_string};
-        my $field        = delete $args{field};
-        my $self         = $class->SUPER::new(%args);
-        confess("'query_string' param is required")
-            unless defined $query_string;
-        confess("Invalid query_string: '$query_string'")
-            unless $query_string =~ /\*\s*$/;
-        confess("'field' param is required")
-            unless defined $field;
-        $query_string{$$self} = $query_string;
-        $field{$$self}        = $field;
-        return $self;
-    }
-
-Since this is an inside-out class, we'll need a destructor:
-
-    sub DESTROY {
-        my $self = shift;
-        delete $query_string{$$self};
-        delete $field{$$self};
-        $self->SUPER::DESTROY;
-    }
-
-The equals() method determines whether two Queries are logically equivalent:
-
-    sub equals {
-        my ( $self, $other ) = @_;
-        return 0 unless blessed($other);
-        return 0 unless $other->isa("PrefixQuery");
-        return 0 unless $field{$$self} eq $field{$$other};
-        return 0 unless $query_string{$$self} eq $query_string{$$other};
-        return 1;
-    }
-
-The last thing we'll need is a make_compiler() factory method which kicks out
-a subclass of L<Compiler|Lucy::Search::Compiler>.
-
-    sub make_compiler {
-        my ( $self, %args ) = @_;
-        my $subordinate = delete $args{subordinate};
-        my $compiler = PrefixCompiler->new( %args, parent => $self );
-        $compiler->normalize unless $subordinate;
-        return $compiler;
-    }
-
-=head2 PrefixCompiler
-
-PrefixQuery's make_compiler() method will be called internally at search-time
-by objects which subclass L<Lucy::Search::Searcher> -- such as
-L<IndexSearchers|Lucy::Search::IndexSearcher>.
-
-A Searcher is associated with a particular collection of documents.   These
-documents may all reside in one index, as with IndexSearcher, or they may be
-spread out across multiple indexes on one or more machines, as with
-L<LucyX::Remote::ClusterSearcher>.  
-
-Searcher objects have access to certain statistical information about the
-collections they represent; for instance, a Searcher can tell you how many
-documents are in the collection...
-
-    my $maximum_number_of_docs_in_collection = $searcher->doc_max;
-
-... or how many documents a specific term appears in:
-
-    my $term_appears_in_this_many_docs = $searcher->doc_freq(
-        field => 'content',
-        term  => 'foo',
-    );
-
-Such information can be used by sophisticated Compiler implementations to
-assign more or less heft to individual queries or sub-queries.  However, we're
-not going to bother with weighting for this demo; we'll just assign a fixed
-score of 1.0 to each matching document.
-
-We don't need to write a constructor, as it will suffice to inherit new() from
-Lucy::Search::Compiler.  The only method we need to implement for
-PrefixCompiler is make_matcher().
-
-    package PrefixCompiler;
-    use base qw( Lucy::Search::Compiler );
-
-    sub make_matcher {
-        my ( $self, %args ) = @_;
-        my $seg_reader = $args{reader};
-
-        # Retrieve low-level components LexiconReader and PostingListReader.
-        my $lex_reader
-            = $seg_reader->obtain("Lucy::Index::LexiconReader");
-        my $plist_reader
-            = $seg_reader->obtain("Lucy::Index::PostingListReader");
-        
-        # Acquire a Lexicon and seek it to our query string.
-        my $substring = $self->get_parent->get_query_string;
-        $substring =~ s/\*\s*$//;
-        my $field = $self->get_parent->get_field;
-        my $lexicon = $lex_reader->lexicon( field => $field );
-        return unless $lexicon;
-        $lexicon->seek($substring);
-        
-        # Accumulate PostingLists for each matching term.
-        my @posting_lists;
-        while ( defined( my $term = $lexicon->get_term ) ) {
-            last unless $term =~ /^\Q$substring/;
-            my $posting_list = $plist_reader->posting_list(
-                field => $field,
-                term  => $term,
-            );
-            if ($posting_list) {
-                push @posting_lists, $posting_list;
-            }
-            last unless $lexicon->next;
-        }
-        return unless @posting_lists;
-        
-        return PrefixMatcher->new( posting_lists => \@posting_lists );
-    }
-
-PrefixCompiler gets access to a L<SegReader|Lucy::Index::SegReader>
-object when make_matcher() gets called.  From the SegReader and its
-sub-components L<LexiconReader|Lucy::Index::LexiconReader> and
-L<PostingListReader|Lucy::Index::PostingListReader>, we acquire a
-L<Lexicon|Lucy::Index::Lexicon>, scan through the Lexicon's unique
-terms, and acquire a L<PostingList|Lucy::Index::PostingList> for each
-term that matches our prefix.
-
-Each of these PostingList objects represents a set of documents which match
-the query.
-
-=head2 PrefixMatcher
-
-The Matcher subclass is the most involved.  
-
-    package PrefixMatcher;
-    use base qw( Lucy::Search::Matcher );
-    
-    # Inside-out member vars.
-    my %doc_ids;
-    my %tick;
-    
-    sub new {
-        my ( $class, %args ) = @_;
-        my $posting_lists = delete $args{posting_lists};
-        my $self          = $class->SUPER::new(%args);
-        
-        # Cheesy but simple way of interleaving PostingList doc sets.
-        my %all_doc_ids;
-        for my $posting_list (@$posting_lists) {
-            while ( my $doc_id = $posting_list->next ) {
-                $all_doc_ids{$doc_id} = undef;
-            }
-        }
-        my @doc_ids = sort { $a <=> $b } keys %all_doc_ids;
-        $doc_ids{$$self} = \@doc_ids;
-        
-        # Track our position within the array of doc ids.
-        $tick{$$self} = -1;
-        
-        return $self;
-    }
-    
-    sub DESTROY {
-        my $self = shift;
-        delete $doc_ids{$$self};
-        delete $tick{$$self};
-        $self->SUPER::DESTROY;
-    }
-
-The doc ids must be in order, or some will be ignored; hence the C<sort>
-above.
-
-In addition to the constructor and destructor, there are three methods that
-must be overridden.
-
-next() advances the Matcher to the next valid matching doc.  
-
-    sub next {
-        my $self    = shift;
-        my $doc_ids = $doc_ids{$$self};
-        my $tick    = ++$tick{$$self};
-        return 0 if $tick >= scalar @$doc_ids;
-        return $doc_ids->[$tick];
-    }
-
-get_doc_id() returns the current document id, or 0 if the Matcher is
-exhausted.  (L<Document numbers|Lucy::Docs::DocIDs> start at 1, so 0 is
-a sentinel.)
-
-    sub get_doc_id {
-        my $self    = shift;
-        my $tick    = $tick{$$self};
-        my $doc_ids = $doc_ids{$$self};
-        return $tick < scalar @$doc_ids ? $doc_ids->[$tick] : 0;
-    }
-
-score() conveys the relevance score of the current match.  We'll just return a
-fixed score of 1.0:
-
-    sub score { 1.0 }
-
-=head1 Usage 
-
-To get a basic feel for PrefixQuery, insert the FlatQueryParser module
-described in L<Lucy::Docs::Cookbook::CustomQueryParser> (which supports
-PrefixQuery) into the search.cgi sample app.
-
-    my $parser = FlatQueryParser->new( schema => $searcher->get_schema );
-    my $query  = $parser->parse($q);
-
-If you're planning on using PrefixQuery in earnest, though, you may want to
-change up analyzers to avoid stemming, because stemming -- another approach to
-prefix conflation -- is not perfectly compatible with prefix searches.
-
-    # Polyanalyzer with no SnowballStemmer.
-    my $analyzer = Lucy::Analysis::PolyAnalyzer->new(
-        analyzers => [
-            Lucy::Analysis::StandardTokenizer->new,
-            Lucy::Analysis::Normalizer->new,
-        ],
-    );
-
-=cut
-


[5/5] lucy git commit: Merge branch 'standalone_docs'

Posted by nw...@apache.org.
Merge branch 'standalone_docs'


Project: http://git-wip-us.apache.org/repos/asf/lucy/repo
Commit: http://git-wip-us.apache.org/repos/asf/lucy/commit/a57374a2
Tree: http://git-wip-us.apache.org/repos/asf/lucy/tree/a57374a2
Diff: http://git-wip-us.apache.org/repos/asf/lucy/diff/a57374a2

Branch: refs/heads/master
Commit: a57374a2bd0e3fd15c8a8858b30ea57b02a3645e
Parents: a1d2e1c 6966293
Author: Nick Wellnhofer <we...@aevum.de>
Authored: Sat Jul 11 15:03:22 2015 +0200
Committer: Nick Wellnhofer <we...@aevum.de>
Committed: Sat Jul 11 15:03:22 2015 +0200

----------------------------------------------------------------------
 core/Lucy/Docs/Cookbook.md                      |  33 ++
 core/Lucy/Docs/Cookbook/CustomQuery.md          | 321 +++++++++++++++++++
 core/Lucy/Docs/Cookbook/CustomQueryParser.md    | 231 +++++++++++++
 core/Lucy/Docs/Cookbook/FastUpdates.md          | 140 ++++++++
 core/Lucy/Docs/DevGuide.cfh                     |  59 ----
 core/Lucy/Docs/DevGuide.md                      |  37 +++
 core/Lucy/Docs/DocIDs.md                        |  28 ++
 core/Lucy/Docs/FileFormat.md                    | 191 +++++++++++
 core/Lucy/Docs/FileLocking.cfh                  |  83 -----
 core/Lucy/Docs/FileLocking.md                   |  80 +++++
 core/Lucy/Docs/IRTheory.md                      |  44 +++
 core/Lucy/Docs/Tutorial.md                      |  53 +++
 core/Lucy/Docs/Tutorial/AnalysisTutorial.md     |  85 +++++
 core/Lucy/Docs/Tutorial/BeyondSimpleTutorial.md | 125 ++++++++
 core/Lucy/Docs/Tutorial/FieldTypeTutorial.md    |  60 ++++
 core/Lucy/Docs/Tutorial/HighlighterTutorial.md  |  62 ++++
 core/Lucy/Docs/Tutorial/QueryObjectsTutorial.md | 185 +++++++++++
 core/Lucy/Docs/Tutorial/SimpleTutorial.md       | 298 +++++++++++++++++
 perl/buildlib/Lucy/Build/Binding/Docs.pm        |  69 ----
 perl/lib/Lucy/Docs/Cookbook.pod                 |  61 ----
 perl/lib/Lucy/Docs/Cookbook/CustomQuery.pod     | 320 ------------------
 .../Lucy/Docs/Cookbook/CustomQueryParser.pod    | 236 --------------
 perl/lib/Lucy/Docs/Cookbook/FastUpdates.pod     | 153 ---------
 perl/lib/Lucy/Docs/DevGuide.pm                  |  24 --
 perl/lib/Lucy/Docs/DocIDs.pod                   |  47 ---
 perl/lib/Lucy/Docs/FileFormat.pod               | 239 --------------
 perl/lib/Lucy/Docs/FileLocking.pm               |  24 --
 perl/lib/Lucy/Docs/IRTheory.pod                 |  94 ------
 perl/lib/Lucy/Docs/Tutorial.pod                 |  89 -----
 perl/lib/Lucy/Docs/Tutorial/Analysis.pod        |  94 ------
 perl/lib/Lucy/Docs/Tutorial/BeyondSimple.pod    | 153 ---------
 perl/lib/Lucy/Docs/Tutorial/FieldType.pod       |  74 -----
 perl/lib/Lucy/Docs/Tutorial/Highlighter.pod     |  76 -----
 perl/lib/Lucy/Docs/Tutorial/QueryObjects.pod    | 198 ------------
 perl/lib/Lucy/Docs/Tutorial/Simple.pod          | 298 -----------------
 35 files changed, 1973 insertions(+), 2391 deletions(-)
----------------------------------------------------------------------



[2/5] lucy git commit: Convert POD to Markdown

Posted by nw...@apache.org.
http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/perl/lib/Lucy/Docs/Cookbook/CustomQueryParser.pod
----------------------------------------------------------------------
diff --git a/perl/lib/Lucy/Docs/Cookbook/CustomQueryParser.pod b/perl/lib/Lucy/Docs/Cookbook/CustomQueryParser.pod
deleted file mode 100644
index 250d536..0000000
--- a/perl/lib/Lucy/Docs/Cookbook/CustomQueryParser.pod
+++ /dev/null
@@ -1,236 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-=head1 NAME
-
-Lucy::Docs::Cookbook::CustomQueryParser - Sample subclass of QueryParser.
-
-=head1 ABSTRACT
-
-Implement a custom search query language using a subclass of
-L<Lucy::Search::QueryParser>.
-
-=head1 The language
-
-At first, our query language will support only simple term queries and phrases
-delimited by double quotes.  For simplicity's sake, it will not support
-parenthetical groupings, boolean operators, or prepended plus/minus.  The
-results for all subqueries will be unioned together -- i.e. joined using an OR
--- which is usually the best approach for small-to-medium-sized document
-collections.
-
-Later, we'll add support for trailing wildcards.
-
-=head1 Single-field parser
-
-Our initial parser implementation will generate queries against a single fixed
-field, "content", and it will analyze text using a fixed choice of English
-EasyAnalyzer.  We won't subclass Lucy::Search::QueryParser just yet.
-
-    package FlatQueryParser;
-    use Lucy::Search::TermQuery;
-    use Lucy::Search::PhraseQuery;
-    use Lucy::Search::ORQuery;
-    use Carp;
-    
-    sub new { 
-        my $analyzer = Lucy::Analysis::EasyAnalyzer->new(
-            language => 'en',
-        );
-        return bless { 
-            field    => 'content',
-            analyzer => $analyzer,
-        }, __PACKAGE__;
-    }
-
-Some private helper subs for creating TermQuery and PhraseQuery objects will
-help keep the size of our main parse() subroutine down:
-
-    sub _make_term_query {
-        my ( $self, $term ) = @_;
-        return Lucy::Search::TermQuery->new(
-            field => $self->{field},
-            term  => $term,
-        );
-    }
-    
-    sub _make_phrase_query {
-        my ( $self, $terms ) = @_;
-        return Lucy::Search::PhraseQuery->new(
-            field => $self->{field},
-            terms => $terms,
-        );
-    }
-
-Our private _tokenize() method treats double-quote delimited material as a
-single token and splits on whitespace everywhere else.
-
-    sub _tokenize {
-        my ( $self, $query_string ) = @_;
-        my @tokens;
-        while ( length $query_string ) {
-            if ( $query_string =~ s/^\s+// ) {
-                next;    # skip whitespace
-            }
-            elsif ( $query_string =~ s/^("[^"]*(?:"|$))// ) {
-                push @tokens, $1;    # double-quoted phrase
-            }
-            else {
-                $query_string =~ s/(\S+)//;
-                push @tokens, $1;    # single word
-            }
-        }
-        return \@tokens;
-    }
-
-The main parsing routine creates an array of tokens by calling _tokenize(),
-runs the tokens through the EasyAnalyzer, creates TermQuery or
-PhraseQuery objects according to how many tokens emerge from the
-EasyAnalyzer's split() method, and adds each of the sub-queries to the primary
-ORQuery.
-
-    sub parse {
-        my ( $self, $query_string ) = @_;
-        my $tokens   = $self->_tokenize($query_string);
-        my $analyzer = $self->{analyzer};
-        my $or_query = Lucy::Search::ORQuery->new;
-    
-        for my $token (@$tokens) {
-            if ( $token =~ s/^"// ) {
-                $token =~ s/"$//;
-                my $terms = $analyzer->split($token);
-                my $query = $self->_make_phrase_query($terms);
-                $or_query->add_child($query);
-            }
-            else {
-                my $terms = $analyzer->split($token);
-                if ( @$terms == 1 ) {
-                    my $query = $self->_make_term_query( $terms->[0] );
-                    $or_query->add_child($query);
-                }
-                elsif ( @$terms > 1 ) {
-                    my $query = $self->_make_phrase_query($terms);
-                    $or_query->add_child($query);
-                }
-            }
-        }
-    
-        return $or_query;
-    }
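-
-Here's a quick sketch of the single-field parser in use (the query string
-is just an illustration):
-
-    my $parser = FlatQueryParser->new;
-    my $query  = $parser->parse(q("free speech" freedom));
-    # $query is now an ORQuery over a PhraseQuery and a TermQuery.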
-
-=head1 Multi-field parser
-
-Most often, the end user will want their search query to match not only a
-single 'content' field, but also 'title' and so on.  To make that happen, we
-have to turn queries such as this...
-
-    foo AND NOT bar
-
-... into the logical equivalent of this:
-
-    (title:foo OR content:foo) AND NOT (title:bar OR content:bar)
-
-Rather than continue with our own from-scratch parser class and write the
-routines to accomplish that expansion, we're now going to subclass
-Lucy::Search::QueryParser and take advantage of some of its existing
-methods.
-
-Our first parser implementation had the "content" field name and the choice of
-English EasyAnalyzer hard-coded for simplicity, but we don't need to do that
-once we subclass Lucy::Search::QueryParser.  QueryParser's constructor --
-which we will inherit, allowing us to eliminate our own constructor --
-requires a Schema which conveys field and Analyzer information, so we can
-just defer to that.
-
-    package FlatQueryParser;
-    use base qw( Lucy::Search::QueryParser );
-    use Lucy::Search::TermQuery;
-    use Lucy::Search::PhraseQuery;
-    use Lucy::Search::ORQuery;
-    use PrefixQuery;
-    use Carp;
-    
-    # Inherit new()
-
-We're also going to jettison our _make_term_query() and _make_phrase_query()
-helper subs and chop our parse() subroutine way down.  Our revised parse()
-routine will generate Lucy::Search::LeafQuery objects instead of TermQueries
-and PhraseQueries:
-
-    sub parse {
-        my ( $self, $query_string ) = @_;
-        my $tokens = $self->_tokenize($query_string);
-        my $or_query = Lucy::Search::ORQuery->new;
-        for my $token (@$tokens) {
-            my $leaf_query = Lucy::Search::LeafQuery->new( text => $token );
-            $or_query->add_child($leaf_query);
-        }
-        return $self->expand($or_query);
-    }
-
-The magic happens in QueryParser's expand() method, which walks the ORQuery
-object we supply to it looking for LeafQuery objects, and calls expand_leaf()
-for each one it finds.  expand_leaf() performs field-specific analysis,
-decides whether each query should be a TermQuery or a PhraseQuery, and if
-multiple fields are required, creates an ORQuery which multiplies out e.g.
-C<foo> into C<(title:foo OR content:foo)>.
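-
-Sketched schematically (illustrative, not literal object dumps), expansion
-under a two-field schema looks like this:
-
-    # Before expand():
-    #   ORQuery( LeafQuery("foo"), LeafQuery("bar") )
-    # After expand():
-    #   ORQuery( ORQuery( title:foo, content:foo ),
-    #            ORQuery( title:bar, content:bar ) )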
-
-=head1 Extending the query language
-
-To add support for trailing wildcards to our query language, we need to
-override expand_leaf() to accommodate PrefixQuery, while deferring to the
-parent class implementation on TermQuery and PhraseQuery.
-
-    sub expand_leaf {
-        my ( $self, $leaf_query ) = @_;
-        my $text = $leaf_query->get_text;
-        if ( $text =~ /\*$/ ) {
-            my $or_query = Lucy::Search::ORQuery->new;
-            for my $field ( @{ $self->get_fields } ) {
-                my $prefix_query = PrefixQuery->new(
-                    field        => $field,
-                    query_string => $text,
-                );
-                $or_query->add_child($prefix_query);
-            }
-            return $or_query;
-        }
-        else {
-            return $self->SUPER::expand_leaf($leaf_query);
-        }
-    }
-
-Ordinarily, those asterisks would have been stripped when running tokens
-through the EasyAnalyzer -- query strings containing "foo*" would produce
-TermQueries for the term "foo".  Our override intercepts tokens with trailing
-asterisks and processes them as PrefixQueries before C<SUPER::expand_leaf> can
-discard them, so that a search for "foo*" can match "food", "foosball", and so
-on.
-
-=head1 Usage
-
-Insert our custom parser into the search.cgi sample app to get a feel for how
-it behaves:
-
-    my $parser = FlatQueryParser->new( schema => $searcher->get_schema );
-    my $query  = $parser->parse( decode( 'UTF-8', $cgi->param('q') || '' ) );
-    my $hits   = $searcher->hits(
-        query      => $query,
-        offset     => $offset,
-        num_wanted => $page_size,
-    );
-    ...
-
-=cut
-

http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/perl/lib/Lucy/Docs/Cookbook/FastUpdates.pod
----------------------------------------------------------------------
diff --git a/perl/lib/Lucy/Docs/Cookbook/FastUpdates.pod b/perl/lib/Lucy/Docs/Cookbook/FastUpdates.pod
deleted file mode 100644
index eff8e54..0000000
--- a/perl/lib/Lucy/Docs/Cookbook/FastUpdates.pod
+++ /dev/null
@@ -1,153 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-=head1 NAME
-
-Lucy::Docs::Cookbook::FastUpdates - Near real-time index updates.
-
-=head1 ABSTRACT
-
-While index updates are fast on average, worst-case update performance may be
-significantly slower.  To make index updates consistently quick, we must
-manually intervene to control the process of index segment consolidation.
-
-=head1 The problem
-
-Ordinarily, modifying an index is cheap. New data is added to new segments,
-and the time to write a new segment scales more or less linearly with the
-number of documents added during the indexing session.  
-
-Deletions are also cheap most of the time, because we don't remove documents
-immediately but instead mark them as deleted, and adding the deletion mark is
-cheap.
-
-However, as new segments are added and the deletion rate for existing segments
-increases, search-time performance slowly begins to degrade.  At some point,
-it becomes necessary to consolidate existing segments, rewriting their data
-into a new segment.  
-
-If the recycled segments are small, the time it takes to rewrite them may not
-be significant.  Every once in a while, though, a large amount of data must be
-rewritten.
-
-=head1 Procrastinating and playing catch-up
-
-The simplest way to force fast index updates is to avoid rewriting anything.
-
-Indexer relies upon L<IndexManager|Lucy::Index::IndexManager>'s
-recycle() method to tell it which segments should be consolidated.  If we
-subclass IndexManager and override recycle() so that it always returns an
-empty array, we get consistently quick performance:
-
-    package NoMergeManager;
-    use base qw( Lucy::Index::IndexManager );
-    sub recycle { [] }
-    
-    package main;
-    my $indexer = Lucy::Index::Indexer->new(
-        index => '/path/to/index',
-        manager => NoMergeManager->new,
-    );
-    ...
-    $indexer->commit;
-
-However, we can't procrastinate forever.  Eventually, we'll have to run an
-ordinary, uncontrolled indexing session, potentially triggering a large
-rewrite of lots of small and/or degraded segments:
-
-    my $indexer = Lucy::Index::Indexer->new( 
-        index => '/path/to/index', 
-        # manager => NoMergeManager->new,
-    );
-    ...
-    $indexer->commit;
-
-=head1 Acceptable worst-case update time, slower degradation
-
-Never merging anything at all in the main indexing process is probably
-overkill.  Small segments are relatively cheap to merge; we just need to guard
-against the big rewrites.  
-
-Setting a ceiling on the number of documents in the segments to be recycled
-allows us to avoid a mass proliferation of tiny, single-document segments,
-while still offering decent worst-case update speed:
-
-    package LightMergeManager;
-    use base qw( Lucy::Index::IndexManager );
-    
-    sub recycle {
-        my $self = shift;
-        my $seg_readers = $self->SUPER::recycle(@_);
-        @$seg_readers = grep { $_->doc_max < 10 } @$seg_readers;
-        return $seg_readers;
-    }
-
-However, we still have to consolidate every once in a while, and while that
-happens content updates will be locked out.
-
-=head1 Background merging
-
-If it's not acceptable to lock out updates while the index consolidation
-process runs, the alternative is to move the consolidation process out of
-band, using Lucy::Index::BackgroundMerger.  
-
-It's never safe to have more than one Indexer attempting to modify the content
-of an index at the same time, but a BackgroundMerger and an Indexer can
-operate simultaneously:
-
-    # Indexing process.
-    use Scalar::Util qw( blessed );
-    my $retries = 0;
-    while (1) {
-        eval {
-            my $indexer = Lucy::Index::Indexer->new(
-                    index => '/path/to/index',
-                    manager => LightMergeManager->new,
-                );
-            $indexer->add_doc($doc);
-            $indexer->commit;
-        };
-        last unless $@;
-        if ( blessed($@) and $@->isa("Lucy::Store::LockErr") ) {
-            # Catch LockErr.
-            warn "Couldn't get lock ($retries retries)";
-            $retries++;
-        }
-        else {
-            die "Write failed: $@";
-        }
-    }
-
-    # Background merge process.
-    my $manager = Lucy::Index::IndexManager->new;
-    $manager->set_write_lock_timeout(60_000);
-    my $bg_merger = Lucy::Index::BackgroundMerger->new(
-        index   => '/path/to/index',
-        manager => $manager,
-    );
-    $bg_merger->commit;
-
-The exception handling code becomes useful once you have more than one index
-modification process happening simultaneously.  By default, Indexer tries
-several times to acquire a write lock over the span of one second, then holds
-it until commit() completes.  BackgroundMerger handles most of its work
-without the write lock, but it does need it briefly once at the beginning and
-once again near the end.  Under normal loads, the internal retry logic will
-resolve conflicts, but if it's not acceptable to miss an insert, you probably
-want to catch LockErr exceptions thrown by Indexer.  In contrast, a LockErr
-from BackgroundMerger probably just needs to be logged.
-
-=cut
-

http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/perl/lib/Lucy/Docs/DocIDs.pod
----------------------------------------------------------------------
diff --git a/perl/lib/Lucy/Docs/DocIDs.pod b/perl/lib/Lucy/Docs/DocIDs.pod
deleted file mode 100644
index 4210f3d..0000000
--- a/perl/lib/Lucy/Docs/DocIDs.pod
+++ /dev/null
@@ -1,47 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-=head1 NAME
-
-Lucy::Docs::DocIDs - Characteristics of Apache Lucy document ids.
-
-=head1 DESCRIPTION
-
-=head2 Document ids are signed 32-bit integers
-
-Document ids in Apache Lucy start at 1.  Because 0 is never a valid doc id, we
-can use it as a sentinel value:
-
-    while ( my $doc_id = $posting_list->next ) {
-        ...
-    }
-
-=head2 Document ids are ephemeral
-
-The document ids used by Lucy are associated with a single index
-snapshot.  The moment an index is updated, the mapping of document ids to
-documents is subject to change.
-
-Since IndexReader objects represent a point-in-time view of an index, document
-ids are guaranteed to remain static for the life of the reader.  However,
-because they are not permanent, Lucy document ids cannot be used as
-foreign keys to locate records in external data sources.  If you truly need a
-primary key field, you must define it and populate it yourself.
-
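-One way to handle that (a sketch -- the C<db_id> field name and the
-variables are hypothetical):
-
-    my $id_type = Lucy::Plan::StringType->new;
-    $schema->spec_field( name => 'db_id', type => $id_type );
-    ...
-    $indexer->add_doc( { db_id => $record_id, content => $content } );
-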
-Furthermore, the order of document ids does not tell you anything about the
-sequence in which documents were added to the index.
-
-=cut
-

http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/perl/lib/Lucy/Docs/FileFormat.pod
----------------------------------------------------------------------
diff --git a/perl/lib/Lucy/Docs/FileFormat.pod b/perl/lib/Lucy/Docs/FileFormat.pod
deleted file mode 100644
index 2859442..0000000
--- a/perl/lib/Lucy/Docs/FileFormat.pod
+++ /dev/null
@@ -1,239 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-=head1 NAME
-
-Lucy::Docs::FileFormat - Overview of index file format.
-
-=head1 OVERVIEW
-
-It is not necessary to understand the current implementation details of the
-index file format in order to use Apache Lucy effectively, but it may be
-helpful if you are interested in tweaking for high performance, exotic usage,
-or debugging and development.  
-
-On a file system, an index is a directory.  The files inside have a
-hierarchical relationship: an index is made up of "segments", each of which is
-an independent inverted index with its own subdirectory; each segment is made
-up of several component parts.
-
-    [index]--|
-             |--snapshot_XXX.json
-             |--schema_XXX.json
-             |--write.lock
-             |
-             |--seg_1--|
-             |         |--segmeta.json
-             |         |--cfmeta.json
-             |         |--cf.dat-------|
-             |                         |--[lexicon]
-             |                         |--[postings]
-             |                         |--[documents]
-             |                         |--[highlight]
-             |                         |--[deletions]
-             |
-             |--seg_2--|
-             |         |--segmeta.json
-             |         |--cfmeta.json
-             |         |--cf.dat-------|
-             |                         |--[lexicon]
-             |                         |--[postings]
-             |                         |--[documents]
-             |                         |--[highlight]
-             |                         |--[deletions]
-             |
-             |--[...]--| 
-
-=head1 Write-once philosophy
-
-All segment directory names consist of the string "seg_" followed by a number
-in base 36: seg_1, seg_5m, seg_p9s2 and so on, with higher numbers indicating
-more recent segments.  Once a segment is finished and committed, its name is
-never re-used and its files are never modified.
-
-Old segments become obsolete and can be removed when their data has been
-consolidated into new segments during the process of segment merging and
-optimization.  A fully-optimized index has only one segment.
-
-=head1 Top-level entries
-
-There are a handful of "top-level" files and directories which belong to the
-entire index rather than to a particular segment.
-
-=head2 snapshot_XXX.json
-
-A "snapshot" file, e.g. C<snapshot_m7p.json>, is list of index files and
-directories.  Because index files, once written, are never modified, the list
-of entries in a snapshot defines a point-in-time view of the data in an index.
-
-Like segment directories, snapshot files also utilize the
-unique-base-36-number naming convention; the higher the number, the more
-recent the file.  The appearance of a new snapshot file within the index
-directory constitutes an index update.  While a new segment is being written
-new files may be added to the index directory, but until a new snapshot file
-gets written, a Searcher opening the index for reading won't know about them.
-
-=head2 schema_XXX.json
-
-The schema file is a Schema object describing the index's format, serialized
-as JSON.  It, too, is versioned, and a given snapshot file will reference one
-and only one schema file.
-
-=head2 locks 
-
-By default, only one indexing process may safely modify the index at any given
-time.  Processes reserve an index by laying claim to the C<write.lock> file
-within the C<locks/> directory.  A smattering of other lock files may be used
-from time to time, as well.
-
-=head1 A segment's component parts
-
-By default, each segment has up to five logical components: lexicon, postings,
-document storage, highlight data, and deletions.  Binary data from these
-components gets stored in virtual files within the "cf.dat" compound file;
-metadata is stored in a shared "segmeta.json" file.
-
-=head2 segmeta.json
-
-The segmeta.json file is a central repository for segment metadata.  In
-addition to information such as document counts and field numbers, it also
-warehouses arbitrary metadata on behalf of individual index components.
-
-=head2 Lexicon 
-
-Each indexed field gets its own lexicon in each segment.  The exact files
-involved depend on the field's type, but generally speaking there will be two
-parts.  First, there's a primary C<lexicon-XXX.dat> file which houses a
-complete term list associating terms with corpus frequency statistics,
-postings file locations, etc.  Second, one or more "lexicon index" files may
-be present which contain periodic samples from the primary lexicon file to
-facilitate fast lookups.
-
-=head2 Postings
-
-"Posting" is a technical term from the field of 
-L<information retrieval|Lucy::Docs::IRTheory>, defined as a single
-instance of one term indexing one document.  If you are looking at the index
-in the back of a book, and you see that "freedom" is referenced on pages 8,
-86, and 240, that would be three postings, which taken together form a
-"posting list".  The same terminology applies to an index in electronic form.
-
-Each segment has one postings file per indexed field.  When a search is
-performed for a single term, first that term is looked up in the lexicon.  If
-the term exists in the segment, the record in the lexicon will contain
-information about which postings file to look at and where to look.
-
-The first thing any posting record tells you is a document id.  By iterating
-over all the postings associated with a term, you can find all the documents
-that match that term, a process which is analogous to looking up page numbers
-in a book's index.  However, each posting record typically contains other
-information in addition to document id, e.g. the positions at which the term
-occurs within the field.
-
-=head2 Documents
-
-The document storage section is a simple database, organized into two files:
-
-=over
-
-=item * 
-
-B<documents.dat> - Serialized documents.
-
-=item *
-
-B<documents.ix> - Document storage index, a solid array of 64-bit integers
-where each integer location corresponds to a document id, and the value at
-that location points at a file position in the documents.dat file (see the
-sketch after this list).
-
-=back
-
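-The layout implies a simple lookup, sketched below (illustrative only --
-real access goes through Lucy's index readers, and the byte order shown
-is an assumption):
-
-    # Find where document $doc_id's record starts in documents.dat.
-    seek( $ix_fh, $doc_id * 8, 0 ) or die "seek: $!";
-    read( $ix_fh, my $packed, 8 ) == 8 or die "short read";
-    my $file_pos = unpack( 'Q>', $packed );   # 64-bit big-endian (assumed)
-    seek( $dat_fh, $file_pos, 0 ) or die "seek: $!";
-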
-=head2 Highlight data 
-
-The files which store data used for excerpting and highlighting are organized
-similarly to the files used to store documents.
-
-=over
-
-=item * 
-
-B<highlight.dat> - Chunks of serialized highlight data, one per doc id.
-
-=item *
-
-B<highlight.ix> - Highlight data index -- as with the C<documents.ix> file, a
-solid array of 64-bit file pointers.
-
-=back
-
-=head2 Deletions
-
-When a document is "deleted" from a segment, it is not actually purged right
-away; it is merely marked as "deleted" via a deletions file.  Deletions files
-contain bit vectors with one bit for each document in the segment; if bit
-#254 is set then document 254 is deleted, and if that document turns up in a
-search it will be masked out.
-
-It is only when a segment's contents are rewritten to a new segment during the
-segment-merging process that deleted documents truly go away.
-
-=head1 Compound Files
-
-If you peer inside an index directory, you won't actually find any files named
-"documents.dat", "highlight.ix", etc. unless there is an indexing process
-underway.  What you will find instead is one "cf.dat" and one "cfmeta.json"
-file per segment.
-
-To minimize the need for file descriptors at search-time, all per-segment
-binary data files are concatenated together in "cf.dat" at the close of each
-indexing session.  Information about where each file begins and ends is stored
-in C<cfmeta.json>.  When the segment is opened for reading, a single file
-descriptor per "cf.dat" file can be shared among several readers.
-
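-For illustration, a C<cfmeta.json> file might look roughly like this (a
-sketch only -- the exact keys and values vary by format version):
-
-    {
-        "files": {
-            "documents.dat": { "offset": 0,    "length": 8192 },
-            "documents.ix":  { "offset": 8192, "length": 1024 }
-        }
-    }
-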
-=head1 A Typical Search
-
-Here's a simplified narrative, dramatizing how a search for "freedom" against
-a given segment plays out:
-
-=over
-
-=item 1
-
-The searcher asks the relevant Lexicon Index, "Do you know anything about
-'freedom'?"  Lexicon Index replies, "Can't say for sure, but if the main
-Lexicon file does, 'freedom' is probably somewhere around byte 21008".  
-
-=item 2
-
-The main Lexicon tells the searcher "One moment, let me scan our records...
-Yes, we have 2 documents which contain 'freedom'.  You'll find them in
-seg_6/postings-4.dat starting at byte 66991."
-
-=item 3
-
-The Postings file says "Yep, we have 'freedom', all right!  Document id 40
-has 1 'freedom', and document 44 has 8.  If you need to know more, like if any
-'freedom' is part of the phrase 'freedom of speech', ask me about positions!"
-
-=item 4
-
-If the searcher is only looking for 'freedom' in isolation, that's where it
-stops.  It now knows enough to assign the documents scores against "freedom",
-with the 8-freedom document likely ranking higher than the single-freedom
-document.
-
-=back
-
-

http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/perl/lib/Lucy/Docs/IRTheory.pod
----------------------------------------------------------------------
diff --git a/perl/lib/Lucy/Docs/IRTheory.pod b/perl/lib/Lucy/Docs/IRTheory.pod
deleted file mode 100644
index 7696ea8..0000000
--- a/perl/lib/Lucy/Docs/IRTheory.pod
+++ /dev/null
@@ -1,94 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-=head1 NAME
-
-Lucy::Docs::IRTheory - Crash course in information retrieval.
-
-=head1 ABSTRACT
-
-Just enough Information Retrieval theory to find your way around Apache Lucy.
-
-=head1 Terminology
-
-Lucy uses some terminology from the field of information retrieval which
-may be unfamiliar to many users.  "Document" and "term" mean pretty much what
-you'd expect them to, but others such as "posting" and "inverted index" need a
-formal introduction:
-
-=over
-
-=item *
-
-I<document> - An atomic unit of retrieval.
-
-=item *
-
-I<term> - An attribute which describes a document.
-
-=item *
-
-I<posting> - One term indexing one document.
-
-=item *
-
-I<term list> - The complete list of terms which describe a document.
-
-=item *
-
-I<posting list> - The complete list of documents which a term indexes.
-
-=item *
-
-I<inverted index> - A data structure which maps from terms to documents.
-
-=back
-
-Since Lucy is a practical implementation of IR theory, it loads these
-abstract, distilled definitions down with useful traits.  For instance, a
-"posting" in its most rarefied form is simply a term-document pairing; in
-Lucy, the class L<Lucy::Index::Posting::MatchPosting> fills this
-role.  However, by associating additional information with a posting like the
-number of times the term occurs in the document, we can turn it into a
-L<ScorePosting|Lucy::Index::Posting::ScorePosting>, making it possible
-to rank documents by relevance rather than just list documents which happen to
-match in no particular order.
-
-=head1 TF/IDF ranking algorithm
-
-Lucy uses a variant of the well-established "Term Frequency / Inverse
-Document Frequency" weighting scheme.  A thorough treatment of TF/IDF is too
-ambitious for our present purposes, but in a nutshell, it means that...
-
-=over
-
-=item
-
-in a search for C<skate park>, documents which score well for the
-comparatively rare term C<skate> will rank higher than documents which score
-well for the more common term C<park>.  
-
-=item
-
-a 10-word text which has one occurrence each of both C<skate> and C<park> will
-rank higher than a 1000-word text which also contains one occurrence of each.
-
-=back
-
-A web search for "tf idf" will turn up many excellent explanations of the
-algorithm.
-
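-As a toy illustration only -- this is not Lucy's actual scoring formula --
-the classic TF/IDF weight of one term in one document can be computed
-like so:
-
-    # $tf: the term's count within the document; $n_docs: total number of
-    # documents in the corpus; $doc_freq: number of documents containing
-    # the term.
-    my $weight = $tf * log( $n_docs / $doc_freq );
-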
-=cut
-

http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/perl/lib/Lucy/Docs/Tutorial.pod
----------------------------------------------------------------------
diff --git a/perl/lib/Lucy/Docs/Tutorial.pod b/perl/lib/Lucy/Docs/Tutorial.pod
deleted file mode 100644
index 7ec7467..0000000
--- a/perl/lib/Lucy/Docs/Tutorial.pod
+++ /dev/null
@@ -1,89 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-=head1 NAME
-
-Lucy::Docs::Tutorial - Step-by-step introduction to Apache Lucy.
-
-=head1 ABSTRACT 
-
-Explore Apache Lucy's basic functionality by starting with a minimalist CGI
-search app based on L<Lucy::Simple> and transforming it, step by step, into an
-"advanced search" interface utilizing more flexible core modules like
-L<Lucy::Index::Indexer> and L<Lucy::Search::IndexSearcher>.
-
-=head1 DESCRIPTION
-
-=head2 Chapters
-
-=over
-
-=item *
-
-L<Lucy::Docs::Tutorial::Simple> - Build a bare-bones search app using
-L<Lucy::Simple>.
-
-=item *
-
-L<Lucy::Docs::Tutorial::BeyondSimple> - Rebuild the app using core
-classes like L<Indexer|Lucy::Index::Indexer> and
-L<IndexSearcher|Lucy::Search::IndexSearcher> in place of Lucy::Simple.
-
-=item *
-
-L<Lucy::Docs::Tutorial::FieldType> - Experiment with different field
-characteristics using subclasses of L<Lucy::Plan::FieldType>.
-
-=item *
-
-L<Lucy::Docs::Tutorial::Analysis> - Examine how the choice of
-L<Lucy::Analysis::Analyzer> subclass affects search results.
-
-=item *
-
-L<Lucy::Docs::Tutorial::Highlighter> - Augment search results with
-highlighted excerpts.
-
-=item *
-
-L<Lucy::Docs::Tutorial::QueryObjects> - Unlock advanced search features
-by using Query objects instead of query strings.
-
-=back
-
-=head2 Source materials
-
-The source material used by the tutorial app -- a multi-text-file presentation
-of the United States constitution -- can be found in the C<sample> directory
-at the root of the Lucy distribution, along with finished indexing and search
-apps.
-
-    sample/indexer.pl        # indexing app
-    sample/search.cgi        # search app
-    sample/us_constitution   # corpus
-
-=head2 Conventions
-
-The user is expected to be familiar with OO Perl and basic CGI programming.
-
-The code in this tutorial assumes a Unix-flavored operating system and the
-Apache webserver, but will work with minor modifications on other setups.
-
-=head1 SEE ALSO
-
-More advanced and esoteric subjects are covered in
-L<Lucy::Docs::Cookbook>.
-
-

http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/perl/lib/Lucy/Docs/Tutorial/Analysis.pod
----------------------------------------------------------------------
diff --git a/perl/lib/Lucy/Docs/Tutorial/Analysis.pod b/perl/lib/Lucy/Docs/Tutorial/Analysis.pod
deleted file mode 100644
index 24c0b58..0000000
--- a/perl/lib/Lucy/Docs/Tutorial/Analysis.pod
+++ /dev/null
@@ -1,94 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-=head1 NAME
-
-Lucy::Docs::Tutorial::Analysis - How to choose and use Analyzers.
-
-=head1 DESCRIPTION
-
-Try swapping out the EasyAnalyzer in our Schema for a StandardTokenizer:
-
-    my $tokenizer = Lucy::Analysis::StandardTokenizer->new;
-    my $type = Lucy::Plan::FullTextType->new(
-        analyzer => $tokenizer,
-    );
-
-Search for C<senate>, C<Senate>, and C<Senator> before and after making the
-change and re-indexing.
-
-Under EasyAnalyzer, the results are identical for all three searches, but
-under StandardTokenizer, searches are case-sensitive, and the result sets for
-C<Senate> and C<Senator> are distinct.
-
-=head2 EasyAnalyzer
-
-What's happening is that EasyAnalyzer is performing more aggressive processing
-than StandardTokenizer.  In addition to tokenizing, it's also converting all
-text to lower case so that searches are case-insensitive, and using a
-"stemming" algorithm to reduce related words to a common stem (C<senat>, in
-this case).
-
-EasyAnalyzer is actually multiple Analyzers wrapped up in a single package.
-In this case, it's three-in-one, since specifying an EasyAnalyzer with
-C<< language => 'en' >> is equivalent to this snippet:
-
-    my $tokenizer    = Lucy::Analysis::StandardTokenizer->new;
-    my $normalizer   = Lucy::Analysis::Normalizer->new;
-    my $stemmer      = Lucy::Analysis::SnowballStemmer->new( language => 'en' );
-    my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
-        analyzers => [ $tokenizer, $normalizer, $stemmer ],
-    );
-
-You can add or subtract Analyzers from there if you like.  Try adding a fourth
-Analyzer, a SnowballStopFilter for suppressing "stopwords" like "the", "if",
-and "maybe".
-
-    my $stopfilter = Lucy::Analysis::SnowballStopFilter->new( 
-        language => 'en',
-    );
-    my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
-        analyzers => [ $tokenizer, $normalizer, $stopfilter, $stemmer ],
-    );
-
-Also, try removing the SnowballStemmer.
-
-    my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
-        analyzers => [ $tokenizer, $normalizer ],
-    );
-
-The original choice of a stock English EasyAnalyzer probably still yields the
-best results for this document collection, but you get the idea: sometimes you
-want a different Analyzer.
-
-=head2 When the best Analyzer is no Analyzer
-
-Sometimes you don't want an Analyzer at all.  That was true for our "url"
-field because we didn't need it to be searchable, but it's also true for
-certain types of searchable fields.  For instance, "category" fields are often
-set up to match exactly or not at all, as are fields like "last_name" (because
-you may not want to conflate results for "Humphrey" and "Humphries").
-
-To specify that there should be no analysis performed at all, use StringType:
-
-    my $type = Lucy::Plan::StringType->new;
-    $schema->spec_field( name => 'category', type => $type );
-
-=head2 Highlighting up next
-
-In our next tutorial chapter, L<Lucy::Docs::Tutorial::Highlighter>,
-we'll add highlighted excerpts from the "content" field to our search results.
-
-

http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/perl/lib/Lucy/Docs/Tutorial/BeyondSimple.pod
----------------------------------------------------------------------
diff --git a/perl/lib/Lucy/Docs/Tutorial/BeyondSimple.pod b/perl/lib/Lucy/Docs/Tutorial/BeyondSimple.pod
deleted file mode 100644
index 6ce1261..0000000
--- a/perl/lib/Lucy/Docs/Tutorial/BeyondSimple.pod
+++ /dev/null
@@ -1,153 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-=head1 NAME
-
-Lucy::Docs::Tutorial::BeyondSimple - A more flexible app structure.
-
-=head1 DESCRIPTION
-
-=head2 Goal
-
-In this tutorial chapter, we'll refactor the apps we built in
-L<Lucy::Docs::Tutorial::Simple> so that they look exactly the same from
-the end user's point of view, but offer the developer greater possibilities for
-expansion.  
-
-To achieve this, we'll ditch Lucy::Simple and replace it with the
-classes that it uses internally:
-
-=over
-
-=item *
-
-L<Lucy::Plan::Schema> - Plan out your index.
-
-=item *
-
-L<Lucy::Plan::FullTextType> - Field type for full text search.
-
-=item *
-
-L<Lucy::Analysis::EasyAnalyzer> - A one-size-fits-all parser/tokenizer.
-
-=item *
-
-L<Lucy::Index::Indexer> - Manipulate index content.
-
-=item *
-
-L<Lucy::Search::IndexSearcher> - Search an index.
-
-=item *
-
-L<Lucy::Search::Hits> - Iterate over hits returned by a Searcher.
-
-=back
-
-=head2 Adaptations to indexer.pl
-
-After we load our modules...
-
-    use Lucy::Plan::Schema;
-    use Lucy::Plan::FullTextType;
-    use Lucy::Analysis::EasyAnalyzer;
-    use Lucy::Index::Indexer;
-
-... the first item we're going to need is a L<Schema|Lucy::Plan::Schema>.
-
-The primary job of a Schema is to specify what fields are available and how
-they're defined.  We'll start off with three fields: title, content and url.
-
-    # Create Schema.
-    my $schema = Lucy::Plan::Schema->new;
-    my $easyanalyzer = Lucy::Analysis::EasyAnalyzer->new(
-        language => 'en',
-    );
-    my $type = Lucy::Plan::FullTextType->new(
-        analyzer => $easyanalyzer,
-    );
-    $schema->spec_field( name => 'title',   type => $type );
-    $schema->spec_field( name => 'content', type => $type );
-    $schema->spec_field( name => 'url',     type => $type );
-
-All of the fields are spec'd out using the "FullTextType" FieldType,
-indicating that they will be searchable as "full text" -- which means that
-they can be searched for individual words.  The "analyzer", which is unique to
-FullTextType fields, is what breaks up the text into searchable tokens.
-
-Next, we'll swap our Lucy::Simple object out for a Lucy::Index::Indexer.
-The substitution will be straightforward because Simple has merely been
-serving as a thin wrapper around an inner Indexer, and we'll just be peeling
-away the wrapper.
-
-First, replace the constructor:
-
-    # Create Indexer.
-    my $indexer = Lucy::Index::Indexer->new(
-        index    => $path_to_index,
-        schema   => $schema,
-        create   => 1,
-        truncate => 1,
-    );
-
-Next, call C<add_doc> on the C<$indexer> object wherever we previously
-called it on the C<$lucy> object:
-
-    foreach my $filename (@filenames) {
-        my $doc = parse_file($filename);
-        $indexer->add_doc($doc);
-    }
-
-There's only one extra step required: at the end of the app, you must call
-commit() explicitly to close the indexing session and commit your changes.
-(Lucy::Simple hides this detail, calling commit() implicitly when it needs to).
-
-    $indexer->commit;
-
-=head2 Adaptations to search.cgi
-
-In our search app as in our indexing app, Lucy::Simple has served as a
-thin wrapper -- this time around L<Lucy::Search::IndexSearcher> and
-L<Lucy::Search::Hits>.  Swapping out Simple for these two classes is
-also straightforward:
-
-    use Lucy::Search::IndexSearcher;
-    
-    my $searcher = Lucy::Search::IndexSearcher->new( 
-        index => $path_to_index,
-    );
-    my $hits = $searcher->hits(    # returns a Hits object, not a hit count
-        query      => $q,
-        offset     => $offset,
-        num_wanted => $page_size,
-    );
-    my $hit_count = $hits->total_hits;  # get the hit count here
-    
-    ...
-    
-    while ( my $hit = $hits->next ) {
-        ...
-    }
-
-=head2 Hooray!
-
-Congratulations!  Your apps do the same thing as before... but now they'll be
-easier to customize.  
-
-In our next chapter, L<Lucy::Docs::Tutorial::FieldType>, we'll explore
-how to assign different behaviors to different fields.
-
-

http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/perl/lib/Lucy/Docs/Tutorial/FieldType.pod
----------------------------------------------------------------------
diff --git a/perl/lib/Lucy/Docs/Tutorial/FieldType.pod b/perl/lib/Lucy/Docs/Tutorial/FieldType.pod
deleted file mode 100644
index 05d0e82..0000000
--- a/perl/lib/Lucy/Docs/Tutorial/FieldType.pod
+++ /dev/null
@@ -1,74 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-=head1 NAME
-
-Lucy::Docs::Tutorial::FieldType - Specify per-field properties and
-behaviors.
-
-=head1 DESCRIPTION
-
-The Schema we used in the last chapter specifies three fields: 
-
-    my $type = Lucy::Plan::FullTextType->new(
-        analyzer => $polyanalyzer,
-    );
-    $schema->spec_field( name => 'title',   type => $type );
-    $schema->spec_field( name => 'content', type => $type );
-    $schema->spec_field( name => 'url',     type => $type );
-
-Since they are all defined as "full text" fields, they are all searchable --
-including the C<url> field, a dubious choice.  Some URLs contain meaningful
-information, but these don't, really:
-
-    http://example.com/us_constitution/amend1.txt
-
-We may as well not bother indexing the URL content.  To achieve that we need
-to assign the C<url> field to a different FieldType.  
-
-=head2 StringType
-
-Instead of FullTextType, we'll use a
-L<StringType|Lucy::Plan::StringType>, which doesn't use an
-Analyzer to break up text into individual tokens.  Furthermore, we'll mark
-this StringType as unindexed, so that its content won't be searchable at all.
-
-    my $url_type = Lucy::Plan::StringType->new( indexed => 0 );
-    $schema->spec_field( name => 'url', type => $url_type );
-
-To observe the change in behavior, try searching for C<us_constitution> both
-before and after changing the Schema and re-indexing.
-
-=head2 Toggling 'stored'
-
-For a taste of other FieldType possibilities, try turning off C<stored> for
-one or more fields.
-
-    my $content_type = Lucy::Plan::FullTextType->new(
-        analyzer => $polyanalyzer,
-        stored   => 0,
-    );
-
-Turning off C<stored> for either C<title> or C<url> mangles our results page,
-but since we're not displaying C<content>, turning it off for C<content> has
-no effect -- except on index size.
-
-=head2 Analyzers up next
-
-Analyzers play a crucial role in the behavior of FullTextType fields.  In our
-next tutorial chapter, L<Lucy::Docs::Tutorial::Analysis>, we'll see how
-changing up the Analyzer changes search results.
-
-

http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/perl/lib/Lucy/Docs/Tutorial/Highlighter.pod
----------------------------------------------------------------------
diff --git a/perl/lib/Lucy/Docs/Tutorial/Highlighter.pod b/perl/lib/Lucy/Docs/Tutorial/Highlighter.pod
deleted file mode 100644
index 9b6879c..0000000
--- a/perl/lib/Lucy/Docs/Tutorial/Highlighter.pod
+++ /dev/null
@@ -1,76 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-=head1 NAME
-
-Lucy::Docs::Tutorial::Highlighter - Augment search results with
-highlighted excerpts.
-
-=head1 DESCRIPTION
-
-Adding relevant excerpts with highlighted search terms to your search results
-display makes it much easier for end users to scan the page and assess which
-hits look promising, dramatically improving their search experience.
-
-=head2 Adaptations to indexer.pl
-
-L<Lucy::Highlight::Highlighter> uses information generated at index
-time.  To save resources, highlighting is disabled by default and must be
-turned on for individual fields.
-
-    my $highlightable = Lucy::Plan::FullTextType->new(
-        analyzer      => $polyanalyzer,
-        highlightable => 1,
-    );
-    $schema->spec_field( name => 'content', type => $highlightable );
-
-=head2 Adaptations to search.cgi
-
-To add highlighting and excerpting to the search.cgi sample app, create a
-C<$highlighter> object outside the hits iterating loop...
-
-    my $highlighter = Lucy::Highlight::Highlighter->new(
-        searcher => $searcher,
-        query    => $q,
-        field    => 'content'
-    );
-
-... then modify the loop and the per-hit display to generate and include the
-excerpt.
-
-    # Create result list.
-    my $report = '';
-    while ( my $hit = $hits->next ) {
-        my $score   = sprintf( "%0.3f", $hit->get_score );
-        my $excerpt = $highlighter->create_excerpt($hit);
-        $report .= qq|
-            <p>
-              <a href="$hit->{url}"><strong>$hit->{title}</strong></a>
-              <em>$score</em>
-              <br />
-              $excerpt
-              <br />
-              <span class="excerptURL">$hit->{url}</span>
-            </p>
-        |;
-    }
-
-=head2 Next chapter: Query objects
-
-Our next tutorial chapter, L<Lucy::Docs::Tutorial::QueryObjects>,
-illustrates how to build an "advanced search" interface using
-L<Query|Lucy::Search::Query> objects instead of query strings.
-
-

http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/perl/lib/Lucy/Docs/Tutorial/QueryObjects.pod
----------------------------------------------------------------------
diff --git a/perl/lib/Lucy/Docs/Tutorial/QueryObjects.pod b/perl/lib/Lucy/Docs/Tutorial/QueryObjects.pod
deleted file mode 100644
index 6ff812a..0000000
--- a/perl/lib/Lucy/Docs/Tutorial/QueryObjects.pod
+++ /dev/null
@@ -1,198 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-=head1 NAME
-
-Lucy::Docs::Tutorial::QueryObjects - Use Query objects instead of query
-strings.
-
-=head1 DESCRIPTION
-
-Until now, our search app has had only a single search box.  In this tutorial
-chapter, we'll move towards an "advanced search" interface, by adding a
-"category" drop-down menu.  Three new classes will be required:
-
-=over
-
-=item *
-
-L<QueryParser|Lucy::Search::QueryParser> - Turn a query string into a
-L<Query|Lucy::Search::Query> object.
-
-=item *
-
-L<TermQuery|Lucy::Search::TermQuery> - Query for a specific term within
-a specific field.
-
-=item *
-
-L<ANDQuery|Lucy::Search::ANDQuery> - "AND" together multiple Query
-objects to produce an intersected result set.
-
-=back
-
-=head2 Adaptations to indexer.pl
-
-Our new "category" field will be a StringType field rather than a FullTextType
-field, because we will only be looking for exact matches.  It needs to be
-indexed, but since we won't display its value, it doesn't need to be stored.
-
-    my $cat_type = Lucy::Plan::StringType->new( stored => 0 );
-    $schema->spec_field( name => 'category', type => $cat_type );
-
-There will be three possible values: "article", "amendment", and "preamble",
-which we'll hack out of the source file's name during our C<parse_file>
-subroutine:
-
-    my $category
-        = $filename =~ /art/      ? 'article'
-        : $filename =~ /amend/    ? 'amendment'
-        : $filename =~ /preamble/ ? 'preamble'
-        :                           die "Can't derive category for $filename";
-    return {
-        title    => $title,
-        content  => $bodytext,
-        url      => "/us_constitution/$filename",
-        category => $category,
-    };
-
-=head2 Adaptations to search.cgi
-
-The "category" constraint will be added to our search interface using an HTML
-"select" element (this routine will need to be integrated into the HTML
-generation section of search.cgi):
-
-    # Build up the HTML "select" object for the "category" field.
-    sub generate_category_select {
-        my $cat = shift;
-        my $select = qq|
-          <select name="category">
-            <option value="">All Sections</option>
-            <option value="article">Articles</option>
-            <option value="amendment">Amendments</option>
-          </select>|;
-        if ($cat) {
-            $select =~ s/"$cat"/"$cat" selected/;
-        }
-        return $select;
-    }
-
-We'll start off by loading our new modules and extracting our new CGI
-parameter.
-
-    use Lucy::Search::QueryParser;
-    use Lucy::Search::TermQuery;
-    use Lucy::Search::ANDQuery;
-    
-    ... 
-    
-    my $category = decode( "UTF-8", $cgi->param('category') || '' );
-
-QueryParser's constructor requires a "schema" argument.  We can get that from
-our IndexSearcher:
-
-    # Create an IndexSearcher and a QueryParser.
-    my $searcher = Lucy::Search::IndexSearcher->new( 
-        index => $path_to_index, 
-    );
-    my $qparser  = Lucy::Search::QueryParser->new( 
-        schema => $searcher->get_schema,
-    );
-
-Previously, we have been handing raw query strings to IndexSearcher.  Behind
-the scenes, IndexSearcher has been using a QueryParser to turn those query
-strings into Query objects.  Now, we will bring QueryParser into the
-foreground and parse the strings explicitly.
-
-    my $query = $qparser->parse($q);
-
-If the user has specified a category, we'll use an ANDQuery to join our parsed
-query together with a TermQuery representing the category.
-
-    if ($category) {
-        my $category_query = Lucy::Search::TermQuery->new(
-            field => 'category', 
-            term  => $category,
-        );
-        $query = Lucy::Search::ANDQuery->new(
-            children => [ $query, $category_query ]
-        );
-    }
-
-Now when we execute the query...
-
-    # Execute the Query and get a Hits object.
-    my $hits = $searcher->hits(
-        query      => $query,
-        offset     => $offset,
-        num_wanted => $page_size,
-    );
-
-... we'll get a result set which is the intersection of the parsed query and
-the category query.
-
-=head1 Using TermQuery with full text fields
-
-When querying full text fields, the easiest way is to create query objects
-using QueryParser. But sometimes you want to create a TermQuery for a single
-term in a FullTextType field directly. In this case, we have to run the
-search term through the field's analyzer to make sure it gets normalized in
-the same way as the field's content.
-
-    sub make_term_query {
-        my ($field, $term) = @_;
-
-        my $token;
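-        # $schema is assumed to be in scope here, e.g. obtained earlier
-        # via $searcher->get_schema.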
-        my $type = $schema->fetch_type($field);
-
-        if ( $type->isa('Lucy::Plan::FullTextType') ) {
-            # Run the term through the full text analysis chain.
-            my $analyzer = $type->get_analyzer;
-            my $tokens   = $analyzer->split($term);
-
-            if ( @$tokens != 1 ) {
-                # If the term expands to more than one token, or no
-                # tokens at all, it will never match a token in the
-                # full text field.
-                return Lucy::Search::NoMatchQuery->new;
-            }
-
-            $token = $tokens->[0];
-        }
-        else {
-            # Exact match for other types.
-            $token = $term;
-        }
-
-        return Lucy::Search::TermQuery->new(
-            field => $field,
-            term  => $token,
-        );
-    }
-
-=head1 Congratulations!
-
-You've made it to the end of the tutorial.
-
-=head1 SEE ALSO
-
-For additional thematic documentation, see the Apache Lucy
-L<Cookbook|Lucy::Docs::Cookbook>.
-
-ANDQuery has a companion class, L<ORQuery|Lucy::Search::ORQuery>, and a
-close relative,
-L<RequiredOptionalQuery|Lucy::Search::RequiredOptionalQuery>.
-
-

http://git-wip-us.apache.org/repos/asf/lucy/blob/5618020f/perl/lib/Lucy/Docs/Tutorial/Simple.pod
----------------------------------------------------------------------
diff --git a/perl/lib/Lucy/Docs/Tutorial/Simple.pod b/perl/lib/Lucy/Docs/Tutorial/Simple.pod
deleted file mode 100644
index b40d7a1..0000000
--- a/perl/lib/Lucy/Docs/Tutorial/Simple.pod
+++ /dev/null
@@ -1,298 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-=head1 NAME
-
-Lucy::Docs::Tutorial::Simple - Bare-bones search app.
-
-=head2 Setup
-
-Copy the text presentation of the US Constitution from the C<sample> directory
-of the Apache Lucy distribution to the base level of your web server's
-C<htdocs> directory.
-
-    $ cp -R sample/us_constitution /usr/local/apache2/htdocs/
-
-=head2 Indexing: indexer.pl
-
-Our first task will be to create an application called C<indexer.pl> which
-builds a searchable "inverted index" from a collection of documents.  
-
-After we specify some configuration variables and load all necessary
-modules...
-
-    #!/usr/local/bin/perl
-    use strict;
-    use warnings;
-    
-    # (Change configuration variables as needed.)
-    my $path_to_index = '/path/to/index';
-    my $uscon_source  = '/usr/local/apache2/htdocs/us_constitution';
-
-    use Lucy::Simple;
-    use File::Spec::Functions qw( catfile );
-
-... we'll start by creating a Lucy::Simple object, telling it where we'd
-like the index to be located and the language of the source material.
-
-    my $lucy = Lucy::Simple->new(
-        path     => $path_to_index,
-        language => 'en',
-    );
-
-Next, we'll add a subroutine which parses our sample documents.
-
-    # Parse a file from our US Constitution collection and return a hashref with
-    # the fields title, content, and url.
-    sub parse_file {
-        my $filename = shift;
-        my $filepath = catfile( $uscon_source, $filename );
-        open( my $fh, '<', $filepath ) or die "Can't open '$filepath': $!";
-        my $text = do { local $/; <$fh> };    # slurp file content
-        $text =~ /\A(.+?)^\s+(.*)/ms
-            or die "Can't extract title/bodytext from '$filepath'";
-        my $title    = $1;
-        my $bodytext = $2;
-        return {
-            title    => $title,
-            content  => $bodytext,
-            url      => "/us_constitution/$filename",
-        };
-    }
-
-Add some elementary directory reading code...
-
-    # Collect names of source files.
-    opendir( my $dh, $uscon_source )
-        or die "Couldn't opendir '$uscon_source': $!";
-    my @filenames = grep { $_ =~ /\.txt/ } readdir $dh;
-
-... and now we're ready for the meat of indexer.pl -- which occupies exactly
-one line of code.
-
-    foreach my $filename (@filenames) {
-        my $doc = parse_file($filename);
-        $lucy->add_doc($doc);  # ta-da!
-    }
-
-=head2 Search: search.cgi
-
-As with our indexing app, the bulk of the code in our search script won't be
-Lucy-specific.  
-
-The beginning is dedicated to CGI processing and configuration.
-
-    #!/usr/local/bin/perl -T
-    use strict;
-    use warnings;
-    
-    # (Change configuration variables as needed.)
-    my $path_to_index = '/path/to/index';
-
-    use CGI;
-    use List::Util qw( max min );
-    use POSIX qw( ceil );
-    use Encode qw( decode );
-    use Lucy::Simple;
-    
-    my $cgi       = CGI->new;
-    my $q         = decode( "UTF-8", $cgi->param('q') || '' );
-    my $offset    = decode( "UTF-8", $cgi->param('offset') || 0 );
-    my $page_size = 10;
-
-Once that's out of the way, we create our Lucy::Simple object and feed
-it a query string.
-
-    my $lucy = Lucy::Simple->new(
-        path     => $path_to_index,
-        language => 'en',
-    );
-    my $hit_count = $lucy->search(
-        query      => $q,
-        offset     => $offset,
-        num_wanted => $page_size,
-    );
-
-The value returned by search() is the total number of documents in the
-collection that matched the query.  We'll show this hit count to the user,
-and use it in conjunction with the parameters C<offset> and C<num_wanted>
-to break up results into "pages" of manageable size.
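-
-For example, with a C<$page_size> of 10, the third page of results starts at
-an offset of 20.  A minimal sketch of that arithmetic (the variable names
-here are illustrative only):
-
-    # Offset for a given 1-based page number:
-    my $page_num        = 3;
-    my $offset_for_page = ( $page_num - 1 ) * $page_size;    # 20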
-
-Calling search() on our Simple object turns it into an iterator. Invoking
-next() now returns hits one at a time as L<Lucy::Document::HitDoc>
-objects, starting with the most relevant.
-
-    # Create result list.
-    my $report = '';
-    while ( my $hit = $lucy->next ) {
-        my $score = sprintf( "%0.3f", $hit->get_score );
-        $report .= qq|
-            <p>
-              <a href="$hit->{url}"><strong>$hit->{title}</strong></a>
-              <em>$score</em>
-              <br>
-              <span class="excerptURL">$hit->{url}</span>
-            </p>
-            |;
-    }
-
-The rest of the script is just text wrangling. 
-
-    #---------------------------------------------------------------#
-    # No tutorial material below this point - just html generation. #
-    #---------------------------------------------------------------#
-    
-    # Generate paging links and hit count, print and exit.
-    my $paging_links = generate_paging_info( $q, $hit_count );
-    blast_out_content( $q, $report, $paging_links );
-    
-    # Create html fragment with links for paging through results n-at-a-time.
-    sub generate_paging_info {
-        my ( $query_string, $total_hits ) = @_;
-        my $escaped_q = CGI::escapeHTML($query_string);
-        my $paging_info;
-        if ( !length $query_string ) {
-            # No query?  No display.
-            $paging_info = '';
-        }
-        elsif ( $total_hits == 0 ) {
-            # Alert the user that their search failed.
-            $paging_info
-                = qq|<p>No matches for <strong>$escaped_q</strong></p>|;
-        }
-        else {
-            # Calculate the numbers of the first and last hits to display.
-            my $last_result = min( ( $offset + $page_size ), $total_hits );
-            my $first_result = min( ( $offset + 1 ), $last_result );
-
-            # Display the result nums, start paging info.
-            $paging_info = qq|
-                <p>
-                    Results <strong>$first_result-$last_result</strong> 
-                    of <strong>$total_hits</strong> 
-                    for <strong>$escaped_q</strong>.
-                </p>
-                <p>
-                    Results Page:
-                |;
-
-            # Calculate the first and last result pages to display / link to.
-            my $current_page = int( $first_result / $page_size ) + 1;
-            my $last_page    = ceil( $total_hits / $page_size );
-            my $first_page   = max( 1, ( $current_page - 9 ) );
-            $last_page = min( $last_page, ( $current_page + 10 ) );
-
-            # Create a url for use in paging links.
-            my $href = $cgi->url( -relative => 1 );
-            $href .= "?q=" . CGI::escape($query_string);
-            $href .= ";offset=" . CGI::escape($offset);
-
-            # Generate the "Prev" link.
-            if ( $current_page > 1 ) {
-                my $new_offset = ( $current_page - 2 ) * $page_size;
-                $href =~ s/(?<=offset=)\d+/$new_offset/;
-                $paging_info .= qq|<a href="$href">&lt;= Prev</a>\n|;
-            }
-
-            # Generate paging links.
-            for my $page_num ( $first_page .. $last_page ) {
-                if ( $page_num == $current_page ) {
-                    $paging_info .= qq|$page_num \n|;
-                }
-                else {
-                    my $new_offset = ( $page_num - 1 ) * $page_size;
-                    $href =~ s/(?<=offset=)\d+/$new_offset/;
-                    $paging_info .= qq|<a href="$href">$page_num</a>\n|;
-                }
-            }
-
-            # Generate the "Next" link.
-            if ( $current_page != $last_page ) {
-                my $new_offset = $current_page * $page_size;
-                $href =~ s/(?<=offset=)\d+/$new_offset/;
-                $paging_info .= qq|<a href="$href">Next =&gt;</a>\n|;
-            }
-
-            # Close tag.
-            $paging_info .= "</p>\n";
-        }
-
-        return $paging_info;
-    }
-
-    # Print content to output.
-    sub blast_out_content {
-        my ( $query_string, $hit_list, $paging_info ) = @_;
-        my $escaped_q = CGI::escapeHTML($query_string);
-        binmode( STDOUT, ":encoding(UTF-8)" );
-        print qq|Content-type: text/html; charset=UTF-8\n\n|;
-        print qq|
-    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
-        "http://www.w3.org/TR/html4/loose.dtd">
-    <html>
-    <head>
-      <meta http-equiv="Content-type" 
-        content="text/html;charset=UTF-8">
-      <link rel="stylesheet" type="text/css" 
-        href="/us_constitution/uscon.css">
-      <title>Lucy: $escaped_q</title>
-    </head>
-    
-    <body>
-    
-      <div id="navigation">
-        <form id="usconSearch" action="">
-          <strong>
-            Search the 
-            <a href="/us_constitution/index.html">US Constitution</a>:
-          </strong>
-          <input type="text" name="q" id="q" value="$escaped_q">
-          <input type="submit" value="=&gt;">
-        </form>
-      </div><!--navigation-->
-    
-      <div id="bodytext">
-    
-      $hit_list
-    
-      $paging_info
-    
-        <p style="font-size: smaller; color: #666">
-          <em>
-            Powered by <a href="http://lucy.apache.org/"
-            >Apache Lucy<small><sup>TM</sup></small></a>
-          </em>
-        </p>
-      </div><!--bodytext-->
-    
-    </body>
-    
-    </html>
-    |;
-    }
-
-=head2 OK... now what?
-
-Lucy::Simple is perfectly adequate for some tasks, but it's not very flexible.
-Many people find that it lacks at least one or two features they can't live
-without.
-
-In our next tutorial chapter,
-L<BeyondSimple|Lucy::Docs::Tutorial::BeyondSimple>, we'll rewrite our
-indexing and search scripts using the classes that Lucy::Simple hides
-from view, opening up the possibilities for expansion; then, we'll spend the
-rest of the tutorial chapters exploring these possibilities.
-
-=cut