Posted to java-commits@lucene.apache.org by gs...@apache.org on 2006/11/27 01:00:49 UTC
svn commit: r479465 [1/4] - in /lucene/java/trunk: docs/ docs/images/
docs/lucene-sandbox/ docs/styles/ src/site/ src/site/src/
src/site/src/documentation/ src/site/src/documentation/classes/
src/site/src/documentation/conf/ src/site/src/documentation/...
Author: gsingers
Date: Sun Nov 26 16:00:46 2006
New Revision: 479465
URL: http://svn.apache.org/viewvc?view=rev&rev=479465
Log:
Updated the website to new Forrest based site, see Issue 707, part one of commits
Added:
lucene/java/trunk/src/site/ (with props)
lucene/java/trunk/src/site/forrest.properties (with props)
lucene/java/trunk/src/site/src/
lucene/java/trunk/src/site/src/documentation/
lucene/java/trunk/src/site/src/documentation/classes/
lucene/java/trunk/src/site/src/documentation/classes/CatalogManager.properties (with props)
lucene/java/trunk/src/site/src/documentation/conf/
lucene/java/trunk/src/site/src/documentation/conf/cli.xconf
lucene/java/trunk/src/site/src/documentation/content/
lucene/java/trunk/src/site/src/documentation/content/.htaccess
lucene/java/trunk/src/site/src/documentation/content/xdocs/
lucene/java/trunk/src/site/src/documentation/content/xdocs/benchmarks.xml
lucene/java/trunk/src/site/src/documentation/content/xdocs/contributions.xml
lucene/java/trunk/src/site/src/documentation/content/xdocs/demo.xml
lucene/java/trunk/src/site/src/documentation/content/xdocs/demo2.xml
lucene/java/trunk/src/site/src/documentation/content/xdocs/demo3.xml
lucene/java/trunk/src/site/src/documentation/content/xdocs/demo4.xml
lucene/java/trunk/src/site/src/documentation/content/xdocs/features.xml
lucene/java/trunk/src/site/src/documentation/content/xdocs/fileformats.xml
lucene/java/trunk/src/site/src/documentation/content/xdocs/gettingstarted.xml
lucene/java/trunk/src/site/src/documentation/content/xdocs/images/
lucene/java/trunk/src/site/src/documentation/content/xdocs/images/asf-logo.gif (with props)
lucene/java/trunk/src/site/src/documentation/content/xdocs/images/favicon.ico (with props)
lucene/java/trunk/src/site/src/documentation/content/xdocs/images/larm_architecture.jpg (with props)
lucene/java/trunk/src/site/src/documentation/content/xdocs/images/larm_crawling-process.jpg (with props)
lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lia_3d.jpg (with props)
lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_green_100.gif (with props)
lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_green_150.gif (with props)
lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_green_200.gif (with props)
lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_green_250.gif (with props)
lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_green_300.gif (with props)
lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_outline_100.gif (with props)
lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_outline_150.gif (with props)
lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_outline_200.gif (with props)
lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_outline_250.gif (with props)
lucene/java/trunk/src/site/src/documentation/content/xdocs/images/lucene_outline_300.gif (with props)
lucene/java/trunk/src/site/src/documentation/content/xdocs/index.xml
lucene/java/trunk/src/site/src/documentation/content/xdocs/lucene-sandbox/
lucene/java/trunk/src/site/src/documentation/content/xdocs/lucene-sandbox/index.xml
lucene/java/trunk/src/site/src/documentation/content/xdocs/mailinglists.xml
lucene/java/trunk/src/site/src/documentation/content/xdocs/queryparsersyntax.xml
lucene/java/trunk/src/site/src/documentation/content/xdocs/releases.xml
lucene/java/trunk/src/site/src/documentation/content/xdocs/resources.xml
lucene/java/trunk/src/site/src/documentation/content/xdocs/scoring.xml
lucene/java/trunk/src/site/src/documentation/content/xdocs/site.xml (with props)
lucene/java/trunk/src/site/src/documentation/content/xdocs/systemproperties.xml
lucene/java/trunk/src/site/src/documentation/content/xdocs/tabs.xml (with props)
lucene/java/trunk/src/site/src/documentation/content/xdocs/whoweare.xml
lucene/java/trunk/src/site/src/documentation/sitemap.xmap (with props)
lucene/java/trunk/src/site/src/documentation/skinconf.xml (with props)
Removed:
lucene/java/trunk/docs/benchmarks.html
lucene/java/trunk/docs/benchmarktemplate.xml
lucene/java/trunk/docs/contributions.html
lucene/java/trunk/docs/demo.html
lucene/java/trunk/docs/demo2.html
lucene/java/trunk/docs/demo3.html
lucene/java/trunk/docs/demo4.html
lucene/java/trunk/docs/features.html
lucene/java/trunk/docs/fileformats.html
lucene/java/trunk/docs/gettingstarted.html
lucene/java/trunk/docs/images/
lucene/java/trunk/docs/index.html
lucene/java/trunk/docs/lucene-sandbox/
lucene/java/trunk/docs/mailinglists.html
lucene/java/trunk/docs/queryparsersyntax.html
lucene/java/trunk/docs/resources.html
lucene/java/trunk/docs/scoring.html
lucene/java/trunk/docs/styles/
lucene/java/trunk/docs/systemproperties.html
lucene/java/trunk/docs/whoweare.html
lucene/java/trunk/xdocs/
Propchange: lucene/java/trunk/src/site/
------------------------------------------------------------------------------
--- svn:ignore (added)
+++ svn:ignore Sun Nov 26 16:00:46 2006
@@ -0,0 +1 @@
+build
Added: lucene/java/trunk/src/site/forrest.properties
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/forrest.properties?view=auto&rev=479465
==============================================================================
--- lucene/java/trunk/src/site/forrest.properties (added)
+++ lucene/java/trunk/src/site/forrest.properties Sun Nov 26 16:00:46 2006
@@ -0,0 +1,130 @@
+# Copyright 2002-2005 The Apache Software Foundation or its licensors,
+# as applicable.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+##############
+# Properties used by forrest.build.xml for building the website
+# These are the defaults, un-comment them only if you need to change them.
+##############
+
+# Prints out a summary of Forrest settings for this project
+#forrest.echo=true
+
+# Project name (used to name .war file)
+#project.name=my-project
+
+# Specifies name of Forrest skin to use
+# See list at http://forrest.apache.org/docs/skins.html
+#project.skin=pelt
+
+# Descriptors for plugins and skins
+# comma separated list, file:// is supported
+#forrest.skins.descriptors=http://forrest.apache.org/skins/skins.xml,file:///c:/myskins/skins.xml
+#forrest.plugins.descriptors=http://forrest.apache.org/plugins/plugins.xml,http://forrest.apache.org/plugins/whiteboard-plugins.xml
+
+##############
+# behavioural properties
+#project.menu-scheme=tab_attributes
+#project.menu-scheme=directories
+
+##############
+# layout properties
+
+# Properties that can be set to override the default locations
+#
+# Parent properties must be set. This usually means uncommenting
+# project.content-dir if any other property using it is uncommented
+
+#project.status=status.xml
+#project.content-dir=src/documentation
+#project.raw-content-dir=${project.content-dir}/content
+#project.conf-dir=${project.content-dir}/conf
+#project.sitemap-dir=${project.content-dir}
+#project.xdocs-dir=${project.content-dir}/content/xdocs
+#project.resources-dir=${project.content-dir}/resources
+#project.stylesheets-dir=${project.resources-dir}/stylesheets
+#project.images-dir=${project.resources-dir}/images
+#project.schema-dir=${project.resources-dir}/schema
+#project.skins-dir=${project.content-dir}/skins
+#project.skinconf=${project.content-dir}/skinconf.xml
+#project.lib-dir=${project.content-dir}/lib
+#project.classes-dir=${project.content-dir}/classes
+#project.translations-dir=${project.content-dir}/translations
+project.configfile=${project.home}/src/documentation/conf/cli.xconf
+
+##############
+# validation properties
+
+# This set of properties determines whether validation is performed
+# Values are inherited unless overridden.
+# e.g. if forrest.validate=false then all others are false unless set to true.
+#forrest.validate=true
+#forrest.validate.xdocs=${forrest.validate}
+#forrest.validate.skinconf=${forrest.validate}
+#forrest.validate.sitemap=${forrest.validate}
+#forrest.validate.stylesheets=${forrest.validate}
+#forrest.validate.skins=${forrest.validate}
+#forrest.validate.skins.stylesheets=${forrest.validate.skins}
+
+# *.failonerror=(true|false) - stop when an XML file is invalid
+#forrest.validate.failonerror=true
+
+# *.excludes=(pattern) - comma-separated list of path patterns to not validate
+# e.g.
+#forrest.validate.xdocs.excludes=samples/subdir/**, samples/faq.xml
+#forrest.validate.xdocs.excludes=
+
+
+##############
+# General Forrest properties
+
+# The URL to start crawling from
+#project.start-uri=linkmap.html
+
+# Set logging level for messages printed to the console
+# (DEBUG, INFO, WARN, ERROR, FATAL_ERROR)
+#project.debuglevel=ERROR
+
+# Max memory to allocate to Java
+#forrest.maxmemory=64m
+
+# Any other arguments to pass to the JVM. For example, to run on an X-less
+# server, set to -Djava.awt.headless=true
+#forrest.jvmargs=
+
+# The bugtracking URL - the issue number will be appended
+#project.bugtracking-url=http://issues.apache.org/bugzilla/show_bug.cgi?id=
+#project.bugtracking-url=http://issues.apache.org/jira/browse/
+
+# The issues list as rss
+#project.issues-rss-url=
+
+# I18n property. Based on the locale requested by the browser.
+# To use it for a static site, modify the JVM system.language
+# and run once per language
+#project.i18n=true
+
+# The names of plugins that are required to build the project
+# comma separated list (no spaces)
+# You can request a specific version by appending "-VERSION" to the end of
+# the plugin name. If you exclude a version number the latest released version
+# will be used, however, be aware that this may be a development version. In
+# a production environment it is recommended that you specify a known working
+# version.
+# Run "forrest available-plugins" for a list of plug-ins currently available
+project.required.plugins=org.apache.forrest.plugin.output.pdf
+
+# Proxy configuration
+# proxy.host=
+# proxy.port=
Propchange: lucene/java/trunk/src/site/forrest.properties
------------------------------------------------------------------------------
svn:executable = *
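
Editor's note: the forrest.properties file above relies on `${...}` references, so that a child property such as project.conf-dir follows its parent project.content-dir, and the forrest.validate.* flags inherit from forrest.validate. The following is a minimal sketch of that kind of Ant-style expansion, not Forrest's actual loader. The function names and the value given to project.home are assumptions made for illustration only; Forrest normally supplies project.home itself.

```python
import re

# Sample lines modelled on the file above. The project.home value is a
# hypothetical placeholder, not something taken from the commit.
SAMPLE = """\
# hypothetical value; normally supplied by the build
project.home=/work/site
project.content-dir=src/documentation
project.conf-dir=${project.content-dir}/conf
project.configfile=${project.home}/src/documentation/conf/cli.xconf
forrest.validate=false
forrest.validate.skins=${forrest.validate}
forrest.validate.skins.stylesheets=${forrest.validate.skins}
"""


def load_properties(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def expand(value, props, depth=0):
    """Recursively replace ${name} with the value of props[name].

    Unknown names are left untouched; a depth limit guards against
    circular references.
    """
    if depth > 10:
        raise ValueError("possible circular property reference")

    def replace(match):
        name = match.group(1)
        if name not in props:
            return match.group(0)
        return expand(props[name], props, depth + 1)

    return re.sub(r"\$\{([^}]+)\}", replace, value)


props = load_properties(SAMPLE)
resolved = {key: expand(value, props) for key, value in props.items()}
```

In this sketch, setting forrest.validate=false propagates through forrest.validate.skins to forrest.validate.skins.stylesheets, which mirrors the inheritance described in the "validation properties" comments.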
Added: lucene/java/trunk/src/site/src/documentation/classes/CatalogManager.properties
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/classes/CatalogManager.properties?view=auto&rev=479465
==============================================================================
--- lucene/java/trunk/src/site/src/documentation/classes/CatalogManager.properties (added)
+++ lucene/java/trunk/src/site/src/documentation/classes/CatalogManager.properties Sun Nov 26 16:00:46 2006
@@ -0,0 +1,57 @@
+# Copyright 2002-2005 The Apache Software Foundation or its licensors,
+# as applicable.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+#=======================================================================
+# CatalogManager.properties for Catalog Entity Resolver.
+#
+# This is the default properties file for your project.
+# This facilitates local configuration of application-specific catalogs.
+# If you have defined any local catalogs, then they will be loaded
+# before Forrest's core catalogs.
+#
+# See the Apache Forrest documentation:
+# http://forrest.apache.org/docs/your-project.html
+# http://forrest.apache.org/docs/validation.html
+
+# verbosity:
+# The level of messages for status/debug (messages go to standard output).
+# The setting here is for your own local catalogs.
+# The verbosity of Forrest's core catalogs is controlled via
+# main/webapp/WEB-INF/cocoon.xconf
+#
+# The following messages are provided ...
+# 0 = none
+# 1 = ? (... not sure yet)
+# 2 = 1+, Loading catalog, Resolved public, Resolved system
+# 3 = 2+, Catalog does not exist, resolvePublic, resolveSystem
+# 10 = 3+, List all catalog entries when loading a catalog
+# (Cocoon also logs the "Resolved public" messages.)
+verbosity=1
+
+# catalogs ... list of additional catalogs to load
+# (Note that Apache Forrest will automatically load its own default catalog
+# from main/webapp/resources/schema/catalog.xcat)
+# Use either full pathnames or relative pathnames.
+# pathname separator is always semi-colon (;) regardless of operating system
+# directory separator is always slash (/) regardless of operating system
+catalogs=../resources/schema/catalog.xcat
+
+# relative-catalogs
+# If false, relative catalog URIs are made absolute with respect to the
+# base URI of the CatalogManager.properties file. This setting only
+# applies to catalog URIs obtained from the catalogs property in the
+# CatalogManager.properties file
+# Example: relative-catalogs=[yes|no]
+relative-catalogs=no
Propchange: lucene/java/trunk/src/site/src/documentation/classes/CatalogManager.properties
------------------------------------------------------------------------------
svn:executable = *
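
Editor's note: the comments in CatalogManager.properties above say that the catalogs list is separated by semicolons, that directories are separated by slashes, and that with relative-catalogs disabled a relative entry is made absolute with respect to the location of the properties file. A rough sketch of that resolution rule follows. It is an illustration under those stated assumptions, not the resolver's real code, and `resolve_catalogs` is a hypothetical helper name.

```python
import posixpath


def resolve_catalogs(properties_path, catalogs, relative_catalogs):
    """Return the catalog paths named by a 'catalogs' property value.

    properties_path   -- path of the CatalogManager.properties file
    catalogs          -- raw property value, entries separated by ';'
    relative_catalogs -- True keeps relative entries as written; False
                         resolves them against the properties file's directory
    """
    base_dir = posixpath.dirname(properties_path)
    resolved = []
    for entry in catalogs.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty entries such as a trailing ';'
        if relative_catalogs or posixpath.isabs(entry):
            resolved.append(entry)
        else:
            resolved.append(posixpath.normpath(posixpath.join(base_dir, entry)))
    return resolved


paths = resolve_catalogs(
    "src/documentation/classes/CatalogManager.properties",
    "../resources/schema/catalog.xcat",
    relative_catalogs=False,
)
```

With the values committed above (catalogs=../resources/schema/catalog.xcat and relative-catalogs=no), the entry would point at src/documentation/resources/schema/catalog.xcat under this reading.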
Added: lucene/java/trunk/src/site/src/documentation/conf/cli.xconf
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/conf/cli.xconf?view=auto&rev=479465
==============================================================================
--- lucene/java/trunk/src/site/src/documentation/conf/cli.xconf (added)
+++ lucene/java/trunk/src/site/src/documentation/conf/cli.xconf Sun Nov 26 16:00:46 2006
@@ -0,0 +1,321 @@
+<?xml version="1.0"?>
+<!--
+ Copyright 2002-2004 The Apache Software Foundation or its licensors,
+ as applicable.
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+<!--+
+ | This is the Apache Cocoon command line configuration file.
+ | Here you give the command line interface details of where
+ | to find various aspects of your Cocoon installation.
+ |
+ | If you wish, you can also use this file to specify the URIs
+ | that you wish to generate.
+ |
+ | The current configuration information in this file is for
+ | building the Cocoon documentation. Therefore, all links here
+ | are relative to the build context dir, which, in the build.xml
+ | file, is set to ${build.context}
+ |
+ | Options:
+ | verbose: increase amount of information presented
+ | to standard output (default: false)
+ | follow-links: whether linked pages should also be
+ | generated (default: true)
+ | precompile-only: precompile sitemaps and XSP pages, but
+ | do not generate any pages (default: false)
+ | confirm-extensions: check the mime type for the generated page
+ | and adjust filename and links extensions
+ | to match the mime type
+ | (e.g. text/html->.html)
+ |
+ | Note: Whilst using an xconf file to configure the Cocoon
+ | Command Line gives access to more features, the use of
+ | command line parameters is more stable, as there are
+ | currently plans to improve the xconf format to allow
+ | greater flexibility. If you require a stable and
+ | consistent method for accessing the CLI, it is recommended
+ | that you use the command line parameters to configure
+ | the CLI. See documentation at:
+ | http://cocoon.apache.org/2.1/userdocs/offline/
+ | http://wiki.apache.org/cocoon/CommandLine
+ |
+ +-->
+
+<cocoon verbose="true"
+ follow-links="true"
+ precompile-only="false"
+ confirm-extensions="false">
+
+ <!--+
+ | The context directory is usually the webapp directory
+ | containing the sitemap.xmap file.
+ |
+ | The config file is the cocoon.xconf file.
+ |
+ | The work directory is used by Cocoon to store temporary
+ | files and cache files.
+ |
+ | The destination directory is where generated pages will
+ | be written (assuming the 'simple' mapper is used, see
+ | below)
+ +-->
+ <context-dir>.</context-dir>
+ <config-file>WEB-INF/cocoon.xconf</config-file>
+ <work-dir>../tmp/cocoon-work</work-dir>
+ <dest-dir>../site</dest-dir>
+
+ <!--+
+ | A checksum file can be used to store checksums for pages
+ | as they are generated. When the site is next generated,
+ | files will not be written if their checksum has not changed.
+ | This means that it will be easier to detect which files
+ | need to be uploaded to a server, using the timestamp.
+ +-->
+ <!-- <checksums-uri>build/work/checksums</checksums-uri>-->
+
+ <!--+
+ | Broken link reporting options:
+ | Report into a text file, one link per line:
+ | <broken-links type="text" report="filename"/>
+ | Report into an XML file:
+ | <broken-links type="xml" report="filename"/>
+ | Ignore broken links (default):
+ | <broken-links type="none"/>
+ |
+ | Two attributes to this node specify whether a page should
+ | be generated when an error has occurred. 'generate' specifies
+ | whether a page should be generated (default: true) and
+ | extension specifies an extension that should be appended
+ | to the generated page's filename (default: none)
+ |
+ | Using this, a quick scan through the destination directory
+ | will show broken links, by their filename extension.
+ +-->
+ <broken-links type="xml"
+ file="../brokenlinks.xml"
+ generate="false"
+ extension=".error"
+ show-referrers="true"/>
+
+ <!--+
+ | Load classes at startup. This is necessary for generating
+ | from sites that use SQL databases and JDBC.
+ | The <load-class> element can be repeated if multiple classes
+ | are needed.
+ +-->
+ <!--
+ <load-class>org.firebirdsql.jdbc.Driver</load-class>
+ -->
+
+ <!--+
+ | Configures logging.
+ | The 'log-kit' parameter specifies the location of the log kit
+ | configuration file (usually called logkit.xconf).
+ |
+ | Logger specifies the logging category (for all logging prior
+ | to other Cocoon logging categories taking over)
+ |
+ | Available log levels are:
+ | DEBUG: prints all levels of log messages.
+ | INFO: prints all levels of log messages except DEBUG
+ | ones.
+ | WARN: prints all levels of log messages except DEBUG
+ | and INFO ones.
+ | ERROR: prints all levels of log messages except DEBUG,
+ | INFO and WARN ones.
+ | FATAL_ERROR: prints only log messages of this level
+ +-->
+ <!-- <logging log-kit="WEB-INF/logkit.xconf" logger="cli" level="ERROR" /> -->
+
+ <!--+
+ | Specifies the filename to be appended to URIs that
+ | refer to a directory (i.e. end with a forward slash).
+ +-->
+ <default-filename>index.html</default-filename>
+
+ <!--+
+ | Specifies a user agent string to the sitemap when
+ | generating the site.
+ |
+ | A generic term for a web browser is "user agent". Any
+ | user agent, when connecting to a web server, will provide
+ | a string to identify itself (e.g. as Internet Explorer or
+ | Mozilla). It is possible to have Cocoon serve different
+ | content depending upon the user agent string provided by
+ | the browser. If your site does this, then you may want to
+ | use this <user-agent> entry to provide a 'fake' user agent
+ | to Cocoon, so that it generates the correct version of your
+ | site.
+ |
+ | For most sites, this can be ignored.
+ +-->
+ <!--
+ <user-agent>Cocoon Command Line Environment 2.1</user-agent>
+ -->
+
+ <!--+
+ | Specifies an accept string to the sitemap when generating
+ | the site.
+ | User agents can specify to an HTTP server what types of content
+ | (by mime-type) they are able to receive. E.g. a browser may be
+ | able to handle jpegs, but not pngs. The HTTP accept header
+ | allows the server to take the browser's capabilities into account,
+ | and only send back content that it can handle.
+ |
+ | For most sites, this can be ignored.
+ +-->
+
+ <accept>*/*</accept>
+
+ <!--+
+ | Specifies which URIs should be included or excluded, according
+ | to wildcard patterns.
+ |
+ | These includes/excludes are only relevant when you are following
+ | links. A link URI must match an include pattern (if one is given)
+ | and not match an exclude pattern, if it is to be followed by
+ | Cocoon. It can be useful, for example, where there are links in
+ | your site to pages that are not generated by Cocoon, such as
+ | references to api-documentation.
+ |
+ | By default, all URIs are included. If both include and exclude
+ | patterns are specified, a URI is first checked against the
+ | include patterns, and then against the exclude patterns.
+ |
+ | Multiple patterns can be given, using multiple include or exclude
+ | nodes.
+ |
+ | The order of the elements is not significant, as only the first
+ | successful match of each category is used.
+ |
+ | Currently, only the complete source URI can be matched (including
+ | any URI prefix). Future plans include destination URI matching
+ | and regexp matching. If you have requirements for these, contact
+ | dev@cocoon.apache.org.
+ +-->
+
+ <exclude pattern="**/"/>
+ <exclude pattern="**apidocs**"/>
+ <exclude pattern="api/**"/>
+ <exclude pattern="**benchmarktemplate.xml"/>
+
+<!--
+ This is a workaround for FOR-284 "link rewriting broken when
+ linking to xml source views which contain site: links".
+ See the explanation there and in declare-broken-site-links.xsl
+-->
+ <exclude pattern="site:**"/>
+ <exclude pattern="ext:**"/>
+ <exclude pattern="**/site:**"/>
+ <exclude pattern="**/ext:**"/>
+
+ <!-- Exclude tokens used in URLs to ASF mirrors (interpreted by a CGI) -->
+ <exclude pattern="[preferred]/**"/>
+ <exclude pattern="[location]"/>
+
+ <!-- <include-links extension=".html"/>-->
+
+ <!--+
+ | <uri> nodes specify the URIs that should be generated, and
+ | where required, what should be done with the generated pages.
+ | They describe the way the URI of the generated file is created
+ | from the source page's URI. There are three ways that a generated
+ | file URI can be created: append, replace and insert.
+ |
+ | The "type" attribute specifies one of (append|replace|insert):
+ |
+ | append:
+ | Append the generated page's URI to the end of the source URI:
+ |
+ | <uri type="append" src-prefix="documents/" src="index.html"
+ | dest="build/dest/"/>
+ |
+ | This means that
+ | (1) the "documents/index.html" page is generated
+ | (2) the file will be written to "build/dest/documents/index.html"
+ |
+ | replace:
+ | Completely ignore the generated page's URI - just
+ | use the destination URI:
+ |
+ | <uri type="replace" src-prefix="documents/" src="index.html"
+ | dest="build/dest/docs.html"/>
+ |
+ | This means that
+ | (1) the "documents/index.html" page is generated
+ | (2) the result is written to "build/dest/docs.html"
+ | (3) this works only for "single" pages - and not when links
+ | are followed
+ |
+ | insert:
+ | Insert generated page's URI into the destination
+ | URI at the point marked with a * (example uses fictional
+ | zip protocol)
+ |
+ | <uri type="insert" src-prefix="documents/" src="index.html"
+ | dest="zip://*.zip/page.html"/>
+ |
+ | This means that
+ | (1)
+ |
+ | In any of these scenarios, if the dest attribute is omitted,
+ | the value provided globally using the <dest-dir> node will
+ | be used instead.
+ +-->
+ <!--
+ <uri type="replace"
+ src-prefix="samples/"
+ src="hello-world/hello.html"
+ dest="build/dest/hello-world.html"/>
+ -->
+
+ <!--+
+ | <uri> nodes can be grouped together in a <uris> node. This
+ | enables a group of URIs to share properties. The following
+ | properties can be set for a group of URIs:
+ | * follow-links: should pages be crawled for links
+ | * confirm-extensions: should file extensions be checked
+ | for the correct mime type
+ | * src-prefix: all source URIs should be
+ | pre-pended with this prefix before
+ | generation. The prefix is not
+ | included when calculating the
+ | destination URI
+ | * dest: the base destination URI to be
+ | shared by all pages in this group
+ | * type: the method to be used to calculate
+ | the destination URI. See above
+ | section on <uri> node for details.
+ |
+ | Each <uris> node can have a name attribute. When a name
+ | attribute has been specified, the -n switch on the command
+ | line can be used to tell Cocoon to only process the URIs
+ | within this URI group. When no -n switch is given, all
+ | <uris> nodes are processed. Thus, one xconf file can be
+ | used to manage multiple sites.
+ +-->
+ <!--
+ <uris name="mirrors" follow-links="false">
+ <uri type="append" src="mirrors.html"/>
+ </uris>
+ -->
+
+ <!--+
+ | File containing URIs (plain text, one per line).
+ +-->
+ <!--
+ <uri-file>uris.txt</uri-file>
+ -->
+</cocoon>
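
Editor's note: the cli.xconf comments above state that a link is followed only if it matches an include pattern (when any are given) and matches no exclude pattern. The sketch below illustrates that rule with the exclude patterns from this commit. The wildcard semantics are an assumption made for the example (`**` spans path separators, `*` stays within one path segment, everything else is literal); Cocoon's own matcher may differ in detail, and `wildcard_to_regex` and `should_follow` are hypothetical names.

```python
import re

# Exclude patterns copied from the <exclude> elements in cli.xconf above.
EXCLUDES = [
    "**/",
    "**apidocs**",
    "api/**",
    "**benchmarktemplate.xml",
    "site:**",
    "ext:**",
    "**/site:**",
    "**/ext:**",
    "[preferred]/**",
    "[location]",
]


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a compiled regular expression.

    Assumed semantics: '**' matches any characters including '/',
    '*' matches any characters except '/', all else is literal.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def should_follow(uri, includes, excludes):
    """Apply the include check first, then the exclude check."""
    if includes and not any(wildcard_to_regex(p).fullmatch(uri) for p in includes):
        return False
    return not any(wildcard_to_regex(p).fullmatch(uri) for p in excludes)
```

Translating the patterns by hand, rather than using a shell-style glob library, keeps tokens such as [preferred] literal; many glob implementations would treat the square brackets as a character class.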
Added: lucene/java/trunk/src/site/src/documentation/content/.htaccess
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/.htaccess?view=auto&rev=479465
==============================================================================
--- lucene/java/trunk/src/site/src/documentation/content/.htaccess (added)
+++ lucene/java/trunk/src/site/src/documentation/content/.htaccess Sun Nov 26 16:00:46 2006
@@ -0,0 +1,3 @@
+#Forrest generates UTF-8 by default, but these httpd servers are
+#ignoring the meta http-equiv charset tags
+AddDefaultCharset off
Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/benchmarks.xml
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/benchmarks.xml?view=auto&rev=479465
==============================================================================
--- lucene/java/trunk/src/site/src/documentation/content/xdocs/benchmarks.xml (added)
+++ lucene/java/trunk/src/site/src/documentation/content/xdocs/benchmarks.xml Sun Nov 26 16:00:46 2006
@@ -0,0 +1,525 @@
+<?xml version="1.0"?>
+<document>
+ <header>
+ <title>Apache Lucene - Resources - Performance Benchmarks</title>
+ </header>
+ <properties>
+ <author email="kelvint@apache.org">Kelvin Tan</author>
+
+ </properties>
+ <body>
+
+ <section id="Performance Benchmarks"><title>Performance Benchmarks</title>
+ <p>
+ The purpose of these user-submitted performance figures is to
+ give current and potential users of Lucene a sense
+ of how well Lucene scales. If the requirements for an upcoming
+ project are similar to an existing benchmark, you
+ will also have something to work with when designing the system
+ architecture for the application.
+ </p>
+ <p>
+ If you've conducted performance tests with Lucene, we'd
+ appreciate it if you could submit these figures for display
+ on this page. Post these figures to the lucene-user mailing list
+ using this
+ <a href="benchmarktemplate.xml">template</a>.
+ </p>
+ </section>
+
+ <section id="Benchmark Variables"><title>Benchmark Variables</title>
+ <p>
+ <ul>
+ <p>
+ <b>Hardware Environment</b><br/>
+ <li><i>Dedicated machine for indexing</i>: Self-explanatory
+ (yes/no)</li>
+ <li><i>CPU</i>: Self-explanatory (Type, Speed and Quantity)</li>
+ <li><i>RAM</i>: Self-explanatory</li>
+ <li><i>Drive configuration</i>: Self-explanatory (IDE, SCSI,
+ RAID-1, RAID-5)</li>
+ </p>
+ <p>
+ <b>Software environment</b><br/>
+ <li><i>Lucene Version</i>: Self-explanatory</li>
+ <li><i>Java Version</i>: Version of Java SDK/JRE that is run
+ </li>
+ <li><i>Java VM</i>: Server/client VM, Sun VM/JRockIt</li>
+ <li><i>OS Version</i>: Self-explanatory</li>
+ <li><i>Location of index</i>: Is the index stored in filesystem
+ or database? Is it on the same server(local) or
+ over the network?</li>
+ </p>
+ <p>
+ <b>Lucene indexing variables</b><br/>
+ <li><i>Number of source documents</i>: Number of documents being
+ indexed</li>
+ <li><i>Total filesize of source documents</i>:
+ Self-explanatory</li>
+ <li><i>Average filesize of source documents</i>:
+ Self-explanatory</li>
+ <li><i>Source documents storage location</i>: Where are the
+ documents being indexed located?
+ Filesystem, DB, http, etc.</li>
+ <li><i>File type of source documents</i>: Types of files being
+ indexed, e.g. HTML files, XML files, PDF files, etc.</li>
+ <li><i>Parser(s) used, if any</i>: Parsers used for parsing the
+ various files for indexing,
+ e.g. XML parser, HTML parser, etc.</li>
+ <li><i>Analyzer(s) used</i>: Type of Lucene analyzer used</li>
+ <li><i>Number of fields per document</i>: Number of Fields each
+ Document contains</li>
+ <li><i>Type of fields</i>: Type of each field</li>
+ <li><i>Index persistence</i>: Where the index is stored, e.g.
+ FSDirectory, SqlDirectory, etc.</li>
+ </p>
+ <p>
+ <b>Figures</b><br/>
+ <li><i>Time taken (in ms/s as an average of at least 3 indexing
+ runs)</i>: Time taken to index all files</li>
+ <li><i>Time taken / 1000 docs indexed</i>: Time taken to index
+ 1000 files</li>
+ <li><i>Memory consumption</i>: Self-explanatory</li>
+ <li><i>Query speed</i>: average time a query takes, type
+ of queries (e.g. simple one-term query, phrase query),
+ not measuring any overhead outside Lucene</li>
+ </p>
+ <p>
+ <b>Notes</b><br/>
+ <li><i>Notes</i>: Any comments which don't belong in the above,
+ special tuning/strategies, etc.</li>
+ </p>
+ </ul>
+ </p>
+ </section>
+
+ <section id="User-submitted Benchmarks"><title>User-submitted Benchmarks</title>
+ <p>
+ These benchmarks have been kindly submitted by Lucene users for
+ reference purposes.
+ </p>
+ <p><b>We make NO guarantees regarding their accuracy or
+ validity.</b>
+ </p>
+ <p>We strongly recommend you conduct your own
+ performance benchmarks before deciding on a particular
+ hardware/software setup (and hopefully submit
+ these figures to us).
+ </p>
+
+ <section id="Hamish Carpenter's benchmarks"><title>Hamish Carpenter's benchmarks</title>
+ <ul>
+ <p>
+ <b>Hardware Environment</b><br/>
+ <li><i>Dedicated machine for indexing</i>: yes</li>
+ <li><i>CPU</i>: Intel x86 P4 1.5GHz</li>
+ <li><i>RAM</i>: 512 DDR</li>
+ <li><i>Drive configuration</i>: IDE 7200rpm RAID-1</li>
+ </p>
+ <p>
+ <b>Software environment</b><br/>
+ <li><i>Lucene Version</i>: 1.3</li>
+ <li><i>Java Version</i>: 1.3.1 IBM JITC Enabled</li>
+ <li><i>Java VM</i>: </li>
+ <li><i>OS Version</i>: Debian Linux 2.4.18-686</li>
+ <li><i>Location of index</i>: local</li>
+ </p>
+ <p>
+ <b>Lucene indexing variables</b><br/>
+ <li><i>Number of source documents</i>: Random generator. Set
+ to make 1M documents
+ in 2x500,000 batches.</li>
+ <li><i>Total filesize of source documents</i>: > 1GB if
+ stored</li>
+ <li><i>Average filesize of source documents</i>: 1KB</li>
+ <li><i>Source documents storage location</i>: Filesystem</li>
+ <li><i>File type of source documents</i>: Generated</li>
+ <li><i>Parser(s) used, if any</i>: </li>
+ <li><i>Analyzer(s) used</i>: Default</li>
+ <li><i>Number of fields per document</i>: 11</li>
+ <li><i>Type of fields</i>: 1 date, 1 id, 9 text</li>
+ <li><i>Index persistence</i>: FSDirectory</li>
+ </p>
+ <p>
+ <b>Figures</b><br/>
+ <li><i>Time taken (in ms/s as an average of at least 3
+ indexing runs)</i>: </li>
+ <li><i>Time taken / 1000 docs indexed</i>: 49 seconds</li>
+ <li><i>Memory consumption</i>:</li>
+ </p>
+ <p>
+ <b>Notes</b><br/>
+ <p>
+ A Windows client ran a random document generator which
+ created
+ documents based on some arrays of values and an excerpt
+ (approx 1KB)
+ from a text file of the Bible (King James version).<br/>
+ These were submitted via a socket connection (open throughout
+ the indexing process).<br/>
+ The index writer was not closed between index calls.<br/>
+ This created a 400MB index in 23 files (after
+ optimization).<br/>
+ </p>
+ <p>
+ <u>Query details</u>:<br/>
+ </p>
+ <p>
+ Set up a threaded class to start x number of simultaneous
+ threads to
+ search the above created index.
+ </p>
+ <p>
+ Query: +Domain:sos +(+((Name:goo*^2.0 Name:plan*^2.0)
+ (Teaser:goo* Teaser:plan*)
+ (Details:goo* Details:plan*)) -Cancel:y)
+ +DisplayStartDate:[mkwsw2jk0-mq3dj1uq0]
+ +EndDate:[mq3dj1uq0-ntlxuggw0]
+ </p>
+ <p>
+ This query counted 34000 documents and I limited the returned
+ documents
+ to 5.
+ </p>
+ <p>
+ This is using Peter Halacsy's IndexSearcherCache slightly
+ modified to
+ be a singleton that returns cached searchers for a given
+ directory. This
+ solved an initial problem with too many files open and
+ running out of
+ Linux handles for them.
+ </p>
+ <pre>
+ Threads|Avg Time per query (ms)
+ 1 1009ms
+ 2 2043ms
+ 3 3087ms
+ 4 4045ms
+ .. .
+ .. .
+ 10 10091ms
+ </pre>
+ <p>
+ I removed the two date range terms from the query and it made
+ a HUGE
+ difference in performance. With 4 threads the avg time
+ dropped to 900ms!
+ </p>
+ <p>Other query optimizations made little difference.</p>
+ </p>
+ </ul>
+ <p>
+ Hamish can be contacted at hamish at catalyst.net.nz.
+ </p>
+ </section>
+
+ <section id="Justin Greene's benchmarks"><title>Justin Greene's benchmarks</title>
+ <ul>
+ <p>
+ <b>Hardware Environment</b><br/>
+ <li><i>Dedicated machine for indexing</i>: No, but nominal
+ usage at time of indexing.</li>
+ <li><i>CPU</i>: Compaq Proliant 1850R/600 2 X pIII 600</li>
+ <li><i>RAM</i>: 1GB, 256MB allocated to JVM.</li>
+ <li><i>Drive configuration</i>: RAID 5 on Fibre Channel
+ Array</li>
+ </p>
+ <p>
+ <b>Software environment</b><br/>
+ <li><i>Java Version</i>: 1.3.1_06</li>
+ <li><i>Java VM</i>: </li>
+ <li><i>OS Version</i>: Winnt 4/Sp6</li>
+ <li><i>Location of index</i>: local</li>
+ </p>
+ <p>
+ <b>Lucene indexing variables</b><br/>
+ <li><i>Number of source documents</i>: about 60K</li>
+ <li><i>Total filesize of source documents</i>: 6.5GB</li>
+ <li><i>Average filesize of source documents</i>: 100K
+ (6.5GB/60K documents)</li>
+ <li><i>Source documents storage location</i>: filesystem on
+ NTFS</li>
+ <li><i>File type of source documents</i>: </li>
+ <li><i>Parser(s) used, if any</i>: Currently the only parser
+ used is the Quiotix html
+ parser.</li>
+ <li><i>Analyzer(s) used</i>: SimpleAnalyzer</li>
+ <li><i>Number of fields per document</i>: 8</li>
+ <li><i>Type of fields</i>: All strings, and all are stored
+ and indexed.</li>
+ <li><i>Index persistence</i>: FSDirectory</li>
+ </p>
+ <p>
+ <b>Figures</b><br/>
+ <li><i>Time taken (in ms/s as an average of at least 3
+ indexing runs)</i>: 1 hour 12 minutes, 1 hour 14 minutes and 1 hour 17
+ minutes. Note that the number
+ and size of documents change daily.</li>
+ <li><i>Time taken / 1000 docs indexed</i>: </li>
+ <li><i>Memory consumption</i>: JVM is given 256MB and uses it
+ all.</li>
+ </p>
+ <p>
+ <b>Notes</b><br/>
+ <p>
+ We have 10 threads reading files from the filesystem,
+ parsing and
+ analyzing them and then pushing them onto a queue, and a single
+ thread popping
+ them from the queue and indexing. Note that we are indexing
+ email messages
+ and are storing the entire plaintext of the message in the
+ index. If the
+ message contains an attachment and we do not have a filter for
+ the attachment
+ (i.e. we do not handle PDFs yet), we discard the data.
+ </p>
+ </p>
+ </ul>
+ <p>
+ Justin can be contacted at tvxh-lw4x at spamex.com.
+ </p>
+ </section>
+
+
+ <section id="Daniel Armbrust's benchmarks"><title>Daniel Armbrust's benchmarks</title>
+ <p>
+ My disclaimer is that this is a very poor "Benchmark". It was not done for raw speed,
+ nor was the total index built in one shot. The index was created on several different
+ machines (all with these specs, or very similar), with each machine indexing batches of 500,000 to
+ 1 million documents per batch. Each of these small indexes was then moved to a
+ much larger drive, where they were all merged together into a big index.
+ This process was done manually, over the course of several months, as the sources became available.
+ </p>
+ <ul>
+ <p>
+ <b>Hardware Environment</b><br/>
+ <li><i>Dedicated machine for indexing</i>: no - The machine had moderate to low load. However, the indexing process was built single
+ threaded, so it only took advantage of 1 of the processors. It usually got 100% of this processor.</li>
+ <li><i>CPU</i>: Sun Ultra 80 4 x 64 bit processors</li>
+ <li><i>RAM</i>: 4 GB Memory</li>
+ <li><i>Drive configuration</i>: Ultra-SCSI Wide 10000 RPM 36GB Drive</li>
+ </p>
+ <p>
+ <b>Software environment</b><br/>
+ <li><i>Lucene Version</i>: 1.2</li>
+ <li><i>Java Version</i>: 1.3.1</li>
+ <li><i>Java VM</i>: </li>
+ <li><i>OS Version</i>: Sun 5.8 (64 bit)</li>
+ <li><i>Location of index</i>: local</li>
+ </p>
+ <p>
+ <b>Lucene indexing variables</b><br/>
+ <li><i>Number of source documents</i>: 13,820,517</li>
+ <li><i>Total filesize of source documents</i>: 87.3 GB</li>
+ <li><i>Average filesize of source documents</i>: 6.3 KB</li>
+ <li><i>Source documents storage location</i>: Filesystem</li>
+ <li><i>File type of source documents</i>: XML</li>
+ <li><i>Parser(s) used, if any</i>: </li>
+ <li><i>Analyzer(s) used</i>: A home grown analyzer that simply removes stopwords.</li>
+ <li><i>Number of fields per document</i>: 1 - 31</li>
+ <li><i>Type of fields</i>: All text, though 2 of them are dates (20001205) that we filter on</li>
+ <li><i>Index persistence</i>: FSDirectory</li>
+ <li><i>Index size</i>: 12.5 GB</li>
+ </p>
+ <p>
+ <b>Figures</b><br/>
+ <li><i>Time taken (in ms/s as an average of at least 3
+ indexing runs)</i>: For 617271 documents, 209698 seconds (or ~2.5 days)</li>
+ <li><i>Time taken / 1000 docs indexed</i>: 340 Seconds</li>
+ <li><i>Memory consumption</i>: (java executed with) java -Xmx1000m -Xss8192k so
+ 1 GB of memory was allotted to the indexer</li>
+ </p>
+ <p>
+ <b>Notes</b><br/>
+ <p>
+ The source documents were XML. The "indexer" opened each document one at a time, ran an
+ XSL transformation on them, and then proceeded to index the stream. The indexer optimized
+ the index every 50,000 documents (on this run) though previously, we optimized every
+ 300,000 documents. The performance didn't change much either way. We did no other
+ tuning (RAM Directories, separate process to pretransform the source material, etc.)
+ to make it index faster. When all of these individual indexes were built, they were
+ merged together into the main index. That process usually took about a day.
+ </p>
+ </p>
+ </ul>
+ <p>
+ Daniel can be contacted at Armbrust.Daniel at mayo.edu.
+ </p>
+ </section>
+ <section id="Geoffrey Peddle's benchmarks"><title>Geoffrey Peddle's benchmarks</title>
+ <p>
+ I'm doing a technical evaluation of search engines
+ for Ariba, an enterprise application software company.
+ I compared Lucene to a commercial C language based
+ search engine which I'll refer to as vendor A.
+ Overall Lucene's performance was similar to vendor A
+ and met our application's requirements. I've
+ summarized our results below.
+ </p>
+ <p>
+ Search scalability:<br/>
+ We ran a set of 16 queries in a single thread for 20
+ iterations. We report below the times for the last 15
+ iterations (i.e. after the system was warmed up). The
+ 4 sets of results below are for indexes of between
+ 50,000 and 600,000 documents. Although the
+ times for Lucene grew faster with document count than
+ those for vendor A, they were comparable.
+ </p>
+<pre>
+Documents   Lucene (s)   Vendor A (s)
+50K         5.2          7.2
+200K        15.3         15.2
+400K        28.2         25.5
+600K        41           33
+</pre>
+ <p>
+ Individual Query times:<br/>
+ Total query times are very similar between the 2
+ systems but there were larger differences when you
+ looked at individual queries.
+ </p>
+ <p>
+ For simple queries with small result sets Vendor A was
+ consistently faster than Lucene. For example a
+ single query might take vendor A 32 thousandths of a
+ second and Lucene 64 thousandths of a second. Both
+ times are however well within acceptable response
+ times for our application.
+ </p>
+ <p>
+ For simple queries with large result sets Vendor A was
+ consistently slower than Lucene. For example a
+ single query might take vendor A 300 thousandths of a
+ second and Lucene 200 thousandths of a second.
+ For more complex queries of the form (term1 or term2
+ or term3) AND (term4 or term5 or term6) AND (term7 or
+ term8) the results were more divergent. For
+ queries with small result sets Vendor A generally had
+ very short response times and sometimes Lucene had
+ significantly larger response times. For example
+ Vendor A might take 16 thousandths of a second and
+ Lucene might take 156. I do not consider it to be
+ the case that Lucene's response time grew unexpectedly
+ but rather that Vendor A appeared to be taking
+ advantage of an optimization which Lucene didn't have.
+ (I believe there have been discussions on the dev
+ mailing list on complex queries of this sort.)
+ </p>
+ <p>
+ Index Size:<br/>
+ For our test data the size of both indexes grew
+ linearly with the number of documents. Note that
+ these sizes are compact sizes, not maximum size during
+ index loading. The numbers below are from running du
+ -k in the directory containing the index data. The
+ larger numbers below for Vendor A may be because it
+ supports additional functionality not available in
+ Lucene. I think it's the constant rate of growth
+ rather than the absolute amount which is more
+ important.
+ </p>
+<pre>
+Documents   Lucene (KB)   Vendor A (KB)
+50K         45516         63921
+200K        171565        228370
+400K        345717        457843
+600K        511338        684913
+</pre>
+ <p>
+ Indexing Times:<br/>
+ These times are for reading the documents from our
+ database, processing them, inserting them into the
+ document search product and index compacting. Our
+ data has a large number of fields/attributes. For
+ this test I restricted Lucene to 24 attributes to
+ reduce the number of files created. Doing this I was
+ able to specify a merge width for Lucene of 60. I
+ found that Lucene indexing performance is, in general,
+ very sensitive to changes in the merge width.
+ Note also that our application does a full compaction
+ after inserting every 20,000 documents. These times
+ are just within our acceptable limits but we are
+ interested in alternatives to increase Lucene's
+ performance in this area.
+ </p>
+<p>
+<pre>
+600K documents
+Lucene 81 minutes
+A 34 minutes
+</pre>
+</p>
+ <p>
+ (I don't have accurate results for all sizes on this
+ measure but believe that the indexing time for both
+ solutions grew essentially linearly with size. The
+ time to compact the index generally grew with index
+ size but it's a small percent of overall time at these
+ sizes.)
+ </p>
+ <ul>
+ <p>
+ <b>Hardware Environment</b><br/>
+ <li><i>Dedicated machine for indexing</i>: yes</li>
+ <li><i>CPU</i>: Dell Pentium 4 CPU 2.00GHz, 1 CPU</li>
+ <li><i>RAM</i>: 1 GB Memory</li>
+ <li><i>Drive configuration</i>: Fujitsu MAM3367MP SCSI </li>
+ </p>
+ <p>
+ <b>Software environment</b><br/>
+ <li><i>Java Version</i>: 1.4.2_02</li>
+ <li><i>Java VM</i>: JDK</li>
+ <li><i>OS Version</i>: Windows XP </li>
+ <li><i>Location of index</i>: local</li>
+ </p>
+ <p>
+ <b>Lucene indexing variables</b><br/>
+ <li><i>Number of source documents</i>: 600,000</li>
+ <li><i>Total filesize of source documents</i>: from database</li>
+ <li><i>Average filesize of source documents</i>: from database</li>
+ <li><i>Source documents storage location</i>: from database</li>
+ <li><i>File type of source documents</i>: XML</li>
+ <li><i>Parser(s) used, if any</i>: </li>
+ <li><i>Analyzer(s) used</i>: small variation on WhitespaceAnalyzer</li>
+ <li><i>Number of fields per document</i>: 24</li>
+ <li><i>Type of fields</i>: 1 keyword, 1 big unindexed, rest are unstored and a mix of tokenized/untokenized</li>
+ <li><i>Index persistence</i>: FSDirectory</li>
+ <li><i>Index size</i>: 511338 KB (du -k)</li>
+ </p>
+ <p>
+ <b>Figures</b><br/>
+ <li><i>Time taken (in ms/s as an average of at least 3
+ indexing runs)</i>: 600,000 documents in 81 minutes (du -k = 511338)</li>
+ <li><i>Time taken / 1000 docs indexed</i>: 8.1 seconds (123 documents/second)</li>
+ <li><i>Memory consumption</i>: -ms256m -mx512m -Xss4m -XX:MaxPermSize=512M</li>
+ </p>
+ <p>
+ <b>Notes</b><br/>
+ <p>
+ <li>merge width of 60</li>
+ <li>did a compact every 20,000 documents</li>
+ </p>
+ </p>
+ </ul>
+ </section>
+ </section>
+
+ </body>
+</document>
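The "Time taken / 1000 docs indexed" figures in the benchmark reports above follow directly from the total document count and the elapsed time. The following is a minimal Java sketch of that arithmetic, using only numbers stated in Daniel Armbrust's and Geoffrey Peddle's reports. The class and method names are hypothetical and are not part of Lucene.

```java
import java.util.Locale;

/** Hypothetical helper (not part of Lucene): converts indexing totals into rates. */
final class IndexingThroughput {

    /** Seconds needed to index 1000 documents at the observed average rate. */
    static double secondsPerThousandDocs(long documents, double totalSeconds) {
        if (documents <= 0) {
            throw new IllegalArgumentException("documents must be positive");
        }
        return totalSeconds / documents * 1000.0;
    }

    /** Documents indexed per second at the observed average rate. */
    static double docsPerSecond(long documents, double totalSeconds) {
        if (totalSeconds <= 0) {
            throw new IllegalArgumentException("totalSeconds must be positive");
        }
        return documents / totalSeconds;
    }

    public static void main(String[] args) {
        // Daniel Armbrust's run: 617271 documents in 209698 seconds.
        System.out.printf(Locale.ROOT, "Armbrust: %.1f s per 1000 docs%n",
                secondsPerThousandDocs(617_271, 209_698));

        // Geoffrey Peddle's run: 600,000 documents in 81 minutes.
        double peddleSeconds = 81 * 60;
        System.out.printf(Locale.ROOT, "Peddle: %.1f s per 1000 docs, %.1f docs/s%n",
                secondsPerThousandDocs(600_000, peddleSeconds),
                docsPerSecond(600_000, peddleSeconds));
    }
}
```

Armbrust's run works out to roughly 340 seconds per 1000 documents, which matches his reported figure. Peddle's run works out to about 123 documents per second, or about 8.1 seconds per 1000 documents. Reporting both forms makes submissions easier to compare, since the template asks for time per 1000 documents.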
Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/contributions.xml
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/contributions.xml?view=auto&rev=479465
==============================================================================
--- lucene/java/trunk/src/site/src/documentation/content/xdocs/contributions.xml (added)
+++ lucene/java/trunk/src/site/src/documentation/content/xdocs/contributions.xml Sun Nov 26 16:00:46 2006
@@ -0,0 +1,327 @@
+<?xml version="1.0"?>
+<document>
+ <header>
+ <title>
+ Apache Lucene - Contributions
+ </title>
+ </header>
+ <properties>
+ <author email="carlson@apache.org">
+ Peter Carlson
+ </author>
+ </properties>
+ <body>
+ <section id="Overview">
+ <title>Overview</title>
+ <p>This page lists external Lucene resources. If you have
+ written something that should be included, please post all
+ relevant information to one of the mailing lists. Nothing
+ listed here is directly supported by the Lucene
+ developers, so if you encounter any problems with any of
+ this software, please use the author's contact information
+ to get help.</p>
+ <p>If you are looking for information on contributing patches or other improvements to Lucene, see
+ <a href="http://wiki.apache.org/jakarta-lucene/HowToContribute">How To Contribute</a> on the Lucene Wiki.</p>
+ </section>
+
+ <section id="Lucene Tools">
+ <title>Lucene Tools</title>
+ <p>
+ Software that works with Lucene indices.
+ </p>
+ <section id="Luke"><title>Luke</title>
+ <table>
+ <tr>
+ <th width="%1">
+ URL
+ </th>
+ <td>
+ <a href="http://www.getopt.org/luke/">
+ http://www.getopt.org/luke/
+ </a>
+ </td>
+ </tr>
+ <tr>
+ <th width="%1">
+ author
+ </th>
+ <td>
+ Andrzej Bialecki
+ </td>
+ </tr>
+ </table>
+ </section>
+ <section id="LIMO (Lucene Index Monitor)">
+ <title>LIMO (Lucene Index Monitor)</title>
+ <table>
+ <tr>
+ <th width="%1">
+ URL
+ </th>
+ <td>
+ <a href="http://limo.sf.net/">
+ http://limo.sf.net/
+ </a>
+ </td>
+ </tr>
+ <tr>
+ <th width="%1">
+ author
+ </th>
+ <td>
+ Julien Nioche
+ </td>
+ </tr>
+ </table>
+ </section>
+ </section>
+
+ <section id="Lucene Document Converters">
+ <title>Lucene Document Converters</title>
+ <p>
+ Lucene requires the information you want to index to be
+ converted into Document objects. The contributions
+ listed here convert various content types into
+ Lucene Documents.
+ </p>
+ <section id="XML Document #1">
+ <title>XML Document #1</title>
+ <table>
+ <tr>
+ <th width="%1">
+ URL
+ </th>
+ <td>
+ <a href="http://marc.theaimsgroup.com/?l=lucene-dev&amp;m=100723333506246&amp;w=2">
+ http://marc.theaimsgroup.com/?l=lucene-dev&amp;m=100723333506246&amp;w=2
+ </a>
+ </td>
+ </tr>
+ <tr>
+ <th width="%1">
+ author
+ </th>
+ <td>
+ Philip Ogren - ogren@mayo.edu
+ </td>
+ </tr>
+ </table>
+ </section>
+ <section id="XML Document #2">
+ <title>XML Document #2</title>
+ <table>
+ <tr>
+ <th width="%1">
+ URL
+ </th>
+ <td>
+ <a href="http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg00346.html">
+ http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg00346.html
+ </a>
+ </td>
+ </tr>
+ <tr>
+ <th width="%1">
+ author
+ </th>
+ <td>
+ Peter Carlson - carlson@bookandhammer.com
+ </td>
+ </tr>
+ </table>
+ </section>
+ <section id="PDF Box">
+ <title>PDF Box</title>
+ <table>
+ <tr>
+ <th width="%1">
+ URL
+ </th>
+ <td>
+ <a href="http://www.pdfbox.org/">
+ http://www.pdfbox.org/
+ </a>
+ </td>
+ </tr>
+ <tr>
+ <th width="%1">
+ author
+ </th>
+ <td>
+ Ben Litchfield - ben@csh.rit.edu
+ </td>
+ </tr>
+ </table>
+ </section>
+ <section id="XPDF - PDF Document Conversion">
+ <title>XPDF - PDF Document Conversion</title>
+ <table>
+ <tr>
+ <th width="%1">
+ URL
+ </th>
+ <td>
+ <a href="http://www.foolabs.com/xpdf">
+ http://www.foolabs.com/xpdf
+ </a>
+ </td>
+ </tr>
+ <tr>
+ <th width="%1">
+ author
+ </th>
+ <td>
+ N/A
+ </td>
+ </tr>
+ </table>
+ </section>
+ <section id="PDFTextStream -- PDF text and metadata extraction">
+ <title>PDFTextStream -- PDF text and metadata extraction</title>
+ <table>
+ <tr>
+ <th width="%1">
+ URL
+ </th>
+ <td>
+ <a href="http://snowtide.com">
+ http://snowtide.com
+ </a>
+ </td>
+ </tr>
+ <tr>
+ <th width="%1">
+ author
+ </th>
+ <td>
+ N/A
+ </td>
+ </tr>
+ </table>
+ </section>
+ <section id="PJ Classic &amp; PJ Professional - PDF Document Conversion">
+ <title>PJ Classic &amp; PJ Professional - PDF Document Conversion</title>
+ <table>
+ <tr>
+ <th width="%1">
+ URL
+ </th>
+ <td>
+ <a href="http://www.etymon.com/">
+ http://www.etymon.com/
+ </a>
+ </td>
+ </tr>
+ <tr>
+ <th width="%1">
+ author
+ </th>
+ <td>
+ N/A
+ </td>
+ </tr>
+ </table>
+ </section>
+ </section>
+
+ <section id="Miscellaneous">
+ <title>Miscellaneous</title>
+ <p>
+ </p>
+ <section id="Arabic Analyzer for Java">
+ <title>Arabic Analyzer for Java</title>
+ <table>
+ <tr>
+ <th width="%1">
+ URL
+ </th>
+ <td>
+ <a href="http://savannah.nongnu.org/projects/aramorph">
+ http://savannah.nongnu.org/projects/aramorph
+ </a>
+ </td>
+ </tr>
+ <tr>
+ <th width="%1">
+ author
+ </th>
+ <td>
+ Pierrick Brihaye
+ </td>
+ </tr>
+ </table>
+ </section>
+ <section id="Phonetix">
+ <title>Phonetix</title>
+ <table>
+ <tr>
+ <th width="%1">
+ URL
+ </th>
+ <td>
+ <a href="http://www.companywebstore.de/tangentum/mirror/en/products/phonetix/index.html">
+ http://www.companywebstore.de/tangentum/mirror/en/products/phonetix/index.html
+ </a>
+ </td>
+ </tr>
+ <tr>
+ <th width="%1">
+ author
+ </th>
+ <td>
+ tangentum technologies
+ </td>
+ </tr>
+ </table>
+ </section>
+ <section id="ejIndex - JBoss MBean for Lucene">
+ <title>ejIndex - JBoss MBean for Lucene</title>
+ <p>
+ </p>
+ <table>
+ <tr>
+ <th width="%1">
+ URL
+ </th>
+ <td>
+ <a href="http://ejindex.sourceforge.net/">
+ http://ejindex.sourceforge.net/
+ </a>
+ </td>
+ </tr>
+ <tr>
+ <th width="%1">
+ author
+ </th>
+ <td>
+ Andy Scholz
+ </td>
+ </tr>
+ </table>
+ </section>
+ <section id="JavaCC">
+ <title>JavaCC</title>
+ <table>
+ <tr>
+ <th width="%1">
+ URL
+ </th>
+ <td>
+ <a href="https://javacc.dev.java.net/">
+ https://javacc.dev.java.net/
+ </a>
+ </td>
+ </tr>
+ <tr>
+ <th width="%1">
+ author
+ </th>
+ <td>
+ Sun Microsystems (java.net)
+ </td>
+ </tr>
+ </table>
+ </section>
+ </section>
+ </body>
+</document>
Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/demo.xml
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/demo.xml?view=auto&rev=479465
==============================================================================
--- lucene/java/trunk/src/site/src/documentation/content/xdocs/demo.xml (added)
+++ lucene/java/trunk/src/site/src/documentation/content/xdocs/demo.xml Sun Nov 26 16:00:46 2006
@@ -0,0 +1,78 @@
+<?xml version="1.0"?>
+<document>
+ <header>
+ <title>
+ Apache Lucene - Building and Installing the Basic Demo
+ </title>
+ </header>
+<properties>
+<author email="acoliver@apache.org">Andrew C. Oliver</author>
+</properties>
+<body>
+
+<section id="About this Document"><title>About this Document</title>
+<p>
+This document is intended as a "getting started" guide to using and running the Lucene demos.
+It walks you through some basic installation and configuration.
+</p>
+</section>
+
+
+<section id="About the Demos"><title>About the Demos</title>
+<p>
+The Lucene command-line demo code consists of two applications that demonstrate various
+functionalities of Lucene and how one should go about adding Lucene to their applications.
+</p>
+</section>
+
+<section id="Setting your CLASSPATH"><title>Setting your CLASSPATH</title>
+<p>
+First, you should <a href="http://www.apache.org/dyn/closer.cgi/lucene/java/">download</a> the
+latest Lucene distribution and then extract it to a working directory. Alternatively, you can <a
+href="http://wiki.apache.org/jakarta-lucene/SourceRepository">check out the sources from
+Subversion</a>, and then run <code>ant war-demo</code> to generate the JARs and WARs.
+</p>
+<p>
+You should see the Lucene JAR file in the directory you created when you extracted the archive. It
+should be named something like <code>lucene-core-{version}.jar</code>. You should also see a file
+called <code>lucene-demos-{version}.jar</code>. If you checked out the sources from Subversion then
+the JARs are located under the <code>build</code> subdirectory (after running <code>ant</code>
+successfully). Put both of these files in your Java CLASSPATH.
+</p>
+</section>
+
+<section id="Indexing Files"><title>Indexing Files</title>
+<p>
+Once you've gotten this far you're probably itching to go. Let's <b>build an index!</b> Assuming
+you've set your CLASSPATH correctly, just type:
+
+<pre>
+ java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src
+</pre>
+
+This will produce a subdirectory called <code>index</code> which will contain an index of all of the
+Lucene source code.
+</p>
+<p>
+To <b>search the index</b> type:
+
+<pre>
+ java org.apache.lucene.demo.SearchFiles
+</pre>
+
+You'll be prompted for a query. Type in a swear word and press the enter key. You'll see that the
+Lucene developers are very well mannered: you get no results. Now try entering the word "vector".
+That should return a whole bunch of documents. The results will page at every tenth result and ask
+you whether you want more results.
+</p>
+</section>
+
+<section id="About the code..."><title>About the code...</title>
+<p>
+<a href="demo2.html">read on>>></a>
+</p>
+</section>
+
+</body>
+</document>
+
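The demo commands in demo.xml only work if both JARs are on the CLASSPATH. A quick way to confirm that before running them is to ask the JVM whether it can locate the demo classes. The sketch below is a hypothetical helper, not something shipped with Lucene. It uses only the standard `Class.forName` call, and the class names it probes are the ones named in the demo documentation.

```java
/** Hypothetical helper (not shipped with Lucene): reports whether classes can be found. */
final class ClasspathCheck {

    /** Returns true if the named class can be located without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the demo documentation; the first two live in the
        // demos JAR and the third in the core JAR.
        String[] names = {
            "org.apache.lucene.demo.IndexFiles",
            "org.apache.lucene.demo.SearchFiles",
            "org.apache.lucene.index.IndexWriter"
        };
        for (String name : names) {
            System.out.println(name + ": " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

If any line prints MISSING, the corresponding JAR is probably not on the CLASSPATH, and `java org.apache.lucene.demo.IndexFiles` would be expected to fail with a class-loading error.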
Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/demo2.xml
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/demo2.xml?view=auto&rev=479465
==============================================================================
--- lucene/java/trunk/src/site/src/documentation/content/xdocs/demo2.xml (added)
+++ lucene/java/trunk/src/site/src/documentation/content/xdocs/demo2.xml Sun Nov 26 16:00:46 2006
@@ -0,0 +1,139 @@
+<?xml version="1.0"?>
+<document>
+ <header>
+ <title>
+ Apache Lucene - Basic Demo Sources Walk-through
+ </title>
+ </header>
+<properties>
+<author email="acoliver@apache.org">Andrew C. Oliver</author>
+</properties>
+<body>
+
+<section id="About the Code"><title>About the Code</title>
+<p>
+In this section we walk through the sources behind the command-line Lucene demo: where to find them,
+their parts and their function. This section is intended for Java developers wishing to understand
+how to use Lucene in their applications.
+</p>
+</section>
+
+
+<section id="Location of the source"><title>Location of the source</title>
+
+<p>
+Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you
+should see a directory called <code>src</code> which in turn contains a directory called
+<code>demo</code>. This is the root for all of the Lucene demos. Under this directory is
+<code>org/apache/lucene/demo</code>. This is where all the Java sources for the demos live.
+</p>
+
+<p>
+Within this directory you should see the <code>IndexFiles.java</code> class we executed earlier.
+Bring it up in <code>vi</code> or your editor of choice and let's take a look at it.
+</p>
+
+</section>
+
+<section id="IndexFiles"><title>IndexFiles</title>
+
+<p>
+As we discussed in the previous walk-through, the <code><a
+href="api/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class creates a Lucene
+Index. Let's take a look at how it does this.
+</p>
+
+<p>
+The first substantial thing the <code>main</code> method does is instantiate <code><a
+href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code>. It passes the string
+"<code>index</code>" and a new instance of a class called <code><a
+href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>.
+The "<code>index</code>" string is the name of the filesystem directory where all index information
+should be stored. Because we're not passing a full path, this will be created as a subdirectory of
+the current working directory (if it does not already exist). On some platforms, it may be created
+in other directories (such as the user's home directory).
+</p>
+
+<p>
+The <code><a href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code> is the main
+class responsible for creating indices. To use it you must instantiate it with a path that it can
+write the index into. If this path does not exist it will first create it. Otherwise it will
+refresh the index at that path. You can also create an index using one of the subclasses of <code><a
+href="api/org/apache/lucene/store/Directory.html">Directory</a></code>. In any case, you must also pass an
+instance of <code><a
+href="api/org/apache/lucene/analysis/Analyzer.html">org.apache.lucene.analysis.Analyzer</a></code>.
+</p>
+
+<p>
+The particular <code><a href="api/org/apache/lucene/analysis/Analyzer.html">Analyzer</a></code> we
+are using, <code><a
+href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>, is
+little more than a standard Java Tokenizer, converting all strings to lowercase and filtering out
+useless words and characters from the index. By useless words and characters I mean common language
+words such as articles (a, an, the, etc.) and other strings that would be useless for searching
+(e.g. <b>'s</b>). It should be noted that there are different rules for every language, and you
+should use the proper analyzer for each. Lucene currently provides Analyzers for a number of
+different languages (see the <code>*Analyzer.java</code> sources under <a
+href="http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/">contrib/analyzers/src/java/org/apache/lucene/analysis</a>).
+</p>
+
+<p>
+Looking further down in the file, you should see the <code>indexDocs()</code> code. This recursive
+function simply crawls the directories and uses <code><a
+href="api/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code> to create <code><a
+href="api/org/apache/lucene/document/Document.html">Document</a></code> objects. The <code><a
+href="api/org/apache/lucene/document/Document.html">Document</a></code> is simply a data object to
+represent the content in the file as well as its creation time and location. These instances are
+added to the <code>indexWriter</code>. Take a look inside <code><a
+href="api/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code>. It's not particularly
+complicated. It just adds fields to the <code><a
+href="api/org/apache/lucene/document/Document.html">Document</a></code>.
+</p>
+
+<p>
+As you can see there isn't much to creating an index. The devil is in the details. You may also
+wish to examine the other samples in this directory, particularly the <code><a
+href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> class. It is a bit more
+complex but builds upon this example.
+</p>
+
+</section>
+
+<section id="Searching Files"><title>Searching Files</title>
+
+<p>
+The <code><a href="api/org/apache/lucene/demo/SearchFiles.html">SearchFiles</a></code> class is
+quite simple. It primarily collaborates with an <code><a
+href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code>, <code><a
+href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>
+(which is used in the <code><a
+href="api/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class as well) and a
+<code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code>. The
+query parser is constructed with an analyzer used to interpret your query text in the same way the
+documents are interpreted: finding the end of words and removing useless words like 'a', 'an' and
+'the'. The <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object holds
+the parsed query produced by the <code><a
+href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> and is passed to
+the searcher. Note that it's also possible to programmatically construct a rich <code><a
+href="api/org/apache/lucene/search/Query.html">Query</a></code> object without using the query
+parser. The query parser just enables decoding the <a href="queryparsersyntax.html">Lucene query
+syntax</a> into the corresponding <code><a
+href="api/org/apache/lucene/search/Query.html">Query</a></code> object. The searcher results are
+returned in a collection of Documents called <code><a
+href="api/org/apache/lucene/search/Hits.html">Hits</a></code> which is then iterated through and
+displayed to the user.
+</p>
+
+</section>
+
+<section id="The Web example..."><title>The Web example...</title>
+
+<p>
+<a href="demo3.html">read on>>></a>
+</p>
+
+</section>
+
+</body>
+</document>
+
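The walk-through in demo2.xml makes two points that a small example can illustrate: an analyzer lowercases text and drops stop words, and the same analysis has to be applied to the query as to the documents. The sketch below is a toy in plain Java that mirrors only that description. It uses no Lucene classes, its stop list is an assumption made for the example, and it is not how `StandardAnalyzer`, `IndexWriter` or `IndexSearcher` are implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy illustration only; this is not Lucene code and uses no Lucene classes. */
final class ToyIndex {

    // Hypothetical stop list for this sketch; Lucene's real list differs.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "s"));

    // term -> names of the documents that contain it
    private final Map<String, Set<String>> postings = new TreeMap<>();

    /** Lowercases the text, splits on non-alphanumeric characters, drops stop words. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    /** Index time: analyze the text and record the document under each term. */
    void add(String docName, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docName);
        }
    }

    /** Query time: run the same analysis, then return documents matching every term. */
    Set<String> search(String query) {
        Set<String> result = null;
        for (String term : analyze(query)) {
            Set<String> docs = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("a.txt", "The Vector's capacity");
        index.add("b.txt", "An index of vectors");
        System.out.println(analyze("The Vector's capacity")); // prints [vector, capacity]
        System.out.println(index.search("the VECTOR"));       // prints [a.txt]
    }
}
```

Because the query goes through the same `analyze` step as the documents, "the VECTOR" is reduced to the single term `vector` and matches `a.txt`. The document `b.txt` contains only `vectors`, and this toy does no stemming, so it is not returned.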
Added: lucene/java/trunk/src/site/src/documentation/content/xdocs/demo3.xml
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/site/src/documentation/content/xdocs/demo3.xml?view=auto&rev=479465
==============================================================================
--- lucene/java/trunk/src/site/src/documentation/content/xdocs/demo3.xml (added)
+++ lucene/java/trunk/src/site/src/documentation/content/xdocs/demo3.xml Sun Nov 26 16:00:46 2006
@@ -0,0 +1,90 @@
+<?xml version="1.0"?>
+
+<document>
+ <header>
+ <title>
+ Apache Lucene - Building and Installing the Basic Demo
+ </title>
+ </header>
+<properties>
+<author email="acoliver@apache.org">Andrew C. Oliver</author>
+</properties>
+<body>
+
+<section id="About this Document"><title>About this Document</title>
+<p>
+This document is intended as a "getting started" guide to installing and running the Lucene
+web application demo. This guide assumes that you have read the information in the previous two
+examples. We'll use Tomcat as our reference web container. These demos should work with nearly any
+container, but you may have to adapt them appropriately.
+</p>
+</section>
+
+
+<section id="About the Demos"><title>About the Demos</title>
+<p>
+The Lucene Web Application demo is a template web application intended for deployment on Tomcat or a
+similar web container. It's NOT designed as a "best practices" implementation by ANY means. It's
+more of a "hello world" type Lucene Web App. The purpose of this application is to demonstrate
+Lucene. With that being said, it should be relatively simple to create a small searchable website
+in Tomcat or a similar application server.
+</p>
+</section>
+
+<section id="Indexing Files"><title>Indexing Files</title>
+<p> Once you've gotten this far you're probably itching to go. Let's start by creating the index
+you'll need for the web examples. Since you've already set your CLASSPATH in the previous examples,
+all you need to do is type:
+
+<pre>
+ java org.apache.lucene.demo.IndexHTML -create -index {index-dir} ..
+</pre>
+
+You'll need to run this from any subdirectory of your <code>{tomcat}/webapps</code> directory
+(make sure you don't leave off the <code>..</code> or you'll get a null pointer exception).
+<code>{index-dir}</code> should be a directory that Tomcat has permission to read and write, but
+that is outside of any web-accessible context. By default the webapp is configured to look in
+<code>/opt/lucene/index</code> for this index.
+</p>
+</section>
+
+<section id="Deploying the Demos"><title>Deploying the Demos</title>
+<p>In your distribution directory you should see a war file called
+<code>luceneweb.war</code>. If you're working with a Subversion checkout, this will be under the
+<code>build</code> subdirectory. Copy this file to your <code>{tomcat}/webapps</code> directory.
+You may need to restart Tomcat. </p> </section>
+
+<section id="Configuration"><title>Configuration</title>
+<p> From your Tomcat directory, look in the <code>webapps/luceneweb</code> subdirectory. If it's not
+present, try browsing to <code>http://localhost:8080/luceneweb</code> (which causes Tomcat to deploy
+the webapp), then look again. Edit the file called <code>configuration.jsp</code>. Ensure that
+<code>indexLocation</code> is set to the location you used for your index. You may also customize
+the <code>appTitle</code> and <code>appFooter</code> strings as you see fit. Once you have finished
+altering the configuration you may need to restart Tomcat. You may also wish to update the war file
+by typing <code>jar -uf luceneweb.war configuration.jsp</code> from the <code>luceneweb</code>
+subdirectory. (The -u option is not available in all versions of jar. If yours lacks it, recreate
+the war file instead.)
+</p>
+</section>
+
+<section id="Running the Demos"><title>Running the Demos</title>
+<p>Now you're ready to roll. In your browser, go to
+<code>http://localhost:8080/luceneweb</code>, enter <code>test</code> and the number of items per
+page, and press search.</p>
+<p>You should now be looking either at a number of results (provided you didn't erase the Tomcat
+examples) or nothing. If you get an error regarding opening the index, then you probably set the
+path in <code>configuration.jsp</code> incorrectly, or Tomcat doesn't have permission to read the
+index (or you skipped the step of creating it). Try other search terms. Depending on the number of
+items per page you set and the number of results returned, there may be a link at the bottom that
+says <b>More Results>></b>; clicking it takes you to subsequent pages. </p> </section>
+
+<section id="About the code..."><title>About the code...</title>
+<p>
+If you want to know more about how this web app works or how to customize it, then <a
+href="demo4.html">read on>>></a>.
+</p>
+</section>
+
+</body>
+</document>
+