Posted to commits@jackrabbit.apache.org by ju...@apache.org on 2006/12/03 19:26:41 UTC
svn commit: r481851 - in /jackrabbit/trunk/jackrabbit-index-filters:
README.txt pom.xml src/java/ src/main/ src/main/java/ src/test/java/
src/test/java/org/ src/test/org/
Author: jukka
Date: Sun Dec 3 10:26:38 2006
New Revision: 481851
URL: http://svn.apache.org/viewvc?view=rev&rev=481851
Log:
JCR-612, JCR-332: Upgraded jackrabbit-index filters to Maven 2.
Added:
jackrabbit/trunk/jackrabbit-index-filters/pom.xml
- copied, changed from r481769, jackrabbit/trunk/jackrabbit-api/pom.xml
jackrabbit/trunk/jackrabbit-index-filters/src/main/
jackrabbit/trunk/jackrabbit-index-filters/src/main/java/
- copied from r481759, jackrabbit/trunk/jackrabbit-index-filters/src/java/
jackrabbit/trunk/jackrabbit-index-filters/src/test/java/
jackrabbit/trunk/jackrabbit-index-filters/src/test/java/org/
- copied from r481759, jackrabbit/trunk/jackrabbit-index-filters/src/test/org/
Removed:
jackrabbit/trunk/jackrabbit-index-filters/src/java/
jackrabbit/trunk/jackrabbit-index-filters/src/test/org/
Modified:
jackrabbit/trunk/jackrabbit-index-filters/README.txt
Modified: jackrabbit/trunk/jackrabbit-index-filters/README.txt
URL: http://svn.apache.org/viewvc/jackrabbit/trunk/jackrabbit-index-filters/README.txt?view=diff&rev=481851&r1=481850&r2=481851
==============================================================================
--- jackrabbit/trunk/jackrabbit-index-filters/README.txt (original)
+++ jackrabbit/trunk/jackrabbit-index-filters/README.txt Sun Dec 3 10:26:38 2006
@@ -1,31 +1,72 @@
-TextFilters allow Jackrabbit to extract text from binary
-properties for indexing purposes.
+===================================
+Welcome to Jackrabbit Index Filters
+===================================
+
+This is the Index Filters component of the Apache Jackrabbit project.
+This component contains filter classes that allow Jackrabbit to
+extract text content from binary properties for full text indexing.
+The following file formats and MIME types are currently supported:
+
+ * Microsoft Word
+ [org.apache.jackrabbit.core.query.MsWordTextFilter]
+ * application/vnd.ms-word
+ * application/msword
+
+ * Microsoft Excel
+ [org.apache.jackrabbit.core.query.MsExcelTextFilter]
+ * application/vnd.ms-excel
+
+ * Microsoft PowerPoint
+ [org.apache.jackrabbit.core.query.MsPowerPointTextFilter]
+ * application/vnd.ms-powerpoint
+ * application/mspowerpoint
+
+ * Portable Document Format (PDF)
+ [org.apache.jackrabbit.core.query.PdfTextFilter]
+ * application/pdf
+
+ * OpenOffice.org
+ [org.apache.jackrabbit.core.query.OpenOfficeTextFilter]
+ * application/vnd.oasis.opendocument.database
+ * application/vnd.oasis.opendocument.formula
+ * application/vnd.oasis.opendocument.graphics
+ * application/vnd.oasis.opendocument.presentation
+ * application/vnd.oasis.opendocument.spreadsheet
+ * application/vnd.oasis.opendocument.text
+
+ * Rich Text Format (RTF)
+ [org.apache.jackrabbit.core.query.RTFTextFilter]
+ * application/rtf
+
+ * HyperText Markup Language (HTML)
+ [org.apache.jackrabbit.core.query.HTMLTextFilter]
+ * text/html
+
+ * Extensible Markup Language (XML)
+ [org.apache.jackrabbit.core.query.XMLTextFilter]
+ * text/xml
+
+To use these index filters with the Jackrabbit Core:
+
+ 1) add the jackrabbit-index-filters jar file and the dependencies defined
+ in the Maven POM to the Jackrabbit classpath, and
+ 2) add the fully qualified class names listed above to the "textFilterClasses"
+ parameter of the "SearchIndex" configuration element of a Jackrabbit
+ workspace configuration file (workspace.xml).
+
+See the javadocs of org.apache.jackrabbit.core.query.TextFilter in the
+Jackrabbit Core component for more information.
+
+See the Apache Jackrabbit web site (http://jackrabbit.apache.org/)
+for documentation and other information. You are welcome to join the
+Jackrabbit mailing lists (http://jackrabbit.apache.org/mail-lists.html)
+to discuss this component and to use the Jackrabbit issue tracker
+(http://issues.apache.org/jira/browse/JCR) to report issues or request
+new features.
-This project contains TextFilter implementations for the
-following binary formats:
+Apache Jackrabbit is a project of the Apache Software Foundation
+(http://www.apache.org).
-1. MsExcel
-2. MsPowerPoint
-3. MsWord
-4. Pdf
-
-How to register in jackrabbit?
-Build the jar file and place it in the Jackrabbit
-classpath together with the dependencies of these text
-filters.
-Configure them in the SearchIndex element of the workspace.xml
-
-Sample:
-
-...
- <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
- <param name="path" value="${wsp.home}/index" />
- <param name="textFilterClasses" value="org.apache.jackrabbit.core.query.MsExcelTextFilter,org.apache.jackrabbit.core.query.MsPowerPointTextFilter,org.apache.jackrabbit.core.query.MsWordTextFilter,org.apache.jackrabbit.core.query.PdfTextFilter,org.apache.jackrabbit.core.query.HTMLTextFilter,org.apache.jackrabbit.core.query.XMLTextFilter,org.apache.jackrabbit.core.query.RTFTextFilter,org.apache.jackrabbit.core.query.OpenOfficeTextFilter" />
- </SearchIndex>
-...
-
-For further information, see the javadocs for:
-org.apache.jackrabbit.core.query.TextFilter
License (see also LICENSE.txt)
==============================
@@ -46,3 +87,24 @@
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
+
+
+Getting Started
+===============
+
+This component uses a Maven 2 (http://maven.apache.org/) build
+environment. If you have Maven 2 installed, you can compile and
+package the jackrabbit-index-filters jar using the following command:
+
+ mvn package
+
+See the Maven 2 documentation for other build features.
+
+The latest source code for this component is available in the
+Subversion (http://subversion.tigris.org/) source repository of
+the Apache Software Foundation. If you have Subversion installed,
+you can check out the latest source using the following command:
+
+ svn checkout http://svn.apache.org/repos/asf/jackrabbit/trunk/jackrabbit-index-filters
+
+See the Subversion documentation for other source control features.
Copied: jackrabbit/trunk/jackrabbit-index-filters/pom.xml (from r481769, jackrabbit/trunk/jackrabbit-api/pom.xml)
URL: http://svn.apache.org/viewvc/jackrabbit/trunk/jackrabbit-index-filters/pom.xml?view=diff&rev=481851&p1=jackrabbit/trunk/jackrabbit-api/pom.xml&r1=481769&p2=jackrabbit/trunk/jackrabbit-index-filters/pom.xml&r2=481851
==============================================================================
--- jackrabbit/trunk/jackrabbit-api/pom.xml (original)
+++ jackrabbit/trunk/jackrabbit-index-filters/pom.xml Sun Dec 3 10:26:38 2006
@@ -32,18 +32,18 @@
<version>1.2-SNAPSHOT</version>
<relativePath>..</relativePath>
</parent>
- <artifactId>jackrabbit-api</artifactId>
- <name>Jackrabbit API</name>
- <description>Jacrabbit-specific extensions to the JCR API</description>
+ <artifactId>jackrabbit-index-filters</artifactId>
+ <name>Jackrabbit Index Filters</name>
+ <description>Classes to extract text content from binary documents</description>
<scm>
<connection>
- scm:svn:http://svn.apache.org/repos/asf/jackrabbit/trunk/jackrabbit-api
+ scm:svn:http://svn.apache.org/repos/asf/jackrabbit/trunk/jackrabbit-index-filters
</connection>
<developerConnection>
- scm:svn:https://svn.apache.org/repos/asf/jackrabbit/trunk/jackrabbit-api
+ scm:svn:https://svn.apache.org/repos/asf/jackrabbit/trunk/jackrabbit-index-filters
</developerConnection>
- <url>http://svn.apache.org/viewvc/jackrabbit/trunk/jackrabbit-api</url>
+ <url>http://svn.apache.org/viewvc/jackrabbit/trunk/jackrabbit-index-filters</url>
</scm>
<build>
@@ -72,9 +72,35 @@
<dependencies>
<dependency>
- <groupId>javax.jcr</groupId>
- <artifactId>jcr</artifactId>
- <version>1.0</version>
+ <groupId>org.apache.jackrabbit</groupId>
+ <artifactId>jackrabbit-core</artifactId>
+ <version>${pom.version}</version>
+ </dependency>
+ <dependency>
+ <groupId>poi</groupId>
+ <artifactId>poi</artifactId>
+ <version>2.5.1-final-20040804</version>
+ </dependency>
+ <dependency>
+ <groupId>pdfbox</groupId>
+ <artifactId>pdfbox</artifactId>
+ <version>0.6.4</version>
+ </dependency>
+ <dependency>
+ <groupId>org.textmining</groupId>
+ <artifactId>tm-extractors</artifactId>
+ <version>0.4</version>
+ </dependency>
+ <dependency>
+ <groupId>nekohtml</groupId>
+ <artifactId>nekohtml</artifactId>
+ <version>0.9.4</version>
+ </dependency>
+ <dependency>
+ <groupId>junit</groupId>
+ <artifactId>junit</artifactId>
+ <version>3.8.1</version>
+ <scope>test</scope>
</dependency>
</dependencies>
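
Note on configuration: the rewritten README describes the "textFilterClasses" parameter in prose only, while the sample from the old README is removed by this commit. As a sketch of what step 2 of the new instructions might look like, the fragment below adapts that removed sample and lists all eight filter classes named in the new README. It assumes the index lives under ${wsp.home}/index, as in the old sample; adjust the path and trim the class list to the formats you actually need.

```xml
<!-- Sketch of a SearchIndex element in workspace.xml, adapted from the
     sample in the previous README. The class list is comma-separated
     with no whitespace, as in that sample. -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <!-- Index location; taken from the old sample, not mandated by this commit -->
  <param name="path" value="${wsp.home}/index"/>
  <!-- Fully qualified names of the text filters to register -->
  <param name="textFilterClasses" value="org.apache.jackrabbit.core.query.MsWordTextFilter,org.apache.jackrabbit.core.query.MsExcelTextFilter,org.apache.jackrabbit.core.query.MsPowerPointTextFilter,org.apache.jackrabbit.core.query.PdfTextFilter,org.apache.jackrabbit.core.query.OpenOfficeTextFilter,org.apache.jackrabbit.core.query.RTFTextFilter,org.apache.jackrabbit.core.query.HTMLTextFilter,org.apache.jackrabbit.core.query.XMLTextFilter"/>
</SearchIndex>
```

The filters only work if the jackrabbit-index-filters jar and the libraries declared in its POM (poi, pdfbox, tm-extractors, nekohtml) are on the Jackrabbit classpath, as step 1 of the README requires.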