Posted to commits@jackrabbit.apache.org by ju...@apache.org on 2006/12/03 19:26:41 UTC

svn commit: r481851 - in /jackrabbit/trunk/jackrabbit-index-filters: README.txt pom.xml src/java/ src/main/ src/main/java/ src/test/java/ src/test/java/org/ src/test/org/

Author: jukka
Date: Sun Dec  3 10:26:38 2006
New Revision: 481851

URL: http://svn.apache.org/viewvc?view=rev&rev=481851
Log:
JCR-612, JCR-332: Upgraded jackrabbit-index-filters to Maven 2.

Added:
    jackrabbit/trunk/jackrabbit-index-filters/pom.xml
      - copied, changed from r481769, jackrabbit/trunk/jackrabbit-api/pom.xml
    jackrabbit/trunk/jackrabbit-index-filters/src/main/
    jackrabbit/trunk/jackrabbit-index-filters/src/main/java/
      - copied from r481759, jackrabbit/trunk/jackrabbit-index-filters/src/java/
    jackrabbit/trunk/jackrabbit-index-filters/src/test/java/
    jackrabbit/trunk/jackrabbit-index-filters/src/test/java/org/
      - copied from r481759, jackrabbit/trunk/jackrabbit-index-filters/src/test/org/
Removed:
    jackrabbit/trunk/jackrabbit-index-filters/src/java/
    jackrabbit/trunk/jackrabbit-index-filters/src/test/org/
Modified:
    jackrabbit/trunk/jackrabbit-index-filters/README.txt

Modified: jackrabbit/trunk/jackrabbit-index-filters/README.txt
URL: http://svn.apache.org/viewvc/jackrabbit/trunk/jackrabbit-index-filters/README.txt?view=diff&rev=481851&r1=481850&r2=481851
==============================================================================
--- jackrabbit/trunk/jackrabbit-index-filters/README.txt (original)
+++ jackrabbit/trunk/jackrabbit-index-filters/README.txt Sun Dec  3 10:26:38 2006
@@ -1,31 +1,72 @@
-TextFilters allow Jackrabbit to extract text from binary
-properties for indexing purposes.
+===================================
+Welcome to Jackrabbit Index Filters
+===================================
+
+This is the Index Filters component of the Apache Jackrabbit project.
+This component contains filter classes that allow Jackrabbit to
+extract text content from binary properties for full text indexing.
+The following file formats and MIME types are currently supported:
+
+    * Microsoft Word
+      [org.apache.jackrabbit.core.query.MsWordTextFilter]
+      * application/vnd.ms-word
+      * application/msword
+
+    * Microsoft Excel
+      [org.apache.jackrabbit.core.query.MsExcelTextFilter]
+      * application/vnd.ms-excel
+
+    * Microsoft PowerPoint
+      [org.apache.jackrabbit.core.query.MsPowerPointTextFilter] 
+      * application/vnd.ms-powerpoint
+      * application/mspowerpoint
+
+    * Portable Document Format (PDF)
+      [org.apache.jackrabbit.core.query.PdfTextFilter]
+      * application/pdf
+
+    * OpenOffice.org
+      [org.apache.jackrabbit.core.query.OpenOfficeTextFilter]
+      * application/vnd.oasis.opendocument.database
+      * application/vnd.oasis.opendocument.formula
+      * application/vnd.oasis.opendocument.graphics
+      * application/vnd.oasis.opendocument.presentation
+      * application/vnd.oasis.opendocument.spreadsheet
+      * application/vnd.oasis.opendocument.text
+
+    * Rich Text Format (RTF)
+      [org.apache.jackrabbit.core.query.RTFTextFilter]
+      * application/rtf
+
+    * HyperText Markup Language (HTML)
+      [org.apache.jackrabbit.core.query.HTMLTextFilter]
+      * text/html
+
+    * Extensible Markup Language (XML)
+      [org.apache.jackrabbit.core.query.XMLTextFilter]
+      * text/xml
+
+To use these index filters with the Jackrabbit Core:
+
+   1) add the jackrabbit-index-filters jar file and the dependencies defined
+      in the Maven POM to the Jackrabbit classpath, and
+   2) add the fully qualified class names listed above to the "textFilterClasses"
+      parameter of the "SearchIndex" configuration element of a Jackrabbit
+      workspace configuration file (workspace.xml).
+
+See the javadocs of org.apache.jackrabbit.core.query.TextFilter in the
+Jackrabbit Core component for more information.
+
+See the Apache Jackrabbit web site (http://jackrabbit.apache.org/)
+for documentation and other information. You are welcome to join the
+Jackrabbit mailing lists (http://jackrabbit.apache.org/mail-lists.html)
+to discuss this component and to use the Jackrabbit issue tracker
+(http://issues.apache.org/jira/browse/JCR) to report issues or request
+new features.
 
-This project contains TextFilter implementations for the 
-following binary formats:
+Apache Jackrabbit is a project of the Apache Software Foundation
+(http://www.apache.org).
 
-1. MsExcel
-2. MsPowerPoint
-3. MsWord
-4. Pdf
-
-How to register in jackrabbit?
-Build the jar file and place it in the Jackrabbit 
-classpath together with the dependencies of these text
-filters.
-Configure them in the SearchIndex element of the workspace.xml
-
-Sample:
-
-...
-  <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
-    <param name="path" value="${wsp.home}/index" />
-    <param name="textFilterClasses" value="org.apache.jackrabbit.core.query.MsExcelTextFilter,org.apache.jackrabbit.core.query.MsPowerPointTextFilter,org.apache.jackrabbit.core.query.MsWordTextFilter,org.apache.jackrabbit.core.query.PdfTextFilter,org.apache.jackrabbit.core.query.HTMLTextFilter,org.apache.jackrabbit.core.query.XMLTextFilter,org.apache.jackrabbit.core.query.RTFTextFilter,org.apache.jackrabbit.core.query.OpenOfficeTextFilter" />
-  </SearchIndex>
-...
-
-For further information, see the javadocs for:
-org.apache.jackrabbit.core.query.TextFilter
 
 License (see also LICENSE.txt)
 ==============================
@@ -46,3 +87,24 @@
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
+
+
+Getting Started
+===============
+
+This component uses a Maven 2 (http://maven.apache.org/) build
+environment. If you have Maven 2 installed, you can compile and
+package the jackrabbit-index-filters jar using the following command:
+
+    mvn package
+
+See the Maven 2 documentation for other build features.
+
+The latest source code for this component is available in the
+Subversion (http://subversion.tigris.org/) source repository of
+the Apache Software Foundation. If you have Subversion installed,
+you can check out the latest source using the following command:
+
+    svn checkout http://svn.apache.org/repos/asf/jackrabbit/trunk/jackrabbit-index-filters
+
+See the Subversion documentation for other source control features.
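
Note on the configuration steps in the new README: step 2 names the "textFilterClasses" parameter but no longer shows a sample. The fragment below is a sketch reconstructed from the sample in the removed README text, not part of this commit. It assumes the same SearchIndex class and index path as that sample; adjust both to match your own workspace.xml.

```xml
<!-- Fragment of a workspace.xml file; based on the sample removed from README.txt in this revision -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <!-- Index location, as in the removed sample -->
  <param name="path" value="${wsp.home}/index" />
  <!-- Comma-separated list of fully qualified filter class names, kept on one line -->
  <param name="textFilterClasses" value="org.apache.jackrabbit.core.query.MsExcelTextFilter,org.apache.jackrabbit.core.query.MsPowerPointTextFilter,org.apache.jackrabbit.core.query.MsWordTextFilter,org.apache.jackrabbit.core.query.PdfTextFilter,org.apache.jackrabbit.core.query.HTMLTextFilter,org.apache.jackrabbit.core.query.XMLTextFilter,org.apache.jackrabbit.core.query.RTFTextFilter,org.apache.jackrabbit.core.query.OpenOfficeTextFilter" />
</SearchIndex>
```

The class names are the eight filters listed in the new README. Each filter also needs its third-party dependency (POI, PDFBox, tm-extractors, NekoHTML, as declared in the pom.xml diff that follows) on the classpath.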

Copied: jackrabbit/trunk/jackrabbit-index-filters/pom.xml (from r481769, jackrabbit/trunk/jackrabbit-api/pom.xml)
URL: http://svn.apache.org/viewvc/jackrabbit/trunk/jackrabbit-index-filters/pom.xml?view=diff&rev=481851&p1=jackrabbit/trunk/jackrabbit-api/pom.xml&r1=481769&p2=jackrabbit/trunk/jackrabbit-index-filters/pom.xml&r2=481851
==============================================================================
--- jackrabbit/trunk/jackrabbit-api/pom.xml (original)
+++ jackrabbit/trunk/jackrabbit-index-filters/pom.xml Sun Dec  3 10:26:38 2006
@@ -32,18 +32,18 @@
     <version>1.2-SNAPSHOT</version>
     <relativePath>..</relativePath>
   </parent>
-  <artifactId>jackrabbit-api</artifactId>
-  <name>Jackrabbit API</name>
-  <description>Jackrabbit-specific extensions to the JCR API</description>
+  <artifactId>jackrabbit-index-filters</artifactId>
+  <name>Jackrabbit Index Filters</name>
+  <description>Classes to extract text content from binary documents</description>
 
   <scm>
     <connection>
-      scm:svn:http://svn.apache.org/repos/asf/jackrabbit/trunk/jackrabbit-api
+      scm:svn:http://svn.apache.org/repos/asf/jackrabbit/trunk/jackrabbit-index-filters
     </connection>
     <developerConnection>
-      scm:svn:https://svn.apache.org/repos/asf/jackrabbit/trunk/jackrabbit-api
+      scm:svn:https://svn.apache.org/repos/asf/jackrabbit/trunk/jackrabbit-index-filters
     </developerConnection>
-    <url>http://svn.apache.org/viewvc/jackrabbit/trunk/jackrabbit-api</url>
+    <url>http://svn.apache.org/viewvc/jackrabbit/trunk/jackrabbit-index-filters</url>
   </scm>
 
   <build>
@@ -72,9 +72,35 @@
 
   <dependencies>
     <dependency>
-      <groupId>javax.jcr</groupId>
-      <artifactId>jcr</artifactId>
-      <version>1.0</version>
+      <groupId>org.apache.jackrabbit</groupId>
+      <artifactId>jackrabbit-core</artifactId>
+      <version>${pom.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>poi</groupId>
+      <artifactId>poi</artifactId>
+      <version>2.5.1-final-20040804</version>
+    </dependency>
+    <dependency>
+      <groupId>pdfbox</groupId>
+      <artifactId>pdfbox</artifactId>
+      <version>0.6.4</version>
+    </dependency>
+    <dependency>
+      <groupId>org.textmining</groupId>
+      <artifactId>tm-extractors</artifactId>
+      <version>0.4</version>
+    </dependency>
+    <dependency>
+      <groupId>nekohtml</groupId>
+      <artifactId>nekohtml</artifactId>
+      <version>0.9.4</version>
+    </dependency>
+    <dependency>
+      <groupId>junit</groupId>
+      <artifactId>junit</artifactId>
+      <version>3.8.1</version>
+      <scope>test</scope>
     </dependency>
   </dependencies>