Posted to commits@tika.apache.org by ju...@apache.org on 2012/08/05 18:08:10 UTC

svn commit: r1369613 - in /tika/trunk/src/site/apt: gettingstarted.apt index.apt

Author: jukka
Date: Sun Aug  5 16:08:09 2012
New Revision: 1369613

URL: http://svn.apache.org/viewvc?rev=1369613&view=rev
Log:
TIKA-966: org.apache.tika.Tika missing from tika-bundle-1.2.jar

Update documentation on tika-bundle.
Also some other documentation improvements.

Modified:
    tika/trunk/src/site/apt/gettingstarted.apt
    tika/trunk/src/site/apt/index.apt

Modified: tika/trunk/src/site/apt/gettingstarted.apt
URL: http://svn.apache.org/viewvc/tika/trunk/src/site/apt/gettingstarted.apt?rev=1369613&r1=1369612&r2=1369613&view=diff
==============================================================================
--- tika/trunk/src/site/apt/gettingstarted.apt (original)
+++ tika/trunk/src/site/apt/gettingstarted.apt Sun Aug  5 16:08:09 2012
@@ -45,25 +45,25 @@ mvn install
 
 Build artifacts
 
- The Tika 0.8 build consists of a number of components and produces
+ The Tika build consists of a number of components and produces
  the following main binaries:
 
- [tika-core/target/tika-core-0.8.jar]
+ [tika-core/target/tika-core-*.jar]
   Tika core library. Contains the core interfaces and classes of Tika,
   but none of the parser implementations. Depends only on Java 5.
 
- [tika-parsers/target/tika-parsers-0.8.jar]
+ [tika-parsers/target/tika-parsers-*.jar]
   Tika parsers. Collection of classes that implement the Tika Parser
   interface based on various external parser libraries.
 
- [tika-app/target/tika-app-0.8.jar]
-  Tika application. Combines the above libraries and all the external
+ [tika-app/target/tika-app-*.jar]
+  Tika application. Combines the above components and all the external
   parser libraries into a single runnable jar with a GUI and a command
   line interface.
 
- [tika-bundle/target/tika-bundle-0.8.jar]
-  Tika bundle. An OSGi bundle that includes everything you need to use all
-  Tika functionality in an OSGi environment.
+ [tika-bundle/target/tika-bundle-*.jar]
+  Tika bundle. An OSGi bundle that combines tika-parsers with non-OSGified
+  parser libraries to make them easy to deploy in an OSGi environment.
 
 Using Tika as a Maven dependency
 
@@ -75,7 +75,7 @@ Using Tika as a Maven dependency
   <dependency>
     <groupId>org.apache.tika</groupId>
     <artifactId>tika-core</artifactId>
-    <version>0.8</version>
+    <version>...</version>
   </dependency>
 ---
 
@@ -86,80 +86,43 @@ Using Tika as a Maven dependency
   <dependency>
     <groupId>org.apache.tika</groupId>
     <artifactId>tika-parsers</artifactId>
-    <version>0.8</version>
+    <version>...</version>
   </dependency>
 ---
 
  Note that adding this dependency will introduce a number of
  transitive dependencies to your project, including one on tika-core.
  You need to make sure that these dependencies won't conflict with your
- existing project dependencies. The listing below shows all the
- compile-scope dependencies of tika-parsers in the Tika 0.8 release.
+ existing project dependencies. You can use the following command in
+ the tika-parsers directory to get a full listing of all the dependencies.
 
 ---
-org.apache.tika:tika-parsers:bundle:0.8
-+- org.apache.tika:tika-core:jar:0.8:compile
-+- org.apache.commons:commons-compress:jar:1.0:compile
-+- org.apache.pdfbox:pdfbox:jar:0.8.0-incubating:compile
-|  +- org.apache.pdfbox:fontbox:jar:0.8.0-incubator:compile
-|  \- org.apache.pdfbox:jempbox:jar:0.8.0-incubator:compile
-+- org.apache.poi:poi:jar:3.6:compile
-+- org.apache.poi:poi-scratchpad:jar:3.6:compile
-+- org.apache.poi:poi-ooxml:jar:3.6:compile
-|  +- org.apache.poi:poi-ooxml-schemas:jar:3.6:compile
-|  |  \- org.apache.xmlbeans:xmlbeans:jar:2.3.0:compile
-|  \- dom4j:dom4j:jar:1.6.1:compile
-|     \- xml-apis:xml-apis:jar:1.0.b2:compile
-+- org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar:1.0.1:compile
-+- commons-logging:commons-logging:jar:1.1.1:compile
-+- org.ccil.cowan.tagsoup:tagsoup:jar:1.2:compile
-+- asm:asm:jar:3.1:compile
-+- log4j:log4j:jar:1.2.14:compile
-\- com.drewnoakes:metadata-extractor:jar:2.4.0-beta-1:compile
+$ mvn dependency:tree | grep :compile
 ---
 
 Using Tika in an Ant project
 
  Unless you use a dependency manager tool like
- {{{http://ant.apache.org/ivy/}Apache Ivy}}, to use Tika in you application
- you can include the Tika jar files and the dependencies individually.
+ {{{http://ant.apache.org/ivy/}Apache Ivy}}, the easiest way to use
+ Tika is to include either the tika-core or the tika-app jar in your
+ classpath, depending on whether you want just the core functionality
+ or also all the parser implementations.
 
 ---
 <classpath>
   ... <!-- your other classpath entries -->
-  <pathelement location="path/to/tika-core-0.8.jar"/>
-  <pathelement location="path/to/tika-parsers-0.8.jar"/>
-  <pathelement location="path/to/commons-logging-1.1.1.jar"/>
-  <pathelement location="path/to/commons-compress-1.0.jar"/>
-  <pathelement location="path/to/pdfbox-0.8.0-incubating.jar"/>
-  <pathelement location="path/to/fontbox-0.8.0-incubator.jar"/>
-  <pathelement location="path/to/jempbox-0.8.0-incubator.jar"/>
-  <pathelement location="path/to/poi-3.6.jar"/>
-  <pathelement location="path/to/poi-scratchpad-3.6.jar"/>
-  <pathelement location="path/to/poi-ooxml-3.6.jar"/>
-  <pathelement location="path/to/poi-ooxml-schemas-3.6.jar"/>
-  <pathelement location="path/to/xmlbeans-2.3.0.jar"/>
-  <pathelement location="path/to/dom4j-1.6.1.jar"/>
-  <pathelement location="path/to/xml-apis-1.0.b2.jar"/>
-  <pathelement location="path/to/geronimo-stax-api_1.0_spec-1.0.jar"/>
-  <pathelement location="path/to/tagsoup-1.2.jar"/>
-  <pathelement location="path/to/asm-3.1.jar"/>
-  <pathelement location="path/to/log4j-1.2.14.jar"/>
-  <pathelement location="path/to/metadata-extractor-2.4.0-beta-1.jar"/>
-</classpath>
----
 
- An easy way to gather all these libraries is to run
- "mvn dependency:copy-dependencies" in the tika-parsers source directory.
- This will copy all Tika dependencies to the <<<target/dependencies>>>
- directory.
+  <!-- either: -->
+  <pathelement location="path/to/tika-core-${tika.version}.jar"/>
+  <!-- or: -->
+  <pathelement location="path/to/tika-app-${tika.version}.jar"/>
 
- Alternatively you can simply drop the entire tika-app jar to your
- classpath to get all of the above dependencies in a single archive.
+</classpath>
+---
 
 Using Tika as a command line utility
 
- The Tika application jar (tika-app-0.8.jar) can be used as a command
+ The Tika application jar (tika-app-*.jar) can be used as a command
  line utility for extracting text content and metadata from all sorts of
  files. This runnable jar contains all the dependencies it needs, so
  you don't need to worry about classpath settings to run it.
@@ -167,16 +130,45 @@ Using Tika as a command line utility
  The usage instructions are shown below.
 
 ---
-usage: java -jar tika-app-0.8.jar [option] [file]
+usage: java -jar tika-app.jar [option...] [file|port...]
 
 Options:
-    -? or --help       Print this usage message
-    -v or --verbose    Print debug level messages
-    -g or --gui        Start the Apache Tika GUI
-    -x or --xml        Output XHTML content (default)
-    -h or --html       Output HTML content
-    -t or --text       Output plain text content
-    -m or --metadata   Output only metadata
+    -?  or --help          Print this usage message
+    -v  or --verbose       Print debug level messages
+    -V  or --version       Print the Apache Tika version number
+
+    -g  or --gui           Start the Apache Tika GUI
+    -s  or --server        Start the Apache Tika server
+    -f  or --fork          Use Fork Mode for out-of-process extraction
+
+    -x  or --xml           Output XHTML content (default)
+    -h  or --html          Output HTML content
+    -t  or --text          Output plain text content
+    -T  or --text-main     Output plain text content (main content only)
+    -m  or --metadata      Output only metadata
+    -j  or --json          Output metadata in JSON
+    -y  or --xmp           Output metadata in XMP
+    -l  or --language      Output only language
+    -d  or --detect        Detect document type
+    -eX or --encoding=X    Use output encoding X
+    -pX or --password=X    Use document password X
+    -z  or --extract       Extract all attachments into current directory
+    --extract-dir=<dir>    Specify target directory for -z
+    -r  or --pretty-print  For XML and XHTML outputs, adds newlines and
+                           whitespace, for better readability
+
+    --create-profile=X
+         Create NGram profile, where X is a profile name
+    --list-parsers
+         List the available document parsers
+    --list-parser-details
+         List the available document parsers, and their supported mime types
+    --list-detectors
+         List the available document detectors
+    --list-met-models
+         List the available metadata models, and their supported keys
+    --list-supported-types
+         List all known media types and related information
 
 Description:
     Apache Tika will parse the file(s) specified on the
@@ -188,12 +180,21 @@ Description:
 
     If no file name or URL is specified (or the special
     name "-" is used), then the standard input stream
-    is parsed.
+    is parsed. If no arguments were given and no input
+    data is available, the GUI is started instead.
+
+- GUI mode
+
+    Use the "--gui" (or "-g") option to start the
+    Apache Tika GUI. You can drag and drop files from
+    a normal file explorer to the GUI window to extract
+    text content and metadata from the files.
+
+- Server mode
 
-    Use the "--gui" (or "-g") option to start
-    the Apache Tika GUI. You can drag and drop files
-    from a normal file explorer to the GUI window to
-    extract text content and metadata from the files.
+    Use the "--server" (or "-s") option to start the
+    Apache Tika server. The server will listen to the
+    ports you specify as one or more arguments.
 ---
 
  You can also use the jar as a component in a Unix pipeline or
@@ -202,6 +203,6 @@ Description:
 ---
 # Check if an Internet resource contains a specific keyword
 curl http://.../document.doc \
-  | java -jar tika-app-0.8.jar --text \
+  | java -jar tika-app.jar --text \
   | grep -q keyword
 ---

Modified: tika/trunk/src/site/apt/index.apt
URL: http://svn.apache.org/viewvc/tika/trunk/src/site/apt/index.apt?rev=1369613&r1=1369612&r2=1369613&view=diff
==============================================================================
--- tika/trunk/src/site/apt/index.apt (original)
+++ tika/trunk/src/site/apt/index.apt Sun Aug  5 16:08:09 2012
@@ -1,5 +1,5 @@
                        ---------------
-                       Apache Tika 0.8
+                       Apache Tika 1.3
                        ---------------
 
 ~~ Licensed to the Apache Software Foundation (ASF) under one or more
@@ -17,18 +17,15 @@
 ~~ See the License for the specific language governing permissions and
 ~~ limitations under the License.
 
-Apache Tika 1.1
+Apache Tika 1.3
 
-
-   The most notable changes in Tika 1.1 over the previous release are:
+   The most notable changes in Tika 1.3 over the previous release are:
 
       * TBD
-   
 
-   The following people have contributed to Tika 1.1 by submitting or
+   The following people have contributed to Tika 1.3 by submitting or
    commenting on the issues resolved in this release:
 
       * TBD
 
-
    See TBD for more details on these contributions.
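
A minimal sketch, not part of this commit, of how the Unix pipeline example in the gettingstarted.apt change above might be wrapped in a reusable shell function. The names contains_keyword, TIKA_JAR and TIKA_CMD are hypothetical, and the jar location is an assumption; only the --text option is taken from the usage text in the diff.

```shell
#!/bin/sh
# Assumed location of the Tika application jar; override via the environment.
TIKA_JAR="${TIKA_JAR:-tika-app.jar}"

# Command used to turn a document on stdin into plain text on stdout.
# Kept in a variable (hypothetical convention) so it can be swapped out,
# for example when Java or the jar is not available.
TIKA_CMD="${TIKA_CMD:-java -jar $TIKA_JAR --text}"

# contains_keyword KEYWORD
# Reads a document from stdin, extracts its text, and returns 0 if
# KEYWORD occurs in the extracted text, non-zero otherwise.
contains_keyword() {
  # $TIKA_CMD is intentionally unquoted so that it splits into
  # the command and its arguments.
  $TIKA_CMD | grep -q -- "$1"
}
```

With the jar in the current directory, the pipeline from the documentation could then be written as "curl http://.../document.doc | contains_keyword keyword && echo found".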