Posted to commits@tika.apache.org by ju...@apache.org on 2012/08/05 18:08:10 UTC
svn commit: r1369613 - in /tika/trunk/src/site/apt: gettingstarted.apt index.apt
Author: jukka
Date: Sun Aug 5 16:08:09 2012
New Revision: 1369613
URL: http://svn.apache.org/viewvc?rev=1369613&view=rev
Log:
TIKA-966: org.apache.tika.Tika missing from tika-bundle-1.2.jar
Update documentation on tika-bundle.
Also some other documentation improvements.
Modified:
tika/trunk/src/site/apt/gettingstarted.apt
tika/trunk/src/site/apt/index.apt
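TIKA-966 is about the org.apache.tika.Tika facade class not being found in tika-bundle-1.2.jar. One way to see which of the built jars actually contains that class is to list the jar entries with the JDK `jar tf` command and look for the class file. The sketch below assumes a JDK `jar` tool on the PATH; the helper name `has_tika_facade` and the `path/to/` jar location are illustrative only and are not part of the Tika build.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tika): reads a jar listing on stdin,
# one entry per line as printed by `jar tf some.jar`, and succeeds only
# if the org.apache.tika.Tika facade class is one of the entries.
has_tika_facade() {
  # -q: no output, just the exit status; -x: match the whole line only,
  # so e.g. org/apache/tika/TikaException.class does not count.
  grep -qx 'org/apache/tika/Tika\.class'
}

# Example usage (assumes a JDK `jar` tool and a locally built jar):
#   jar tf path/to/tika-bundle.jar | has_tika_facade && echo "facade present"
```

The exit status makes the helper usable directly in `if` statements or `&&` chains in a build or release-check script.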
Modified: tika/trunk/src/site/apt/gettingstarted.apt
URL: http://svn.apache.org/viewvc/tika/trunk/src/site/apt/gettingstarted.apt?rev=1369613&r1=1369612&r2=1369613&view=diff
==============================================================================
--- tika/trunk/src/site/apt/gettingstarted.apt (original)
+++ tika/trunk/src/site/apt/gettingstarted.apt Sun Aug 5 16:08:09 2012
@@ -45,25 +45,25 @@ mvn install
Build artifacts
- The Tika 0.8 build consists of a number of components and produces
+ The Tika build consists of a number of components and produces
the following main binaries:
- [tika-core/target/tika-core-0.8.jar]
+ [tika-core/target/tika-core-*.jar]
Tika core library. Contains the core interfaces and classes of Tika,
but none of the parser implementations. Depends only on Java 5.
- [tika-parsers/target/tika-parsers-0.8.jar]
+ [tika-parsers/target/tika-parsers-*.jar]
Tika parsers. Collection of classes that implement the Tika Parser
interface based on various external parser libraries.
- [tika-app/target/tika-app-0.8.jar]
- Tika application. Combines the above libraries and all the external
+ [tika-app/target/tika-app-*.jar]
+ Tika application. Combines the above components and all the external
parser libraries into a single runnable jar with a GUI and a command
line interface.
- [tika-bundle/target/tika-bundle-0.8.jar]
- Tika bundle. An OSGi bundle that includes everything you need to use all
- Tika functionality in an OSGi environment.
+ [tika-bundle/target/tika-bundle-*.jar]
+ Tika bundle. An OSGi bundle that combines tika-parsers with non-OSGified
+ parser libraries to make them easy to deploy in an OSGi environment.
Using Tika as a Maven dependency
@@ -75,7 +75,7 @@ Using Tika as a Maven dependency
<dependency>
<groupId>org.apache.tika</groupId>
<artifactId>tika-core</artifactId>
- <version>0.8</version>
+ <version>...</version>
</dependency>
---
@@ -86,80 +86,43 @@ Using Tika as a Maven dependency
<dependency>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parsers</artifactId>
- <version>0.8</version>
+ <version>...</version>
</dependency>
---
Note that adding this dependency will introduce a number of
transitive dependencies to your project, including one on tika-core.
You need to make sure that these dependencies won't conflict with your
- existing project dependencies. The listing below shows all the
- compile-scope dependencies of tika-parsers in the Tika 0.8 release.
+ existing project dependencies. You can use the following command in
+ the tika-parsers directory to list all the compile-scope dependencies.
---
-org.apache.tika:tika-parsers:bundle:0.8
-+- org.apache.tika:tika-core:jar:0.8:compile
-+- org.apache.commons:commons-compress:jar:1.0:compile
-+- org.apache.pdfbox:pdfbox:jar:0.8.0-incubating:compile
-| +- org.apache.pdfbox:fontbox:jar:0.8.0-incubator:compile
-| \- org.apache.pdfbox:jempbox:jar:0.8.0-incubator:compile
-+- org.apache.poi:poi:jar:3.6:compile
-+- org.apache.poi:poi-scratchpad:jar:3.6:compile
-+- org.apache.poi:poi-ooxml:jar:3.6:compile
-| +- org.apache.poi:poi-ooxml-schemas:jar:3.6:compile
-| | \- org.apache.xmlbeans:xmlbeans:jar:2.3.0:compile
-| \- dom4j:dom4j:jar:1.6.1:compile
-| \- xml-apis:xml-apis:jar:1.0.b2:compile
-+- org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar:1.0.1:compile
-+- commons-logging:commons-logging:jar:1.1.1:compile
-+- org.ccil.cowan.tagsoup:tagsoup:jar:1.2:compile
-+- asm:asm:jar:3.1:compile
-+- log4j:log4j:jar:1.2.14:compile
-\- com.drewnoakes:metadata-extractor:jar:2.4.0-beta-1:compile
+$ mvn dependency:tree | grep :compile
---
Using Tika in an Ant project
Unless you use a dependency manager tool like
- {{{http://ant.apache.org/ivy/}Apache Ivy}}, to use Tika in you application
- you can include the Tika jar files and the dependencies individually.
+ {{{http://ant.apache.org/ivy/}Apache Ivy}}, the easiest way to use
+ Tika is to include either the tika-core or the tika-app jar in your
+ classpath, depending on whether you want just the core functionality
+ or also all the parser implementations.
---
<classpath>
... <!-- your other classpath entries -->
- <pathelement location="path/to/tika-core-0.8.jar"/>
- <pathelement location="path/to/tika-parsers-0.8.jar"/>
- <pathelement location="path/to/commons-logging-1.1.1.jar"/>
- <pathelement location="path/to/commons-compress-1.0.jar"/>
- <pathelement location="path/to/pdfbox-0.8.0-incubating.jar"/>
- <pathelement location="path/to/fontbox-0.8.0-incubator.jar"/>
- <pathelement location="path/to/jempbox-0.8.0-incubator.jar"/>
- <pathelement location="path/to/poi-3.6.jar"/>
- <pathelement location="path/to/poi-scratchpad-3.6.jar"/>
- <pathelement location="path/to/poi-ooxml-3.6.jar"/>
- <pathelement location="path/to/poi-ooxml-schemas-3.6.jar"/>
- <pathelement location="path/to/xmlbeans-2.3.0.jar"/>
- <pathelement location="path/to/dom4j-1.6.1.jar"/>
- <pathelement location="path/to/xml-apis-1.0.b2.jar"/>
- <pathelement location="path/to/geronimo-stax-api_1.0_spec-1.0.jar"/>
- <pathelement location="path/to/tagsoup-1.2.jar"/>
- <pathelement location="path/to/asm-3.1.jar"/>
- <pathelement location="path/to/log4j-1.2.14.jar"/>
- <pathelement location="path/to/metadata-extractor-2.4.0-beta-1.jar"/>
-</classpath>
----
- An easy way to gather all these libraries is to run
- "mvn dependency:copy-dependencies" in the tika-parsers source directory.
- This will copy all Tika dependencies to the <<<target/dependencies>>>
- directory.
+ <!-- either: -->
+ <pathelement location="path/to/tika-core-${tika.version}.jar"/>
+ <!-- or: -->
+ <pathelement location="path/to/tika-app-${tika.version}.jar"/>
- Alternatively you can simply drop the entire tika-app jar to your
- classpath to get all of the above dependencies in a single archive.
+</classpath>
+---
Using Tika as a command line utility
- The Tika application jar (tika-app-0.8.jar) can be used as a command
+ The Tika application jar (tika-app-*.jar) can be used as a command
line utility for extracting text content and metadata from all sorts of
files. This runnable jar contains all the dependencies it needs, so
you don't need to worry about classpath settings to run it.
@@ -167,16 +130,45 @@ Using Tika as a command line utility
The usage instructions are shown below.
---
-usage: java -jar tika-app-0.8.jar [option] [file]
+usage: java -jar tika-app.jar [option...] [file|port...]
Options:
- -? or --help Print this usage message
- -v or --verbose Print debug level messages
- -g or --gui Start the Apache Tika GUI
- -x or --xml Output XHTML content (default)
- -h or --html Output HTML content
- -t or --text Output plain text content
- -m or --metadata Output only metadata
+ -? or --help Print this usage message
+ -v or --verbose Print debug level messages
+ -V or --version Print the Apache Tika version number
+
+ -g or --gui Start the Apache Tika GUI
+ -s or --server Start the Apache Tika server
+ -f or --fork Use Fork Mode for out-of-process extraction
+
+ -x or --xml Output XHTML content (default)
+ -h or --html Output HTML content
+ -t or --text Output plain text content
+ -T or --text-main Output plain text content (main content only)
+ -m or --metadata Output only metadata
+ -j or --json Output metadata in JSON
+ -y or --xmp Output metadata in XMP
+ -l or --language Output only language
+ -d or --detect Detect document type
+ -eX or --encoding=X Use output encoding X
+ -pX or --password=X Use document password X
+ -z or --extract Extract all attachments into current directory
+ --extract-dir=<dir> Specify target directory for -z
+ -r or --pretty-print For XML and XHTML outputs, adds newlines and
+ whitespace, for better readability
+
+ --create-profile=X
+ Create NGram profile, where X is a profile name
+ --list-parsers
+ List the available document parsers
+ --list-parser-details
+ List the available document parsers, and their supported mime types
+ --list-detectors
+ List the available document detectors
+ --list-met-models
+ List the available metadata models, and their supported keys
+ --list-supported-types
+ List all known media types and related information
Description:
Apache Tika will parse the file(s) specified on the
@@ -188,12 +180,21 @@ Description:
If no file name or URL is specified (or the special
name "-" is used), then the standard input stream
- is parsed.
+ is parsed. If no arguments were given and no input
+ data is available, the GUI is started instead.
+
+- GUI mode
+
+ Use the "--gui" (or "-g") option to start the
+ Apache Tika GUI. You can drag and drop files from
+ a normal file explorer to the GUI window to extract
+ text content and metadata from the files.
+
+- Server mode
- Use the "--gui" (or "-g") option to start
- the Apache Tika GUI. You can drag and drop files
- from a normal file explorer to the GUI window to
- extract text content and metadata from the files.
+ Use the "--server" (or "-s") option to start the
+ Apache Tika server. The server will listen to the
+ ports you specify as one or more arguments.
---
You can also use the jar as a component in a Unix pipeline or
@@ -202,6 +203,6 @@ Description:
---
# Check if an Internet resource contains a specific keyword
curl http://.../document.doc \
- | java -jar tika-app-0.8.jar --text \
+ | java -jar tika-app.jar --text \
| grep -q keyword
---
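The `mvn dependency:tree | grep :compile` command in the updated gettingstarted.apt prints raw tree lines. If a flat groupId:artifactId:version list is more convenient (for example to compare against your own project's dependencies), the output can be post-processed as in the following sketch. The function name `compile_deps` is hypothetical, and the sketch assumes the Maven dependency plugin's default text format: an optional `[INFO] ` prefix, tree markers such as `+- `, `|  ` and `\- `, and coordinates of the form groupId:artifactId:type[:classifier]:version:scope.

```shell
#!/bin/sh
# Hypothetical helper: reads `mvn dependency:tree` output on stdin and
# prints one groupId:artifactId:version line per compile-scope dependency.
# Assumes the plugin's default text format; other formats are not handled.
compile_deps() {
  # 1. sed drops Maven's "[INFO] " log prefix and the tree-drawing characters.
  # 2. awk splits the coordinates on ":"; the scope is the last field and the
  #    version is the one before it, which also copes with a classifier.
  sed -e 's/^\[INFO] //' -e 's/^[-+| \\]*//' |
    awk -F: '$NF ~ /^compile/ { print $1 ":" $2 ":" $(NF-1) }'
}

# Example usage, run in the tika-parsers directory:
#   mvn dependency:tree | compile_deps
```

The project's own root line has no scope field, so it is skipped automatically; changing the awk pattern (for example to `/^runtime/`) lists a different scope instead.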
Modified: tika/trunk/src/site/apt/index.apt
URL: http://svn.apache.org/viewvc/tika/trunk/src/site/apt/index.apt?rev=1369613&r1=1369612&r2=1369613&view=diff
==============================================================================
--- tika/trunk/src/site/apt/index.apt (original)
+++ tika/trunk/src/site/apt/index.apt Sun Aug 5 16:08:09 2012
@@ -1,5 +1,5 @@
---------------
- Apache Tika 0.8
+ Apache Tika 1.3
---------------
~~ Licensed to the Apache Software Foundation (ASF) under one or more
@@ -17,18 +17,15 @@
~~ See the License for the specific language governing permissions and
~~ limitations under the License.
-Apache Tika 1.1
+Apache Tika 1.3
-
- The most notable changes in Tika 1.1 over the previous release are:
+ The most notable changes in Tika 1.3 over the previous release are:
* TBD
-
- The following people have contributed to Tika 1.1 by submitting or
+ The following people have contributed to Tika 1.3 by submitting or
commenting on the issues resolved in this release:
* TBD
-
See TBD for more details on these contributions.