Posted to commits@oodt.apache.org by ma...@apache.org on 2017/02/18 23:39:49 UTC
[3/6] oodt git commit: update files for new curator
http://git-wip-us.apache.org/repos/asf/oodt/blob/a47b088a/curator2/src/site/xdoc/development/maven.xml
----------------------------------------------------------------------
diff --git a/curator2/src/site/xdoc/development/maven.xml b/curator2/src/site/xdoc/development/maven.xml
new file mode 100755
index 0000000..4207fa0
--- /dev/null
+++ b/curator2/src/site/xdoc/development/maven.xml
@@ -0,0 +1,175 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more contributor
+license agreements. See the NOTICE.txt file distributed with this work for
+additional information regarding copyright ownership. The ASF licenses this
+file to you under the Apache License, Version 2.0 (the "License"); you may not
+use this file except in compliance with the License. You may obtain a copy of
+the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+License for the specific language governing permissions and limitations under
+the License.
+-->
+<document>
+ <properties>
+ <title>Using Maven</title>
+ <author email="woollard@jpl.nasa.gov">David Woollard</author>
+ </properties>
+
+ <body>
+ <section name="Using Maven">
+ <p>Apache OODT uses <a href="http://maven.apache.org/">Maven</a> for
+ managing our build environment. Maven is an open source product from the
+ <a href="http://www.apache.org/">Apache Software Foundation</a> that improves
+ on <a href="http://ant.apache.org/">Ant</a> in the area of build management,
+ which in turn was an improvement on Make. This document describes the use of
+ Maven for OODT build management.</p>
+ </section>
+
+ <section name="Setup">
+ <p>Maven can be downloaded from the
+ <a href="http://maven.apache.org/download.html">Maven Download</a>
+ page. OODT uses Maven 2.0 or later. Maven is written in Java, so it
+ will run on the popular platforms (e.g., Windows, Mac OS X, etc.). Beyond
+ making sure the <i>mvn</i> executable is in your path, there is very little
+ setup required.</p>
+
+ <p>Maven is based on the concept of a Project Object Model (POM) which is
+ contained in the <i>pom.xml</i> file found at the root of each project.
+ The POM allows Maven to manage a project's build, reporting and documentation.
+ For OODT, much of the default information for managing the projects is
+ contained in a parent POM, which is located in the <i>oodt-core</i> project. So,
+ in order to build any of the other projects (e.g., cas-curator, cas-filemgr,
+ etc.) the parent POM must be downloaded from the OODT Maven repository. The
+ local <i>pom.xml</i> files for each of the projects have been configured to
+ retrieve the parent POM automatically.</p>
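+
+ <p>As a rough sketch of what that configuration looks like, a project's
+ <i>pom.xml</i> refers to its parent through a <i>parent</i> element such as
+ the one below. The group ID, the version placeholder and the packaging shown
+ here are assumptions made for illustration only; consult the project's
+ actual <i>pom.xml</i> for the real values.</p>
+
+ <source><![CDATA[<project>
+ <modelVersion>4.0.0</modelVersion>
+ <!-- Hypothetical coordinates, not copied from the OODT source tree. -->
+ <parent>
+ <groupId>org.apache.oodt</groupId>
+ <artifactId>oodt-core</artifactId>
+ <version>X.Y.Z</version>
+ </parent>
+ <artifactId>cas-curator</artifactId>
+ <packaging>war</packaging>
+</project>]]></source>
+
+ <p>When Maven reads a POM like this, it resolves the parent from the local
+ repository first and falls back to the configured remote repositories.</p>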
+
+ <p>Once Maven has been set up, the first step to building a project with Maven
+ is to check out the project's source code into the developer's work area. See the
+ <a href="../development/subversion.html">Using Subversion</a> document for how to
+ check out projects from the CM repository.</p>
+ </section>
+
+ <section name="Project Structure">
+ <p>In order for default Maven functions to operate properly, there is a
+ suggested project directory structure. The structure is as follows:</p>
+
+ <source>
+/
+ src/ Source Code (everything)
+ main/ Program Source
+ assembly/ Package Descriptor
+ java/ Java Source
+ resources/ Scripts, Config File, etc.
+ ...
+ test/ Test Source
+ java/
+ resources/
+ ...
+ site/ Site Documentation
+ apt/ Docs in APT Format
+ index.apt
+ ...
+ xdoc/ Docs in XDOC Format
+ index.xml
+ ...
+ resources/
+ images/
+ site.xml Menu Structure
+
+ target/ Build Results (binaries, docs and packages)
+ ...
+
+ LICENSE.txt
+ README.txt
+ pom.xml Project Object Model (POM)
+ </source>
+ </section>
+
+ <section name="Standard Commands">
+ <p>There are a few standard commands that developers will use on a daily basis
+ and they are related to building and cleaning a project.</p>
+ <subsection name="Build a Project">
+ <p>Build the project's libraries and executables with the following
+ command:</p>
+ <source>
+mvn compile
+ </source>
+ <p>The above command will compile the source code and place the
+ resulting class files under the <i>target/</i> directory.</p>
+ </subsection>
+ <subsection name="Install a Project">
+ <p>Install the project's artifacts locally with the following command:</p>
+ <source>
+mvn install
+ </source>
+ <p>Prior to installation, the above command will compile the source code,
+ if necessary, and execute the unit tests. The result of the above command
+ is to install the generated artifacts (e.g. pom, jar, etc.) in the user's
+ local Maven repository ($HOME/.m2/repository/). This is useful when the
+ artifact is a dependency for another project but has yet to be deployed
+ to the SWSA Maven repository.</p>
+ </subsection>
+ <subsection name="Package a Project">
+ <p>Create the project's distribution package with the following command:</p>
+ <source>
+mvn package
+ </source>
+ <p>Prior to package creation, the above command will compile the source
+ code, if necessary, and execute the unit tests. The above command will
+ create the package(s) in the target/ directory.</p>
+ </subsection>
+ <subsection name="Build a Project's Web Site">
+ <p>Build the project's web site with the following command:</p>
+ <source>
+mvn site
+ </source>
+ <p>The above command will generate the web site in the <i>target/site/</i>
+ directory. View the site by pointing your web browser at the
+ <i>index.html</i> file within that directory.</p>
+ </subsection>
+ <subsection name="Clean a Project">
+ <p>Clean out the project directory of generated artifacts with the
+ following command:</p>
+ <source>
+mvn clean
+ </source>
+ <p>The above command will remove the <i>target/</i> directory and its
+ contents.</p>
+ </subsection>
+ <subsection name="Useful Command Arguments">
+ <p>There are a couple of useful arguments which can be appended to the
+ commands above to limit the scope of the command.</p>
+ <p>In order to skip unit test execution, add the following argument:</p>
+ <source>
+mvn [command] -Dmaven.test.skip=true
+ </source>
+ <p>The above command is most useful with the <i>install</i>,
+ <i>package</i> and <i>site</i> commands.</p>
+ <p>When a project has modules defined in the POM, the command can be
+ performed against the top level of the project instead of the modules by
+ adding the following argument:</p>
+ <source>
+mvn [command] --non-recursive
+ </source>
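+ <p>For context, a POM "has modules" when it contains a <i>modules</i>
+ element like the one sketched below. The module names here are only
+ illustrative (borrowed from the project names mentioned earlier) and the
+ other coordinates are placeholders, not the layout of any real OODT POM.
+ With <i>--non-recursive</i>, Maven runs the command against this top-level
+ POM only and does not descend into the listed modules.</p>
+ <source><![CDATA[<project>
+ <modelVersion>4.0.0</modelVersion>
+ <!-- Hypothetical aggregator POM, for illustration only. -->
+ <groupId>org.example</groupId>
+ <artifactId>example-aggregator</artifactId>
+ <version>X.Y.Z</version>
+ <packaging>pom</packaging>
+ <modules>
+ <module>cas-filemgr</module>
+ <module>cas-curator</module>
+ </modules>
+</project>]]></source>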
+ </subsection>
+ </section>
+ <section name="Acknowledgments">
+ <p>Much of the material in this Maven guide was originally authored
+ by Sean Hardman under the sponsorship of NASA Jet Propulsion
+ Laboratory's Planetary Data System. </p>
+ </section>
+ <section name="References">
+ <p>Here is a list of Maven resources:</p>
+ <ul>
+ <li><a href="http://maven.apache.org/guides/index.html">Online
+ Documentation Index</a></li>
+ </ul>
+ </section>
+ </body>
+</document>
http://git-wip-us.apache.org/repos/asf/oodt/blob/a47b088a/curator2/src/site/xdoc/user/advanced.xml
----------------------------------------------------------------------
diff --git a/curator2/src/site/xdoc/user/advanced.xml b/curator2/src/site/xdoc/user/advanced.xml
new file mode 100644
index 0000000..707fe78
--- /dev/null
+++ b/curator2/src/site/xdoc/user/advanced.xml
@@ -0,0 +1,56 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more contributor
+license agreements. See the NOTICE.txt file distributed with this work for
+additional information regarding copyright ownership. The ASF licenses this
+file to you under the Apache License, Version 2.0 (the "License"); you may not
+use this file except in compliance with the License. You may obtain a copy of
+the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+License for the specific language governing permissions and limitations under
+the License.
+-->
+<document>
+ <properties>
+ <title>Setting Up the CAS-Curator</title>
+ <author email="woollard@jpl.nasa.gov">David Woollard</author>
+ </properties>
+
+ <body>
+ <section name="Introduction">
+
+ <p>This document serves as an advanced user's guide for the CAS-Curator
+ project. The goal of the document is to explore advanced topics such as
+ security setup and changing the look and feel of the CAS-Curator
+ to match your project. For basic topics, such as checking out,
+ building, and installing the base version of the CAS-Curator, as well
+ as performing basic configuration tasks, please see our
+ <a href="../user/basic.html">Basic Guide.</a></p>
+
+ <p>The remainder of this guide is separated into the following
+ sections:</p>
+
+ <ul>
+ <li><a href="#section1">Security Setup</a></li>
+ <li><a href="#section2">Look and Feel</a></li>
+ </ul>
+ </section>
+
+
+ <a name="section1"/>
+ <section name="Security Setup">
+ <p>Coming Soon...</p>
+ </section>
+
+ <a name="section2"/>
+ <section name="Look and Feel">
+ <p>Coming Soon...</p>
+ </section>
+
+ </body>
+</document>
http://git-wip-us.apache.org/repos/asf/oodt/blob/a47b088a/curator2/src/site/xdoc/user/basic.xml
----------------------------------------------------------------------
diff --git a/curator2/src/site/xdoc/user/basic.xml b/curator2/src/site/xdoc/user/basic.xml
new file mode 100644
index 0000000..65195d4
--- /dev/null
+++ b/curator2/src/site/xdoc/user/basic.xml
@@ -0,0 +1,690 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more contributor
+license agreements. See the NOTICE.txt file distributed with this work for
+additional information regarding copyright ownership. The ASF licenses this
+file to you under the Apache License, Version 2.0 (the "License"); you may not
+use this file except in compliance with the License. You may obtain a copy of
+the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+License for the specific language governing permissions and limitations under
+the License.
+-->
+<document>
+ <properties>
+ <title>Setting Up the CAS-Curator</title>
+ <author email="woollard@jpl.nasa.gov">David Woollard</author>
+ </properties>
+
+ <body>
+ <section name="Introduction">
+ <p>This document serves as a basic user's guide for the CAS-Curator
+ project. The goal of the document is to allow users to check out,
+ build, and install the base version of the CAS-Curator, as well
+ as perform basic configuration tasks. For advanced topics, such
+ as customizing the look and feel of the CAS-Curator for your
+ project, please see our <a href="../user/advanced.html">Advanced
+ Guide.</a></p>
+
+ <p>The remainder of this guide is separated into the following
+ sections:</p>
+ <ul>
+ <li><a href="#section1">Download and Build</a></li>
+ <li><a href="#section2">Tomcat Deployment</a></li>
+ <li><a href="#section3">Staging Area Setup</a></li>
+ <li><a href="#section4">Extractor Setup</a></li>
+ <li><a href="#section5">File Manager Configuration</a></li>
+ </ul>
+
+ </section>
+
+ <a name="section1"/>
+ <section name="Download And Build">
+ <p>The most recent CAS-Curator project can be downloaded from
+ the OODT <a href="http://oodt.apache.org/">website</a> or it can
+ be checked out from the OODT repository using Subversion. We recommend
+ checking out the latest released version (v1.0.0 at the time of writing).
+ </p>
+
+ <p>Maven is the build management system used for OODT projects. We
+ currently support Maven 2.0 and later. For more information on
+ Maven, see our <a href="../development/maven.html">Maven Guide.</a>
+ </p>
+
+ <p>Assuming a *nix-like environment, with both Maven and Subversion
+ clients installed and on your path, an example of the checkout and
+ build process is presented below:</p>
+
+ <source>
+> mkdir /usr/local/src
+> cd /usr/local/src
+> svn checkout http://oodt/repo/cas-curator/tags/1_0_0_release \
+ cas-curator-v1.0.0
+ </source>
+
+ <p>After the Subversion command completes, you will have the source
+ for the CAS-Curator project in the <code>/usr/local/src/cas-curator-v1.0.0</code>
+ directory.</p>
+
+ <p>In order to build the WAR (Web ARchive) file from this source,
+ issue the following commands:</p>
+
+ <source>
+> cd /usr/local/src/cas-curator-v1.0.0
+> mvn package
+ </source>
+
+ <p>Once the Maven command completes successfully, you should have a
+ <code>target</code> directory under <code>cas-curator-v1.0.0/</code>. The
+ WAR file, called <code>cas-curator-1.0.0.war</code>, can be found under
+ <code>target/</code>.</p>
+
+ <p>In the next section, we will discuss deploying this WAR file to
+ a Tomcat instance.</p>
+
+ </section>
+
+ <a name="section2"/>
+ <section name="Tomcat Deployment">
+ <p>Once you have built a WAR file, it is necessary to deploy the web
+ application using a servlet container such as
+ <a href="http://tomcat.apache.org/">Tomcat</a> or
+ <a href="http://www.mortbay.org/jetty/">Jetty</a>. For the purposes of
+ this guide, we will assume that you are using Tomcat. Tomcat can be
+ installed in a user account or at the system level. The base configuration
+ launches a web server on port 8080. You can learn more about Tomcat and
+ download the latest release from their
+ <a href="http://tomcat.apache.org/">website</a>. NOTE: There are two
+ concurrent versions of Tomcat: 5.5.X and 6.0.X. CAS-Curator is compatible
+ with both versions.</p>
+
+ <p>We will assume that you have downloaded Tomcat to an appropriate
+ directory, are using the default configuration, and have taken the
+ appropriate steps to allow access to port 8080. See your System
+ Administrator if you have any questions about firewall security and policy
+ regarding port access. We will further assume that you have set an
+ environment variable, <code>$TOMCAT_HOME</code>, to the base directory
+ of your Tomcat installation.</p>
+
+ <p>There are a number of ways to deploy a WAR file to Tomcat, though we
+ recommend using a context file. A context file is an XML file that provides
+ Tomcat with "context" for using a particular web application. In order to
+ create a context file for the CAS-Curator, open your favorite text editor
+ and copy and paste the following:</p>
+
+ <source><![CDATA[<Context path="/my-curator"
+docBase="/usr/local/src/cas-curator-v1.0.0/target/cas-curator-1.0.0.war">
+ <Parameter name="org.apache.oodt.security.sso.implClass"
+ value="org.apache.oodt.security.sso.DummyImpl"/>
+ <Parameter name="org.apache.oodt.cas.curator.projectName"
+ value="My Project"/>
+</Context>
+ ]]></source>
+
+ <p>Save the context file to
+ <code>$TOMCAT_HOME/conf/Catalina/localhost/my-curator.xml</code>. Now you
+ can point a web browser to <a href="http://localhost:8080/my-curator/">
+ http://localhost:8080/my-curator</a> and you should see a log-in screen
+ for CAS-Curator. <em>Note</em>: Tomcat only uses the <code>path</code>
+ attribute if the context is defined in <code>server.xml</code>. For a
+ stand-alone context file such as this one, Tomcat derives the context path
+ from the XML file name instead. See the
+ <a href="http://tomcat.apache.org/tomcat-5.5-doc/config/context.html" class="externalLink">
+ Tomcat documentation</a> for further information.</p>
+
+ <img src="../images/basic_login.jpg"/>
+
+ <p>The <code>org.apache.oodt.security.sso.implClass</code> parameter
+ that we set in the context file configures the CAS-Curator for a "dummy"
+ log-in to its Single Sign On service. Because of this, we are able to
+ log into the web application with a blank user name and a blank password.
+ For help in implementing security with CAS-Curator, see our
+ <a href="../user/advanced.html">Advanced Guide.</a></p>
+
+ <img src="../images/basic_page.jpg"/>
+
+ <p>In the next sections, we will talk about setting up staging areas,
+ metadata extractors, and launching a CAS-Filemgr instance into which
+ CAS-Curator will ingest data products.</p>
+
+ </section>
+
+ <a name="section3"/>
+ <section name="Staging Area Setup">
+ <p>Staging areas are directories on your local machine that hold data
+ products to be curated. The staging area can have arbitrary structure.
+ The only requirement that CAS-Curator has with regard to this structure
+ is that the directory structure be mirrored in a metadata generation
+ area. This generation area is used by CAS-Curator to create metadata
+ files to associate with data products.</p>
+
+ <p>For example, if there is a product, say an MP3 file of Bach's <i>Der
+ Geist hilft unsrer Schwachheit auf</i>, in the staging area at:</p>
+
+ <source>
+[staging_area_base]/audio/classical/bach/Der_Geist_hilft.mp3
+ </source>
+
+ <p>Then the CAS-Curator will generate all associated metadata products
+ in <code>[metadata_gen_base]/audio/classical/bach/</code>.</p>
+
+ <p>In order to set up the staging area and the metadata generation area,
+ we first create base directories for each, shown below:</p>
+
+ <source>
+> mkdir /usr/local/staging
+> mkdir /usr/local/staging/products
+> mkdir /usr/local/staging/metadata
+ </source>
+
+ <p>Next, we will set the following parameters in the CAS-Curator context file:</p>
+
+<source><![CDATA[<Parameter name="org.apache.oodt.cas.curator.stagingAreaPath"
+ value="/usr/local/staging/products"/>
+
+<Parameter name="org.apache.oodt.cas.curator.metAreaPath"
+ value="/usr/local/staging/metadata"/>
+
+<Parameter name="org.apache.oodt.cas.curator.metExtension"
+ value=".met"/>]]></source>
+
+ <p>The <code>org.apache.oodt.cas.curator.stagingAreaPath</code> parameter should
+ be set to the product staging area and the
+ <code>org.apache.oodt.cas.curator.metAreaPath</code> should be set to the metadata
+ generation area. Additionally, we specified the parameter
+ <code>org.apache.oodt.cas.curator.metExtension</code> to be <code>.met</code>.
+ This parameter specifies the extension for all of the metadata files produced in
+ the metadata generation area.</p>
+
+ <p>For illustrative purposes, we will load an mp3 file into the staging area:</p>
+
+ <source>
+> mkdir /usr/local/staging/products/mp3
+> cd /usr/local/staging/products/mp3
+> curl -LO http://oodt.apache.org/components/maven/curator/media/Bach-SuiteNo2.mp3
+ </source>
+
+ <p>We should note that this music file was produced by the
+ <a href="http://www.fuldaer-symphonisches-orchester.de/">Fulda Symphonic
+ Orchestra</a> and is freely distributed under the
+ <a href="http://www.eff.org/about/">EFF Open Audio License</a>, version 1.0. We
+ have edited the ID3 tag of this file (in order to make the later metadata extraction
+ example more interesting), but original authorship is retained. Now back to the
+ tutorial...</p>
+
+ <p>Remember that we need to mirror the product staging area and the metadata
+ generation area, so we will also need to create the matching directory structure
+ there:</p>
+
+ <source>
+> mkdir /usr/local/staging/metadata/mp3
+ </source>
+
+ <p>Once you restart Tomcat, the changes you have made to the context file will be
+ used. The staging area will now be set to <code>/usr/local/staging/products</code>.
+ See the screenshot below:</p>
+
+ <img src="../images/basic_staging.jpg"/>
+
+ <p>Double-clicking on "mp3", we can see that the staging area path in the top left
+ is now <code>/mp3</code> and <code>Bach-SuiteNo2.mp3</code> can be seen in the main
+ left staging pane. For the time being, there is no metadata detected (as reported
+ in the main right staging pane), but in the next section, we will be setting up a
+ basic, command-line metadata extractor in order to show how extractors are
+ integrated into CAS-Curator.</p>
+
+ </section>
+
+ <a name="section4"/>
+ <section name="Extractor Setup">
+ <p>The CAS-Curator uses ancillary programs called metadata extractors to produce
+ the metadata that it associates with products. More information about metadata
+ extractors can be found in the
+ <a href="../../metadata/user/extractorBasics.html">
+ Extractor Basics</a> User's Guide.</p>
+
+ <p>Like the staging area, we first need to set up an area in the file system for
+ metadata extractors. We will call this directory <code>extractors</code>:</p>
+
+ <source>
+ > mkdir /usr/local/extractors
+ </source>
+
+ <p>In order to register the metadata extractor path with the CAS-Curator, we will
+ need to add another parameter to the web application's context file. Add the
+ following parameter:</p>
+
+<source><![CDATA[<Parameter name="org.apache.oodt.cas.curator.metExtractorConf.uploadPath"
+ value="/usr/local/extractors" />
+ ]]></source>
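+
+ <p>For reference, once the staging and extractor parameters have been added,
+ the complete context file built up over the course of this guide should look
+ roughly like the following. This is simply the earlier fragments assembled
+ into one file; adjust the paths to match your own installation:</p>
+
+<source><![CDATA[<Context path="/my-curator"
+docBase="/usr/local/src/cas-curator-v1.0.0/target/cas-curator-1.0.0.war">
+ <Parameter name="org.apache.oodt.security.sso.implClass"
+ value="org.apache.oodt.security.sso.DummyImpl"/>
+ <Parameter name="org.apache.oodt.cas.curator.projectName"
+ value="My Project"/>
+ <Parameter name="org.apache.oodt.cas.curator.stagingAreaPath"
+ value="/usr/local/staging/products"/>
+ <Parameter name="org.apache.oodt.cas.curator.metAreaPath"
+ value="/usr/local/staging/metadata"/>
+ <Parameter name="org.apache.oodt.cas.curator.metExtension"
+ value=".met"/>
+ <Parameter name="org.apache.oodt.cas.curator.metExtractorConf.uploadPath"
+ value="/usr/local/extractors"/>
+</Context>
+ ]]></source>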
+
+ <p>We are going to make a metadata extractor that will extract ID3 tag metadata,
+ such as author, title, resource type, etc., from mp3s. As a first step, we will create
+ a directory for the new extractor. The name of this directory is important, because
+ CAS-Curator will use the directory name to register the extractor. We will name this
+ directory <code>mp3extractor</code>:</p>
+
+<source>
+> mkdir /usr/local/extractors/mp3extractor
+</source>
+
+ <p>While we could write a custom extractor in Java for the CAS-Curator, there are
+ multiple existing software packages that read mp3 ID3 tags. For these situations,
+ where an external, command-line extractor exists, we have developed the
+ <code>ExternMetExtractor</code> class in the CAS-Metadata project.</p>
+
+ <p>For this example, we are going to leverage an existing, open source mime-type
+ detector with text and metadata parsing capabilities called
+ <a href="http://lucene.apache.org/tika/">Apache Tika</a>. Tika parses a number of
+ different common data formats, including a number of audio formats like mp3.
+ We leave it to the reader of this guide to download and install Tika. We
+ will assume that the latest release of the tika-app jar is in the
+ <code>mp3extractor</code> directory.</p>
+
+ <p>We have a little work to do to convert the output of Tika into a metadata file
+ compatible with CAS-Curator. By default, Tika produces metadata in a "key: value"
+ format as shown in the command-line session below:</p>
+
+<source><![CDATA[
+> java -jar tika-app-0.5-SNAPSHOT.jar -m \
+ /usr/local/staging/products/mp3/Bach-SuiteNo2.mp3
+Author: Johann Sebastian Bach
+Content-Type: audio/mpeg
+resourceName: Bach-SuiteNo2.mp3
+title: Bach Cello Suite No 2
+ ]]></source>
+
+ <p>With a little AWK magic, we can convert this output to the CAS-Metadata XML
+ format:</p>
+ <!-- FIXME: change namespace URI? -->
+<source><![CDATA[
+> java -jar tika-app-0.5-SNAPSHOT.jar -m \
+ /usr/local/staging/products/mp3/Bach-SuiteNo2.mp3 | awk -F:\
+ 'BEGIN \
+ {print "<cas:metadata xmlns:cas=\"http://oodt.jpl.nasa.gov/1.0/cas\">"}\
+ {print "<keyval><key>"$1"</key><val>"substr($2,2)"</val></keyval>"}\
+ END {print "</cas:metadata>"}'
+<cas:metadata xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
+<keyval><key>Author</key><val>Johann Sebastian Bach</val></keyval>
+<keyval><key>Content-Type</key><val>audio/mpeg</val></keyval>
+<keyval><key>resourceName</key><val>Bach-SuiteNo2.mp3</val></keyval>
+<keyval><key>title</key><val>Bach Cello Suite No 2</val></keyval>
+</cas:metadata>
+ ]]></source>
+
+ <p>Cool as a one-line format translator is, we are actually going to have to
+ do a little more work to create an extractor capable of producing metadata
+ for CAS-Curator. A requirement for metadata extractors that are to be integrated
+ with CAS-Curator is that they produce three pieces of metadata:</p>
+
+ <ul>
+ <li>ProductType</li>
+ <li>FileLocation</li>
+ <li>Filename</li>
+ </ul>
+
+ <p>We should note that this is NOT a general requirement of all metadata
+ extractors, but a ramification of the current implementation of CAS-Curator.
+ In order to produce this extra metadata, we will develop a small Python
+ script:</p>
+
+<source><![CDATA[
+#!/usr/bin/python
+
+import os
+import sys
+
+# The product to extract metadata from is passed as the first argument.
+fullPath = sys.argv[1]
+pathElements = fullPath.split("/")
+fileName = pathElements[-1]
+# Everything before the file name, including the trailing slash.
+fileLocation = fullPath[:(len(fullPath)-len(fileName))]
+productType = "MP3"
+
+# Run Tika and let awk wrap each "key: value" line in CAS-Metadata XML.
+# The closing tag is appended below, after the extra keys are written.
+cmd = "java -jar /usr/local/extractors/mp3extractor/"
+cmd += "tika-app-0.5-SNAPSHOT.jar -m "+fullPath+" | awk -F:"
+cmd += " 'BEGIN {print \"<cas:metadata xmlns:cas="
+cmd += "\\\"http://oodt.jpl.nasa.gov/1.0/cas\\\">\"}"
+cmd += " {print \"<keyval><key>\"$1\"</key><val>\"substr($2,2)\""
+cmd += "</val></keyval>\"}' > "+fileName+".met"
+
+os.system(cmd)
+
+# Append the three keys required by CAS-Curator and close the document.
+f = open(fileName+".met", 'a')
+f.write('<keyval><key>ProductType</key><val>'+productType)
+f.write('</val></keyval>\n<keyval><key>Filename</key><val>')
+f.write(fileName+'</val></keyval>\n<keyval><key>FileLocation')
+f.write('</key><val>'+fileLocation+'</val></keyval>\n')
+f.write('</cas:metadata>')
+f.close()
+]]></source>
+
+ <p>We'll assume that you have Python installed at <code>/usr/bin/python</code>
+ and you have named this script <code>mp3PythonExtractor.py</code> and placed
+ it in <code>/usr/local/extractors/mp3extractor</code>. We'll need
+ to make sure it is executable from the command-line:</p>
+
+<source><![CDATA[
+> cd /usr/local/extractors/mp3extractor
+> chmod +x mp3PythonExtractor.py
+> ./mp3PythonExtractor.py \
+ /usr/local/staging/products/mp3/Bach-SuiteNo2.mp3
+<cas:metadata xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
+<keyval><key>Author</key><val>Johann Sebastian Bach</val></keyval>
+<keyval><key>Content-Type</key><val>audio/mpeg</val></keyval>
+<keyval><key>resourceName</key><val>Bach-SuiteNo2.mp3</val></keyval>
+<keyval><key>title</key><val>Bach Cello Suite No 2</val></keyval>
+<keyval><key>ProductType</key><val>MP3</val></keyval>
+<keyval><key>Filename</key><val>Bach-SuiteNo2.mp3</val></keyval>
+<keyval><key>FileLocation</key><val>/usr/local/staging/products/mp3
+</val></keyval>
+</cas:metadata>
+]]></source>
+
+ <p>Now that we have a metadata extractor that meets our requirements (it's
+ callable from the command-line, it produces CAS-Metadata compatible XML, and
+ it extracts <i>ProductType</i>, <i>Filename</i>, and <i>FileLocation</i>),
+ the next step is to create an <code>ExternMetExtractor</code> configuration
+ file. This file will configure CAS-Metadata's <code>ExternMetExtractor</code>
+ to call the <code>mp3PythonExtractor.py</code> script correctly.</p>
+
+ <p>There is more information about <code>ExternMetExtractor</code>
+ configuration available in CAS-Metadata's
+ <a href="http://oodt.jpl.nasa.gov/cas-metadata/user/extractorBasics.html">
+ Extractor Basics</a> User's Guide. For the purposes of this guide, we will
+ assume that the reader is familiar with configuration of this extractor, so we
+ will just present the configuration below (we assume that you name this file
+ <code>mp3PythonExtractor.config</code>):</p>
+
+<source><![CDATA[
+<?xml version="1.0" encoding="UTF-8"?>
+<cas:externextractor xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
+ <exec workingDir="">
+ <extractorBinPath>
+/usr/local/extractors/mp3extractor/mp3PythonExtractor.py
+ </extractorBinPath>
+ <args>
+ <arg isDataFile="true"/>
+ </args>
+ </exec>
+</cas:externextractor>
+]]></source>
+
+ <p>The last step in configuring our mp3 metadata extractor is to provide a
+ properties file for CAS-Curator so that it knows how to call the
+ <code>ExternMetExtractor</code>. Each extractor used by CAS-Curator needs
+ a <code>config.properties</code> file. This file sets two properties:</p>
+
+ <ul>
+ <li><code>extractor.classname</code></li>
+ <li><code>extractor.config.files</code></li>
+ </ul>
+
+ <p>Create a <code>config.properties</code> file (this name is important for
+ CAS-Curator to pick up the configuration) in the
+ <code>/usr/local/extractors/mp3extractor</code> directory. This file should
+ consist of the following parameters:</p>
+
+<source>
+extractor.classname=org.apache.oodt.cas.metadata.extractors.ExternMetExtractor
+extractor.config.files=/usr/local/extractors/mp3extractor/mp3PythonExtractor.config
+</source>
+
+ <p>To recap, we first created a Python script that calls
+ <a href="http://lucene.apache.org/tika/">Apache Tika</a> to extract metadata
+ from mp3 files. Then we created a configuration file that configures
+ CAS-Metadata's <code>ExternMetExtractor</code> to call this Python script.
+ Finally, we created a properties file for the CAS-Curator to call the
+ <code>ExternMetExtractor</code>. To confirm the configuration of this
+ extractor, we can long list the extractor directory:</p>
+
+ <source>
+> cd /usr/local/extractors/mp3extractor
+> ls -l
+total 51448
+-rw-r--r-- 1 - - 167 Nov 27 13:50 config.properties
+-rw-r--r-- 1 - - 328 Nov 27 13:49 mp3PythonExtractor.config
+-rwxr-xr-x 1 - - 702 Nov 27 13:49 mp3PythonExtractor.py
+-rw-r--r-- 1 - - 26325155 Nov 27 13:46 tika-app-0.5-SNAPSHOT.jar
+ </source>
+
+ <p>Once you restart Tomcat, the change you have made to the context file will be
+ used. The extractor area will now be set to <code>/usr/local/extractors</code>.
+ See the screenshot below:</p>
+
+ <img src="../images/basic_extractor.jpg"/>
+
+ <p>In the above screenshot, we see that, upon clicking on the mp3 file,
+ metadata produced by the <code>mp3extractor</code> is shown in the main right
+ staging pane. Now staging and extraction are set up. In the next section, we
+ will set up a CAS-Filemgr instance and show how CAS-Curator can be used to
+ ingest products.</p>
+
+ </section>
+
+ <a name="section5"/>
+ <section name="File Manager Configuration">
+
+ <p>The final step in our basic configuration of CAS-Curator is to configure a
+ CAS-Filemgr instance into which we will ingest our mp3s. There is a lot of
+ information on configuring the CAS-Filemgr in its
+ <a href="../../filemgr/user/">User's Guide</a>. We will
+ assume familiarity with the CAS-Filemgr for the remainder of this guide.</p>
+
+ <p>In this guide, we will focus on the basic configuration necessary to tailor
+ a vanilla build of the CAS-Filemgr for use with our CAS-Curator. We will assume
+ that you have built the latest release of the CAS-Filemgr (v1.8.0 at the time of
+ this writing) and installed it at:</p>
+
+ <source>
+/usr/local/src/cas-filemgr-1.8.0/
+ </source>
+
+ <p>The first step in configuring the CAS-Filemgr is to edit the
+ <code>filemgr.properties</code> file in the <code>etc</code> directory. This
+ file controls the basic configuration of the CAS-Filemgr, including its
+ various extension points. For this example, we are going to run the CAS-Filemgr
+ in a very basic configuration, with both its repository and validation layer
+ controlled by XML configuration, a local data transfer factory, and a
+ <a href="http://lucene.apache.org/java/docs/">Lucene</a>-based metadata
+ catalog.</p>
+
+ <p>In order to create this configuration, we will change the following
+ parameters in the <code>filemgr.properties</code> file:</p>
+
+ <ul>
+ <li>Set <code>org.apache.oodt.cas.filemgr.catalog.lucene.idxPath</code>
+ to <code>/usr/local/src/cas-filemgr-1.8.0/catalog</code>. This parameter
+ tells CAS-Filemgr where to create the Lucene index. The first time you start
+ the CAS-Filemgr, make sure that this directory does NOT exist. The CAS-Filemgr
+ will take care of creating it and populating it with the appropriate files.
+ </li>
+ <li>Set <code>org.apache.oodt.cas.filemgr.repositorymgr.dirs</code> to
+ <code>file:///usr/local/src/cas-filemgr-1.8.0/policy/mp3</code>. The value needs
+ to be a URL and we are pointing to a policy folder we will create.</li>
+ <li>Set <code>org.apache.oodt.cas.filemgr.validation.dirs</code> to
+ <code>file:///usr/local/src/cas-filemgr-1.8.0/policy/mp3</code>. Like the last
+ parameter we configured, this parameter should be a URL and point to the
+ same policy folder.</li>
+ </ul>
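+
+ <p>Taken together, the three settings above would look roughly like the
+ following fragment of <code>filemgr.properties</code>. This is only a sketch
+ that assumes the install path used in this guide; the comment lines are
+ ours, and every other property in the file is left at its default value:</p>
+
+ <source>
+# Lucene index location; must not exist before the first start
+org.apache.oodt.cas.filemgr.catalog.lucene.idxPath=/usr/local/src/cas-filemgr-1.8.0/catalog
+
+# Policy folder for the repository manager and validation layer (URLs)
+org.apache.oodt.cas.filemgr.repositorymgr.dirs=file:///usr/local/src/cas-filemgr-1.8.0/policy/mp3
+org.apache.oodt.cas.filemgr.validation.dirs=file:///usr/local/src/cas-filemgr-1.8.0/policy/mp3
+ </source>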
+
+ <p>With these changes, you are ready to run the basic configuration of the
+ CAS-Filemgr. In order to make this install of CAS-Filemgr work with our
+ CAS-Curator, however, we will also need to augment the basic policy for both
+ the repository manager and validation layer.</p>
+
+ <p>First, we will create a policy directory for our mp3 curator. We can do this
+ by moving the current policy files from the base <code>policy</code> directory
+ into a new <code>mp3</code> subdirectory:</p>
+
+ <source>
+> cd /usr/local/src/cas-filemgr-1.8.0/policy
+> mkdir mp3
+> mv *.xml mp3/
+ </source>
+
+ <p>Next, we will add a product type to our instance of the CAS-Filemgr. To do
+ this, we will edit the <code>product-types.xml</code> file in the
+ <code>policy/mp3</code> directory and add the following as a child of the
+ <code>&lt;cas:producttypes&gt;</code> node (we purposely omit any
+ commentary on the details of this configuration and leave it to the
+ reader):</p>
+
+<source><![CDATA[
+<type id="urn:example:MP3" name="MP3">
+ <repository path="file:///usr/local/archive"/>
+ <versioner class="org.apache.oodt.cas.filemgr.versioning.BasicVersioner"/>
+ <description>A product type for mp3 audio files.</description>
+ <metExtractors>
+ <extractor
+ class="org.apache.oodt.cas.filemgr.metadata.extractors.CoreMetExtractor">
+ <configuration>
+ <property name="nsAware" value="true" />
+ <property name="elementNs" value="CAS" />
+ <property name="elements"
+ value="ProductReceivedTime,ProductName,ProductId" />
+ </configuration>
+ </extractor>
+ </metExtractors>
+</type>
+]]></source>
+
+ <p>Next, we will create a number of elements in the <code>elements.xml</code>
+ file. There will be an element node for each of the metadata elements we
+ want to associate with MP3 products. We can do this by adding the following
+ as child nodes of the <code>&lt;cas:elements&gt;</code> tag:</p>
+
+<source><![CDATA[
+<element id="urn:example:FileLocation" name="FileLocation">
+ <dcElement/>
+ <description/>
+</element>
+<element id="urn:example:ProductType" name="ProductType">
+ <dcElement/>
+ <description/>
+</element>
+<element id="urn:example:Author" name="Author">
+ <dcElement/>
+ <description/>
+</element>
+<element id="urn:example:Filename" name="Filename">
+ <dcElement/>
+ <description/>
+</element>
+<element id="urn:example:resourceName" name="resourceName">
+ <dcElement/>
+ <description/>
+</element>
+<element id="urn:example:title" name="title">
+ <dcElement/>
+ <description/>
+</element>
+<element id="urn:example:Content-Type" name="Content-Type">
+ <dcElement/>
+ <description/>
+</element>
+]]></source>
+
+ <p>After we have configured the new metadata elements, we will need to map
+ these elements to our MP3 product. We do this by editing the
+ <code>product-type-element-map.xml</code> file in the <code>policy/mp3</code>
+ directory to add the following as a child node to
+ <code>&lt;cas:producttypemap&gt;</code>:</p>
+
+<source><![CDATA[
+<type id="urn:example:MP3">
+ <element id="urn:example:FileLocation"/>
+ <element id="urn:example:ProductType"/>
+ <element id="urn:example:Author"/>
+ <element id="urn:example:Filename"/>
+ <element id="urn:example:resourceName"/>
+ <element id="urn:example:title"/>
+ <element id="urn:example:Content-Type"/>
+</type>
+]]></source>
+
+ <p>A final configuration step is to create the archive area for the
+ CAS-Filemgr (recall that we set the repository path for MP3 products
+ in the <code>product-types.xml</code> file). To do this, we simply
+ make the directory:</p>
+
+ <source>
+> mkdir /usr/local/archive
+ </source>
+
+ <p>We will now start the CAS-Filemgr instance. This instance will run on
+ port 9000 by default. In order to start the Filemgr, we will issue the
+ following commands:</p>
+
+ <source>
+> cd /usr/local/src/cas-filemgr-1.8.0/bin
+> ./filemgr start
+ </source>
+
+ <p>Now that we have started the CAS-Filemgr, we will need to configure the
+ CAS-Curator to use this Filemgr instance. In order to do this, we will add
+ the following parameters to the CAS-Curator context file:</p>
+
+<source><![CDATA[
+<Parameter name="org.apache.oodt.cas.fm.url"
+ value="http://localhost:9000"/>
+
+<Parameter name="org.apache.oodt.cas.curator.dataDefinition.uploadPath"
+ value="/usr/local/src/cas-filemgr-1.8.0/policy" />
+
+<Parameter name="org.apache.oodt.cas.curator.fmProps"
+ value="/usr/local/src/cas-filemgr-1.8.0/etc/filemgr.properties"/>
+]]></source>
+
+ <p>Once we restart Tomcat, the CAS-Curator will recognize the policy
+ and properties of the configured CAS-Filemgr instance and use this
+ instance during the ingest process.</p>
+
+ <img src="../images/basic_filemgr.jpg"/>
+
+ <p>From the above image, you can see that the CAS-Filemgr configuration
+ has been picked up by CAS-Curator. If you double-click on MP3 in the left
+ filemgr main pane, you will see the product types that are contained in
+ the mp3 policy: <code>GenericFile</code>, which was part of the default
+ configuration, and <code>MP3</code>, which we added. Clicking on MP3
+ brings up the ingest interface in the right filemgr main pane.</p>
+
+ <img src="../images/basic_ingest.jpg"/>
+
+ <p>Once we drag the Bach-SuiteNo2.mp3 file from the staging pane to the green
+ box in the right filemgr main pane, we can select a metadata extractor
+ from the pulldown menu and click the "Save as Ingestion Task" button. This
+ adds the ingest task to the bottom pane, as illustrated in the above
+ screenshot. To test file ingestion, we click the "Start" button.</p>
+
+ <p>As a final step, we will confirm that the mp3 file was archived. We
+ can do this by listing the archive:</p>
+
+ <source>
+> ls -lR /usr/local/archive
+total 0
+drwxr-xr-x 3 - - 102 Nov 27 23:53 Bach-SuiteNo2.mp3
+
+/usr/local/archive//Bach-SuiteNo2.mp3:
+total 9344
+-rw-r--r-- 1 - - 4781079 Nov 25 20:14 Bach-SuiteNo2.mp3
+ </source>
+
+ <p>Note that our configuration of the CAS-Filemgr selected the
+ <code>BasicVersioner</code> as the versioner for the MP3 product type.
+ This means that mp3s are placed at [archive_base]/[filename]/[filename]
+ during ingest, which is why the listing above shows the file at
+ <code>/usr/local/archive/Bach-SuiteNo2.mp3/Bach-SuiteNo2.mp3</code>.</p>
+
+ <p>We have now completed a base configuration of the CAS-Curator. In
+ the <a href="../user/advanced.html">Advanced Guide</a>, we will cover
+ topics like changing the look and feel of the Curator, and security
+ configuration.</p>
+
+ </section>
+ </body>
+</document>
http://git-wip-us.apache.org/repos/asf/oodt/blob/a47b088a/filemgr/pom.xml
----------------------------------------------------------------------
diff --git a/filemgr/pom.xml b/filemgr/pom.xml
index cfa1327..52c1e53 100644
--- a/filemgr/pom.xml
+++ b/filemgr/pom.xml
@@ -75,8 +75,30 @@
<artifactId>commons-dbcp</artifactId>
</dependency>
<dependency>
- <groupId>commons-httpclient</groupId>
- <artifactId>commons-httpclient</artifactId>
+ <groupId>org.apache.httpcomponents</groupId>
+ <artifactId>httpclient</artifactId>
+ <exclusions>
+ <exclusion>
+ <artifactId>commons-logging</artifactId>
+ <groupId>commons-logging</groupId>
+ </exclusion>
+ </exclusions>
+ </dependency>
+ <dependency>
+ <groupId>org.slf4j</groupId>
+ <artifactId>slf4j-api</artifactId>
+ </dependency>
+ <dependency>
+ <groupId>org.slf4j</groupId>
+ <artifactId>slf4j-log4j12</artifactId>
+ </dependency>
+ <dependency>
+ <groupId>org.slf4j</groupId>
+ <artifactId>jcl-over-slf4j</artifactId>
+ </dependency>
+ <dependency>
+ <groupId>org.slf4j</groupId>
+ <artifactId>slf4j-simple</artifactId>
</dependency>
<dependency>
<groupId>commons-io</groupId>
http://git-wip-us.apache.org/repos/asf/oodt/blob/a47b088a/filemgr/src/main/java/org/apache/oodt/cas/filemgr/catalog/Catalog.java
----------------------------------------------------------------------
diff --git a/filemgr/src/main/java/org/apache/oodt/cas/filemgr/catalog/Catalog.java b/filemgr/src/main/java/org/apache/oodt/cas/filemgr/catalog/Catalog.java
index f056336..0a10fed 100644
--- a/filemgr/src/main/java/org/apache/oodt/cas/filemgr/catalog/Catalog.java
+++ b/filemgr/src/main/java/org/apache/oodt/cas/filemgr/catalog/Catalog.java
@@ -324,7 +324,7 @@ public interface Catalog extends Pagination {
* @throws CatalogException
* If any error occurs (e.g., the layer isn't initialized).
*/
- ValidationLayer getValidationLayer();
+ ValidationLayer getValidationLayer() throws CatalogException;
/**
*