Posted to commits@stanbol.apache.org by rw...@apache.org on 2011/07/20 23:49:55 UTC

svn commit: r1148947 - in /incubator/stanbol/trunk: commons/opennlp/ commons/opennlp/src/test/java/org/apache/commons/opennlp/ data/ data/opennlp/ data/opennlp/lang/ data/opennlp/lang/en/ data/opennlp/lang/en/src/ data/opennlp/lang/en/src/main/ data/op...

Author: rwesten
Date: Wed Jul 20 21:49:50 2011
New Revision: 1148947

URL: http://svn.apache.org/viewvc?rev=1148947&view=rev
Log:
Changes in the Stanbol data and defaultdata bundles:

* The single defaultdata bundle was replaced by three new bundles to make the Stanbol configuration more flexible. The new bundles are located under /data. The /defaultdata folder was removed completely. /data bundles that are included in the Stanbol Launchers are collected in the "defaultdata" profile of the /data reactor pom.
* Some adaptations to the parent pom, the opennlp-ner engine and the commons.opennlp bundle to reflect this restructuring of the default data bundles.
* Replaced the defaultdata bundle with the three new ones in the different Stanbol launchers.
* The shell files previously used to download data files from the internet are replaced by Ant files that are called by the maven-antrun-plugin. The download of the required files is therefore now platform independent and done during the normal Maven build process.
* The modules for DBPedia.org and DBLP were removed, because this functionality is now covered by the corresponding indexing tools. The folders are still there. A readme file tells users to use the indexing tool.
* Added a site that configures DBPedia.org as a referenced site using the remote services and a local cache for retrieved entities.
* Added a parent pom for data, currently only setting the <Bundle-Category> to "Stanbol Data".

open:

* Add configurations for additional sites
* Add bundles for OpenNLP models of languages other than English

The goal is to use the /data bundles as an easy way to add functionality (more referenced sites, support for other languages) to Stanbol. Such bundles only need to be installed into the Stanbol OSGi environment.
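
The new OpenNLP bundles declare a Data-Files-Priority of -100 so that other providers can override their models. A minimal Java sketch of that idea follows. It is not the Stanbol DataFileProvider API: the class RankedDataFileLookup, the Provider type, the bundle name "custom-models" and the assumption that a provider without an explicit ranking counts as 0 are all hypothetical and used only for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; NOT the Stanbol DataFileProvider API.
final class RankedDataFileLookup {

    /** Illustrative stand-in for a bundle that contributes data files. */
    static final class Provider {
        final String name;
        final int ranking;
        final List<String> files;

        Provider(String name, int ranking, String... files) {
            this.name = name;
            this.ranking = ranking;
            this.files = Arrays.asList(files);
        }
    }

    /** Returns the name of the highest-ranked provider that offers the file. */
    static Optional<String> resolve(List<Provider> providers, String fileName) {
        return providers.stream()
                .filter(p -> p.files.contains(fileName))
                .max(Comparator.comparingInt((Provider p) -> p.ranking))
                .map(p -> p.name);
    }

    /** Sample data: the default bundle (ranking -100) and an assumed override (ranking 0). */
    static List<Provider> sampleProviders() {
        return Arrays.asList(
                new Provider("org.apache.stanbol.data.opennlp.lang.en", -100,
                        "en-sent.bin", "en-pos-perceptron.bin", "en-chunker.bin"),
                new Provider("custom-models", 0, "en-sent.bin"));
    }

    public static void main(String[] args) {
        List<Provider> providers = sampleProviders();
        for (String file : Arrays.asList("en-sent.bin", "en-chunker.bin", "missing.bin")) {
            System.out.println(file + " <- " + resolve(providers, file).orElse("not found"));
        }
    }
}
```

With the sample data, en-sent.bin resolves to custom-models because 0 outranks -100, while en-chunker.bin is only offered by the default bundle and therefore still resolves to org.apache.stanbol.data.opennlp.lang.en.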

Added:
    incubator/stanbol/trunk/data/opennlp/
    incubator/stanbol/trunk/data/opennlp/lang/
    incubator/stanbol/trunk/data/opennlp/lang/en/
    incubator/stanbol/trunk/data/opennlp/lang/en/README.md
    incubator/stanbol/trunk/data/opennlp/lang/en/download_models.xml   (with props)
    incubator/stanbol/trunk/data/opennlp/lang/en/pom.xml   (with props)
    incubator/stanbol/trunk/data/opennlp/lang/en/src/
    incubator/stanbol/trunk/data/opennlp/lang/en/src/main/
    incubator/stanbol/trunk/data/opennlp/lang/en/src/main/resources/
    incubator/stanbol/trunk/data/opennlp/ner/
    incubator/stanbol/trunk/data/opennlp/ner/en/
    incubator/stanbol/trunk/data/opennlp/ner/en/README.md
    incubator/stanbol/trunk/data/opennlp/ner/en/download_models.xml   (with props)
    incubator/stanbol/trunk/data/opennlp/ner/en/pom.xml   (with props)
    incubator/stanbol/trunk/data/opennlp/ner/en/src/
    incubator/stanbol/trunk/data/opennlp/ner/en/src/main/
    incubator/stanbol/trunk/data/opennlp/ner/en/src/main/resources/
    incubator/stanbol/trunk/data/parent/
    incubator/stanbol/trunk/data/sites/dbpediacached/
    incubator/stanbol/trunk/data/sites/dbpediacached/src/
    incubator/stanbol/trunk/data/sites/dbpediacached/src/main/
    incubator/stanbol/trunk/data/sites/dbpediacached/src/main/resources/
    incubator/stanbol/trunk/data/sites/dbpediacached/src/main/resources/org/
    incubator/stanbol/trunk/data/sites/dbpediacached/src/main/resources/org/apache/
    incubator/stanbol/trunk/data/sites/dbpediacached/src/main/resources/org/apache/stanbol/
    incubator/stanbol/trunk/data/sites/dbpediacached/src/main/resources/org/apache/stanbol/data/
    incubator/stanbol/trunk/data/sites/dbpediacached/src/main/resources/org/apache/stanbol/data/site/
    incubator/stanbol/trunk/data/sites/dbpediacached/src/main/resources/org/apache/stanbol/data/site/dbpedia/
    incubator/stanbol/trunk/data/sites/dbpediacached/src/main/resources/org/apache/stanbol/data/site/dbpedia/cached/
    incubator/stanbol/trunk/data/sites/dbpediadefault/
    incubator/stanbol/trunk/data/sites/dbpediadefault/.classpath   (with props)
    incubator/stanbol/trunk/data/sites/dbpediadefault/.project   (with props)
    incubator/stanbol/trunk/data/sites/dbpediadefault/README.md
    incubator/stanbol/trunk/data/sites/dbpediadefault/download_index.xml   (with props)
    incubator/stanbol/trunk/data/sites/dbpediadefault/pom.xml   (with props)
    incubator/stanbol/trunk/data/sites/dbpediadefault/src/
    incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/
    incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/
    incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/
    incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/
    incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/
    incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/
    incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/
    incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/
    incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/
    incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/config/
    incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/config/dbpedia_43k.solrindex.ref
    incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/config/org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine-dbpedia.config
    incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/config/org.apache.stanbol.entityhub.core.site.CacheImpl-dbpedia.config
    incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/config/org.apache.stanbol.entityhub.site.referencedSite-dbpedia.config
    incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/config/org.apache.stanbol.entityhub.yard.solr.impl.SolrYard-dbpedia.config
Removed:
    incubator/stanbol/trunk/data/sites/dblp/pom.xml
    incubator/stanbol/trunk/data/sites/dblp/src/
    incubator/stanbol/trunk/data/sites/dbpedia/pom.xml
    incubator/stanbol/trunk/data/sites/dbpedia/src/
    incubator/stanbol/trunk/data/sites/pom.xml
    incubator/stanbol/trunk/defaultdata/
Modified:
    incubator/stanbol/trunk/commons/opennlp/pom.xml
    incubator/stanbol/trunk/commons/opennlp/src/test/java/org/apache/commons/opennlp/ClasspathDataFileProvider.java
    incubator/stanbol/trunk/data/README.md
    incubator/stanbol/trunk/data/pom.xml
    incubator/stanbol/trunk/data/sites/dblp/README.md
    incubator/stanbol/trunk/data/sites/dbpedia/README.md
    incubator/stanbol/trunk/enhancer/engines/opennlp-ner/pom.xml
    incubator/stanbol/trunk/enhancer/engines/opennlp-ner/src/test/java/org/apache/stanbol/enhancer/engines/opennlp/impl/ClasspathDataFileProvider.java
    incubator/stanbol/trunk/enhancer/engines/taxonomylinking/pom.xml
    incubator/stanbol/trunk/launchers/full/src/main/bundles/list.xml
    incubator/stanbol/trunk/launchers/kres/src/main/bundles/list.xml
    incubator/stanbol/trunk/launchers/stable/src/main/bundles/list.xml
    incubator/stanbol/trunk/launchers/stateless/src/main/bundles/list.xml
    incubator/stanbol/trunk/parent/pom.xml

Modified: incubator/stanbol/trunk/commons/opennlp/pom.xml
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/commons/opennlp/pom.xml?rev=1148947&r1=1148946&r2=1148947&view=diff
==============================================================================
--- incubator/stanbol/trunk/commons/opennlp/pom.xml (original)
+++ incubator/stanbol/trunk/commons/opennlp/pom.xml Wed Jul 20 21:49:50 2011
@@ -150,7 +150,7 @@
     </dependency>
     <dependency> <!-- used to provide the English models while testing -->
       <groupId>org.apache.stanbol</groupId>
-      <artifactId>org.apache.stanbol.defaultdata</artifactId>
+      <artifactId>org.apache.stanbol.data.opennlp.lang.en</artifactId>
       <scope>test</scope>
     </dependency>
     

Modified: incubator/stanbol/trunk/commons/opennlp/src/test/java/org/apache/commons/opennlp/ClasspathDataFileProvider.java
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/commons/opennlp/src/test/java/org/apache/commons/opennlp/ClasspathDataFileProvider.java?rev=1148947&r1=1148946&r2=1148947&view=diff
==============================================================================
--- incubator/stanbol/trunk/commons/opennlp/src/test/java/org/apache/commons/opennlp/ClasspathDataFileProvider.java (original)
+++ incubator/stanbol/trunk/commons/opennlp/src/test/java/org/apache/commons/opennlp/ClasspathDataFileProvider.java Wed Jul 20 21:49:50 2011
@@ -28,7 +28,12 @@ import org.slf4j.LoggerFactory;
 public class ClasspathDataFileProvider implements DataFileProvider {
 
     private final Logger log = LoggerFactory.getLogger(getClass());
-    public static final String RESOURCE_BASE_PATH = "org/apache/stanbol/defaultdata/opennlp/";
+    /*
+     * NOTE: This path needs to be the same as the one used by the
+     *       org.apache.stanbol.data.opennlp.lang.en bundle to store the 
+     *       OpenNLP models
+     */
+    public static final String RESOURCE_BASE_PATH = "org/apache/stanbol/data/opennlp/";
     
     private final String symbolicName;
     

Modified: incubator/stanbol/trunk/data/README.md
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/data/README.md?rev=1148947&r1=1148946&r2=1148947&view=diff
==============================================================================
--- incubator/stanbol/trunk/data/README.md (original)
+++ incubator/stanbol/trunk/data/README.md Wed Jul 20 21:49:50 2011
@@ -1,20 +1,49 @@
-# Data files for optional extensions of the Stanbol distributions
+# Data files and configurations for Stanbol
 
-This source repository holds the pom.xml file and folder structure to build
-optional packages for Apache Stanbol.
+This source repository holds artifacts that are used to load 
+
+* OSGI configurations and/or
+* data files
+
+to a Stanbol Environment.
 
 To avoid loading the Subversion repository with large binary files, these artifacts
 are typically not included but need to be built/precomputed or downloaded
 from other sites.
 See the documentation of the corresponding module for details.
 
-## DataFileProvider Service
+Modules of this repository tree are typically NOT part of the Stanbol reactor,
+because they are considered optional and typically need to download or
+precompute some resources, which users might not want to do for each build.
+
+Bundles used as default configuration by the Stanbol Launchers are also
+available from the included Maven repositories and will be downloaded during the
+normal Stanbol build (if not yet available in the local cache). 
+
+## OpenNLP
+
+This sub-folder contains bundles that contain several OpenNLP models. Such
+bundles will contribute such files to the Stanbol DataFileProvider.
+
+## Sites
+
+This sub-folder contains bundles that install ReferencedSites to the
+Stanbol Entityhub. Typically such bundles only contain the configuration but
+do not include the actual data. However for small data sets the index might
+also be included in the bundle.
+See the README.md files for details.
+
+## Notes
+
+Bundles created by the various modules depend on the following two components:
+
+### DataFileProvider Service
 
 The DataFileProvider Service is typically used by components that need to load
 big binary files to Apache Stanbol.
 See {stanbol-root}/commons/stanboltools/datafileprovider for details
 
-## Bundleprovider
+### Bundleprovider
 
 The Bundleprovider is an extension to the Apache Sling installer framework
 and supports loading multiple configuration files from a single bundle.
@@ -26,3 +55,4 @@ Services).
 See {stanbol-root}/commons/installer/bundleprovider for details.
 
 
+

Added: incubator/stanbol/trunk/data/opennlp/lang/en/README.md
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/data/opennlp/lang/en/README.md?rev=1148947&view=auto
==============================================================================
--- incubator/stanbol/trunk/data/opennlp/lang/en/README.md (added)
+++ incubator/stanbol/trunk/data/opennlp/lang/en/README.md Wed Jul 20 21:49:50 2011
@@ -0,0 +1,21 @@
+# Data files Bundles for OpenNLP
+
+This source repository only holds the pom.xml file and folder structure of this bundle.
+
+To avoid loading the Subversion repository with large binary files, this artifact has to be built and deployed manually; the build retrieves precomputed models from other sites.
+
+
+## Downloading the OpenNLP statistical model 
+
+The OpenNLP models are downloaded from 
+
+    http://opennlp.sourceforge.net/models-1.5
+
+This URL is defined as a property in the 'pom.xml'.
+The list of downloaded files is defined in 'download_models.xml'.
+
+## NOTE
+
+Using this bundle is only an alternative to manually copying the required OpenNLP models to '{stanbol-installation}/sling/datafiles'.
+
+In addition, model files in that folder take precedence over models provided by this bundle.

Added: incubator/stanbol/trunk/data/opennlp/lang/en/download_models.xml
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/data/opennlp/lang/en/download_models.xml?rev=1148947&view=auto
==============================================================================
--- incubator/stanbol/trunk/data/opennlp/lang/en/download_models.xml (added)
+++ incubator/stanbol/trunk/data/opennlp/lang/en/download_models.xml Wed Jul 20 21:49:50 2011
@@ -0,0 +1,33 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<project name="OpenNLP Model Download Helper" default="download" basedir=".">
+  <description>
+    Contains only a single target that is used by the Maven AntRun
+    Plugin to download OpenNLP models from the Web
+  </description>
+   
+  <target name="download">
+    <copy todir="${target.directory}" flatten="true">
+      <resources>
+        <url url="${model.url}/en-sent.bin"/>
+        <url url="${model.url}/en-pos-perceptron.bin"/>
+        <url url="${model.url}/en-chunker.bin"/>
+      </resources>
+    </copy>
+  </target>
+</project>
\ No newline at end of file

Propchange: incubator/stanbol/trunk/data/opennlp/lang/en/download_models.xml
------------------------------------------------------------------------------
    svn:mime-type = text/plain

Added: incubator/stanbol/trunk/data/opennlp/lang/en/pom.xml
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/data/opennlp/lang/en/pom.xml?rev=1148947&view=auto
==============================================================================
--- incubator/stanbol/trunk/data/opennlp/lang/en/pom.xml (added)
+++ incubator/stanbol/trunk/data/opennlp/lang/en/pom.xml Wed Jul 20 21:49:50 2011
@@ -0,0 +1,122 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+
+  <modelVersion>4.0.0</modelVersion>
+  <parent>
+    <groupId>org.apache.stanbol</groupId>
+    <artifactId>org.apache.stanbol.data.parent</artifactId>
+    <version>0.9.0-incubating-SNAPSHOT</version>
+    <relativePath>../../../parent</relativePath>
+  </parent>
+
+  <groupId>org.apache.stanbol</groupId>
+  <artifactId>org.apache.stanbol.data.opennlp.lang.en</artifactId>
+  <version>1.0.0-incubating</version>
+  <packaging>bundle</packaging>
+
+  <name>Apache Stanbol Data: OpenNLP Models for English</name>
+  <description>
+    Bundle containing all necessary/available models for parsing English language texts. This does not include models for named entity recognition (NER).
+  </description>
+  <inceptionYear>2011</inceptionYear>
+
+  <scm>
+    <connection>
+      scm:svn:http://svn.apache.org/repos/asf/incubator/stanbol/trunk/data/opennlp/lang/en
+    </connection>
+    <developerConnection>
+      scm:svn:https://svn.apache.org/repos/asf/incubator/stanbol/trunk/data/opennlp/lang/en
+    </developerConnection>
+    <url>http://incubator.apache.org/stanbol/</url>
+  </scm>
+  <properties>
+    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+    <!-- define the path to/home of the OpenNLP models -->
+    <opennlp.model.path>org/apache/stanbol/data/opennlp</opennlp.model.path>
+    <opennlp.model.home>http://opennlp.sourceforge.net/models-1.5</opennlp.model.home>
+  </properties>
+
+  <build>
+    <plugins>
+      <plugin>
+        <groupId>org.apache.felix</groupId>
+        <artifactId>maven-bundle-plugin</artifactId>
+        <configuration>
+          <instructions>
+            <_versionpolicy>$${version;===;${@}}</_versionpolicy>
+
+            <!-- 
+              Extension used to provide files in that directory to the
+              DataFileProvider
+              -->
+            <Data-Files>${opennlp.model.path}</Data-Files>
+            <!-- 
+              Use a priority lower than 0 to allow providers without a
+              defined ranking to override this default data.
+             -->
+            <Data-Files-Priority>
+              -100
+            </Data-Files-Priority>
+          </instructions>
+        </configuration>
+      </plugin>
+      <plugin>
+        <!-- 
+          Ant is used to download the models from the
+          http://opennlp.sourceforge.net site.
+        -->
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-antrun-plugin</artifactId>
+        <executions>
+          <execution>
+            <id>compile</id>
+            <phase>compile</phase>
+            <configuration>
+              <!--
+                TODO: I would like to add an "unless" constraint to the
+                target that prevents execution if Maven operates in offline
+                mode. However I was not able to find out how to obtain this
+                information. ${settings.offline} (as noted by several
+                resources) does not work.
+                Until this is fixed, builds will fail if no Internet connection
+                is available!
+              -->
+              <target>
+                <property name="target.directory" value="${project.basedir}/src/main/resources/${opennlp.model.path}"/>
+                <property name="model.url" value="${opennlp.model.home}"/>
+                                
+                <echo message="copy OpenNLP models"/>
+                <echo message="  FROM ${model.url} "/>
+                <echo message="  TO ${target.directory}"/>
+
+                <ant antfile="${basedir}/download_models.xml">
+                  <target name="download"/>
+                </ant>
+              </target>
+            </configuration>
+            <goals>
+              <goal>run</goal>
+            </goals>
+          </execution>
+        </executions>
+      </plugin>
+    </plugins>
+  </build>
+</project>

Propchange: incubator/stanbol/trunk/data/opennlp/lang/en/pom.xml
------------------------------------------------------------------------------
    svn:mime-type = text/plain

Added: incubator/stanbol/trunk/data/opennlp/ner/en/README.md
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/data/opennlp/ner/en/README.md?rev=1148947&view=auto
==============================================================================
--- incubator/stanbol/trunk/data/opennlp/ner/en/README.md (added)
+++ incubator/stanbol/trunk/data/opennlp/ner/en/README.md Wed Jul 20 21:49:50 2011
@@ -0,0 +1,21 @@
+# Data files Bundles for OpenNLP
+
+This source repository only holds the pom.xml file and folder structure of this bundle.
+
+To avoid loading the Subversion repository with large binary files, this artifact has to be built and deployed manually; the build retrieves precomputed models from other sites.
+
+
+## Downloading the OpenNLP statistical model 
+
+The OpenNLP models are downloaded from 
+
+    http://opennlp.sourceforge.net/models-1.5
+
+This URL is defined as a property in the 'pom.xml'.
+The list of downloaded files is defined in 'download_models.xml'.
+
+## NOTE
+
+Using this bundle is only an alternative to manually copying the required OpenNLP models to '{stanbol-installation}/sling/datafiles'.
+
+In addition, model files in that folder take precedence over models provided by this bundle.

Added: incubator/stanbol/trunk/data/opennlp/ner/en/download_models.xml
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/data/opennlp/ner/en/download_models.xml?rev=1148947&view=auto
==============================================================================
--- incubator/stanbol/trunk/data/opennlp/ner/en/download_models.xml (added)
+++ incubator/stanbol/trunk/data/opennlp/ner/en/download_models.xml Wed Jul 20 21:49:50 2011
@@ -0,0 +1,33 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<project name="OpenNLP Model Download Helper" default="download" basedir=".">
+  <description>
+    Contains only a single target that is used by the Maven AntRun
+    Plugin to download OpenNLP models from the Web
+  </description>
+   
+  <target name="download">
+    <copy todir="${target.directory}" flatten="true">
+      <resources>
+        <url url="${model.url}/en-ner-person.bin"/>
+        <url url="${model.url}/en-ner-location.bin"/>
+        <url url="${model.url}/en-ner-organization.bin"/>
+      </resources>
+    </copy>
+  </target>
+</project>
\ No newline at end of file

Propchange: incubator/stanbol/trunk/data/opennlp/ner/en/download_models.xml
------------------------------------------------------------------------------
    svn:mime-type = text/plain

Added: incubator/stanbol/trunk/data/opennlp/ner/en/pom.xml
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/data/opennlp/ner/en/pom.xml?rev=1148947&view=auto
==============================================================================
--- incubator/stanbol/trunk/data/opennlp/ner/en/pom.xml (added)
+++ incubator/stanbol/trunk/data/opennlp/ner/en/pom.xml Wed Jul 20 21:49:50 2011
@@ -0,0 +1,123 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+
+  <modelVersion>4.0.0</modelVersion>
+  <parent>
+    <groupId>org.apache.stanbol</groupId>
+    <artifactId>org.apache.stanbol.data.parent</artifactId>
+    <version>0.9.0-incubating-SNAPSHOT</version>
+    <relativePath>../../../parent</relativePath>
+  </parent>
+
+  <groupId>org.apache.stanbol</groupId>
+  <artifactId>org.apache.stanbol.data.opennlp.ner.en</artifactId>
+  <version>1.0.0-incubating</version>
+  <packaging>bundle</packaging>
+
+  <name>Apache Stanbol Data: OpenNLP NER Models for English</name>
+  <description>
+    Bundle containing the NER models for finding Persons, Organizations
+    and Places for English language texts.
+  </description>
+  <inceptionYear>2011</inceptionYear>
+
+  <scm>
+    <connection>
+      scm:svn:http://svn.apache.org/repos/asf/incubator/stanbol/trunk/data/opennlp/ner/en
+    </connection>
+    <developerConnection>
+      scm:svn:https://svn.apache.org/repos/asf/incubator/stanbol/trunk/data/opennlp/ner/en
+    </developerConnection>
+    <url>http://incubator.apache.org/stanbol/</url>
+  </scm>
+  <properties>
+    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+    <!-- define the path to/home of the OpenNLP models -->
+    <opennlp.model.path>org/apache/stanbol/data/opennlp</opennlp.model.path>
+    <opennlp.model.home>http://opennlp.sourceforge.net/models-1.5</opennlp.model.home>
+  </properties>
+
+  <build>
+    <plugins>
+      <plugin>
+        <groupId>org.apache.felix</groupId>
+        <artifactId>maven-bundle-plugin</artifactId>
+        <configuration>
+          <instructions>
+            <_versionpolicy>$${version;===;${@}}</_versionpolicy>
+
+            <!-- 
+              Extension used to provide files in that directory to the
+              DataFileProvider
+              -->
+            <Data-Files>${opennlp.model.path}</Data-Files>
+            <!-- 
+              Use a priority lower than 0 to allow providers without a
+              defined ranking to override this default data.
+             -->
+            <Data-Files-Priority>
+              -100
+            </Data-Files-Priority>
+          </instructions>
+        </configuration>
+      </plugin>
+      <plugin>
+        <!-- 
+          Ant is used to download the models from the
+          http://opennlp.sourceforge.net site.
+        -->
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-antrun-plugin</artifactId>
+        <executions>
+          <execution>
+            <id>compile</id>
+            <phase>compile</phase>
+            <configuration>
+              <!--
+                TODO: I would like to add an "unless" constraint to the
+                target that prevents execution if Maven operates in offline
+                mode. However I was not able to find out how to obtain this
+                information. ${settings.offline} (as noted by several
+                resources) does not work.
+                Until this is fixed, builds will fail if no Internet connection
+                is available!
+              -->
+              <target>
+                <property name="target.directory" value="${project.basedir}/src/main/resources/${opennlp.model.path}"/>
+                <property name="model.url" value="${opennlp.model.home}"/>
+                                
+                <echo message="copy OpenNLP models"/>
+                <echo message="  FROM ${model.url} "/>
+                <echo message="  TO ${target.directory}"/>
+
+                <ant antfile="${basedir}/download_models.xml">
+                  <target name="download"/>
+                </ant>
+              </target>
+            </configuration>
+            <goals>
+              <goal>run</goal>
+            </goals>
+          </execution>
+        </executions>
+      </plugin>
+    </plugins>
+  </build>
+</project>

Propchange: incubator/stanbol/trunk/data/opennlp/ner/en/pom.xml
------------------------------------------------------------------------------
    svn:mime-type = text/plain

Modified: incubator/stanbol/trunk/data/pom.xml
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/data/pom.xml?rev=1148947&r1=1148946&r2=1148947&view=diff
==============================================================================
--- incubator/stanbol/trunk/data/pom.xml (original)
+++ incubator/stanbol/trunk/data/pom.xml Wed Jul 20 21:49:50 2011
@@ -18,6 +18,13 @@
 <project>
     <modelVersion>4.0.0</modelVersion>
 
+    <parent>
+      <groupId>org.apache.stanbol</groupId>
+      <artifactId>stanbol-parent</artifactId>
+      <version>0.9.0-incubating-SNAPSHOT</version>
+      <relativePath>parent</relativePath>
+    </parent>
+
     <groupId>org.apache.stanbol</groupId>
     <artifactId>org.apache.stanbol.data.reactor</artifactId>
     <version>0.9.0-incubating-SNAPSHOT</version>
@@ -34,7 +41,42 @@
         <url>http://incubator.apache.org/stanbol/</url>
     </scm>
 
-    <modules>
-        <module>sites</module>
-    </modules>
+  <profiles>
+    <profile>
+      <!--
+        Profile that includes all the data modules used as default data
+        within the Stanbol Launchers.
+        
+        This profile is activated by default because these bundles are
+        referenced by other Stanbol modules and the launchers.
+      -->
+      <id>defaultdata</id>
+      <activation>
+        <activeByDefault>true</activeByDefault>
+      </activation>
+      <modules>
+        <module>sites/dbpediadefault</module>
+        <module>opennlp/lang/en</module>
+        <module>opennlp/ner/en</module>
+      </modules>
+    </profile>
+    <profile>
+      <!--
+        Profile including data modules that are predefined configurations of 
+        referenced sites that do not use a precomputed local index, but directly 
+        access a remote service for query and retrieval. However, they will
+        use a local cache to store retrieved entities.
+        
+        This profile is activated by default because it does not need to download
+        remote resources
+      -->
+      <id>siteconfigs</id>
+      <activation>
+        <activeByDefault>true</activeByDefault>
+      </activation>
+      <modules>
+        <module>sites/dbpediacached</module>
+      </modules>
+    </profile>
+  </profiles>
 </project>

Modified: incubator/stanbol/trunk/data/sites/dblp/README.md
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/data/sites/dblp/README.md?rev=1148947&r1=1148946&r2=1148947&view=diff
==============================================================================
--- incubator/stanbol/trunk/data/sites/dblp/README.md (original)
+++ incubator/stanbol/trunk/data/sites/dblp/README.md Wed Jul 20 21:49:50 2011
@@ -1,50 +1,12 @@
 # DBLP with local index for the Apache Stanbol Entityhub
 
-This build a bundle that can be installed to add the [DBLP](http://dblp.uni-trier.de/) 
-data set as a ReferencedSite to the Apache Entityhub.
-
-The binary data for the local cache are not included but need to be
-downloaded (TODO: add download location as soon as available) or built locally
-by using the DBLP indexing utility.
-
-PLEASE NOTE that the DBLP dataset does not provide any License information. 
-
-
-## Installation
-
-First build the bundle by calling
-
-    mvn install
-
-It the command succeeds the bundle is available in the target folder
-    
-    target/org.apache.stanbol.data.sites.dblp-.*.jar
-
-This bundle can now be installed to a running Stanbol instance e.g. by using
-the Apache Felix Webconsole.
-
-NOTE: This steps requires the Sling Installer Framework as well as the 
-Stanbol BundleInstaller extension to be active. Both are typically included
-within the Stanbol Launcher.
-
-After installing and starting this Bundle the Stanbol Data File Provider (a
-tab within the Apache Felix Webconsole) will show a request for the binary
-file for the local index.
-
-To finalise the installation you need to copy the requested file to the
-directory used by the Stanbol Data File Provider
-
-    sling/datafiles/
-    
-and that restart the SolrYard instance with the name
-    
-    dblpIndex
-    
+This module is no longer available. It was replaced by the Indexing Tool for
+DBLP that now automatically creates all required resources after the indexing
+completes.
  
 ## Building the DBLP index
 
-To build a local Index for DBLP the Apache Entityhub provides an own utility
-The module is located at
+The Indexing Tool for DBLP is located at
 
     {stanbol}/entityhub/indexing/dblp
 

Modified: incubator/stanbol/trunk/data/sites/dbpedia/README.md
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/data/sites/dbpedia/README.md?rev=1148947&r1=1148946&r2=1148947&view=diff
==============================================================================
--- incubator/stanbol/trunk/data/sites/dbpedia/README.md (original)
+++ incubator/stanbol/trunk/data/sites/dbpedia/README.md Wed Jul 20 21:49:50 2011
@@ -1,45 +1,8 @@
 # DBpedia.org with local index for the Apache Stanbol Entityhub
 
-This build a bundle that can be installed to add DBpedia.org as a
-ReferencedSite to the Apache Entityhub.
-
-It will override the "dbpedia" referenced site included in the default
-configuration of the "full" launcher of Apache Stanbol.
-
-The binary data for the local cache are not included but need to be
-downloaded (TODO: add download location as soon as available) or built locally
-by using the DBpedia.org indexing utility.
-
-
-## Installation
-
-First build the bundle by calling
-
-    mvn install
-
-It the command succeeds the bundle is available in the target folder
-
-    target/org.apache.stanbol.data.sites.dbpedia-.*.jar
-
-This bundle can now be installed to a running Stanbol instance e.g. by using
-the Apache Felix Webconsole.
-
-NOTE: This steps requires the Sling Installer Framework as well as the
-Stanbol BundleInstaller extension to be active. Both are typically included
-within the Stanbol Launcher.
-
-After installing and starting this Bundle the Stanbol Data File Provider (a
-tab within the Apache Felix Webconsole) will show a request for the binary
-file for the local index.
-
-To finalise the installation you need to copy the requested file to the
-directory used by the Stanbol Data File Provider
-
-    sling/datafiles/
-
-and that restart the SolrYard instance with the name
-
-    DBpediaIndex
+This module is no longer available. It was replaced by the Indexing Tool for
+DBpedia.org that now automatically creates all required resources after the 
+indexing completes.
 
 
 ## Building the DBpedia.org index
@@ -52,4 +15,19 @@ The module is located at
 A detailed documentation on how to use this utility is provided by the
 README file.
 
+### Note 
 
+The indexing tool for DBpedia.org now also creates a bundle that can be used as
+a replacement for this one. However, it will not contain the configuration for
+the NamedEntityTaggingEngine.
+This service needs to be configured manually by using the following values:
+
+    org.apache.stanbol.enhancer.engines.entitytagging.nameField="rdfs:label"
+    org.apache.stanbol.enhancer.engines.entitytagging.personType="dbp-ont:Person"
+    org.apache.stanbol.enhancer.engines.entitytagging.personState=B"true"
+    org.apache.stanbol.enhancer.engines.entitytagging.referencedSiteId="dbpedia"
+    org.apache.stanbol.enhancer.engines.entitytagging.placeState=B"true"
+    org.apache.stanbol.enhancer.engines.entitytagging.organisationState=B"true"
+    org.apache.stanbol.enhancer.engines.entitytagging.organisationType="dbp-ont:Organisation"
+    org.apache.stanbol.enhancer.engines.entitytagging.placeType="dbp-ont:Place"
+    
\ No newline at end of file

Added: incubator/stanbol/trunk/data/sites/dbpediadefault/.classpath
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/data/sites/dbpediadefault/.classpath?rev=1148947&view=auto
==============================================================================
--- incubator/stanbol/trunk/data/sites/dbpediadefault/.classpath (added)
+++ incubator/stanbol/trunk/data/sites/dbpediadefault/.classpath Wed Jul 20 21:49:50 2011
@@ -0,0 +1,5 @@
+<classpath>
+  <classpathentry kind="src" path="src/main/resources" excluding="**/*.java"/>
+  <classpathentry kind="output" path="target/classes"/>
+  <classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER"/>
+</classpath>
\ No newline at end of file

Propchange: incubator/stanbol/trunk/data/sites/dbpediadefault/.classpath
------------------------------------------------------------------------------
    svn:mime-type = text/plain

Added: incubator/stanbol/trunk/data/sites/dbpediadefault/.project
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/data/sites/dbpediadefault/.project?rev=1148947&view=auto
==============================================================================
--- incubator/stanbol/trunk/data/sites/dbpediadefault/.project (added)
+++ incubator/stanbol/trunk/data/sites/dbpediadefault/.project Wed Jul 20 21:49:50 2011
@@ -0,0 +1,17 @@
+<projectDescription>
+  <name>org.apache.stanbol.data.sites.dbpedia</name>
+  <comment>This bundle installs DBpedia as Referenced Site with a full local cache to
+    the Apache Stanbol Entityhub.
+    The data of the local cache are not included but MUST be either downloaded
+    or precomputed by using the DBpedia indexing utility (see 
+    &quot;{stanbol}/entityhub/indexing/dbpedia&quot;). NO_M2ECLIPSE_SUPPORT: Project files created with the maven-eclipse-plugin are not supported in M2Eclipse.</comment>
+  <projects/>
+  <buildSpec>
+    <buildCommand>
+      <name>org.eclipse.jdt.core.javabuilder</name>
+    </buildCommand>
+  </buildSpec>
+  <natures>
+    <nature>org.eclipse.jdt.core.javanature</nature>
+  </natures>
+</projectDescription>
\ No newline at end of file

Propchange: incubator/stanbol/trunk/data/sites/dbpediadefault/.project
------------------------------------------------------------------------------
    svn:mime-type = text/plain

Added: incubator/stanbol/trunk/data/sites/dbpediadefault/README.md
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/data/sites/dbpediadefault/README.md?rev=1148947&view=auto
==============================================================================
--- incubator/stanbol/trunk/data/sites/dbpediadefault/README.md (added)
+++ incubator/stanbol/trunk/data/sites/dbpediadefault/README.md Wed Jul 20 21:49:50 2011
@@ -0,0 +1,16 @@
+# Data files for the default DBPedia.org Site shipped with the Stanbol distributions
+
+This source repository only holds the pom.xml file and folder structure to build
+the org.apache.stanbol.data.sites.dbpedia.default artifact to be included in the standard distributions of Stanbol.
+
+To avoid loading the Subversion repository with large binary files, this artifact has to be built and deployed manually to retrieve the precomputed DBpedia index from the web.
+
+## Download the precomputed local cache for DBPedia.org
+
+This bundle needs to include a small local index of DBPedia.org that includes the 43k entities with the most incoming Wiki links.
+
+This index is not stored in Subversion. It is downloaded by the Ant file
+
+    download_index.xml
+
+which the maven-antrun-plugin runs in the compile phase of the normal Maven build.

Added: incubator/stanbol/trunk/data/sites/dbpediadefault/download_index.xml
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/data/sites/dbpediadefault/download_index.xml?rev=1148947&view=auto
==============================================================================
--- incubator/stanbol/trunk/data/sites/dbpediadefault/download_index.xml (added)
+++ incubator/stanbol/trunk/data/sites/dbpediadefault/download_index.xml Wed Jul 20 21:49:50 2011
@@ -0,0 +1,32 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<project name="Index Download Helper" default="download" basedir=".">
+  <description>
+    Contains only a single target that is used by the Maven AntRun
+    Plugin to download the index passed via 'index.url'
+    to 'target.directory'.
+  </description>
+   
+  <target name="download">
+    <copy todir="${target.directory}" flatten="true">
+      <resources>
+        <url url="${index.url}"/>
+      </resources>
+    </copy>
+  </target>
+</project>
\ No newline at end of file

Propchange: incubator/stanbol/trunk/data/sites/dbpediadefault/download_index.xml
------------------------------------------------------------------------------
    svn:mime-type = text/plain
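
For readers more familiar with Java than Ant, the "download" target of download_index.xml can be approximated by the sketch below. It is not part of this commit: the class and method names are hypothetical, and the behaviour is only an approximation of the Ant copy task with flatten="true" (the file is stored under the last path segment of the URL). Unlike Ant's copy, which as far as I know skips up-to-date files by default, this sketch always overwrites the target.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper, not part of Stanbol: approximates the Ant "download" target. */
public class IndexDownloadSketch {

    /** Returns the last path segment of the URL, e.g. "a.zip" for ".../x/a.zip". */
    public static String fileNameOf(URL url) {
        String path = url.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /** Copies the content behind indexUrl into targetDirectory and returns the new file. */
    public static Path download(URL indexUrl, Path targetDirectory) throws IOException {
        // Create the target directory if it does not exist yet
        Files.createDirectories(targetDirectory);
        Path target = targetDirectory.resolve(fileNameOf(indexUrl));
        try (InputStream in = indexUrl.openStream()) {
            // Assumption: always overwrite an existing file
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

In the build, 'index.url' and 'target.directory' are supplied as Ant properties by the pom.xml of this module; here they become the two method arguments.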

Added: incubator/stanbol/trunk/data/sites/dbpediadefault/pom.xml
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/data/sites/dbpediadefault/pom.xml?rev=1148947&view=auto
==============================================================================
--- incubator/stanbol/trunk/data/sites/dbpediadefault/pom.xml (added)
+++ incubator/stanbol/trunk/data/sites/dbpediadefault/pom.xml Wed Jul 20 21:49:50 2011
@@ -0,0 +1,135 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+        http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+
+  <modelVersion>4.0.0</modelVersion>
+
+  <parent>
+    <groupId>org.apache.stanbol</groupId>
+    <artifactId>org.apache.stanbol.data.parent</artifactId>
+    <version>0.9.0-incubating-SNAPSHOT</version>
+    <relativePath>../../parent</relativePath>
+  </parent>
+
+  <groupId>org.apache.stanbol</groupId>
+  <artifactId>org.apache.stanbol.data.sites.dbpedia.default</artifactId>
+  <version>1.0.0-incubating</version>
+  <packaging>bundle</packaging>
+
+  <name>Apache Stanbol Data: DBpedia.org defaultdata version</name>
+  <description>
+    This bundle installs DBpedia as Referenced Site with a small
+    local index. This bundle is typically distributed as default with
+    the different Stanbol Launchers.
+    Users that do not need DBPedia.org or want to upgrade to a more
+    complete local index will need to stop/uninstall this bundle.
+  </description>
+
+  <inceptionYear>2011</inceptionYear>
+
+  <scm>
+    <connection>
+      scm:svn:http://svn.apache.org/repos/asf/incubator/stanbol/trunk/data/sites/dbpediadefault
+    </connection>
+    <developerConnection>
+      scm:svn:https://svn.apache.org/repos/asf/incubator/stanbol/trunk/data/sites/dbpediadefault
+    </developerConnection>
+    <url>http://incubator.apache.org/stanbol/</url>
+  </scm>
+  <properties>
+    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+    <dbpedia.default.index.url>http://dl.dropbox.com/u/5743203/IKS/dbpedia/3.6/dbpedia_43k.solrindex.zip</dbpedia.default.index.url>
+    <dbpedia.default.path>org/apache/stanbol/data/site/dbpedia/default</dbpedia.default.path>
+    <dbpedia.default.index.path>${dbpedia.default.path}/index</dbpedia.default.index.path>
+    <dbpedia.default.config.path>${dbpedia.default.path}/config</dbpedia.default.config.path>
+  </properties>
+
+  <build>
+    <plugins>
+      <plugin>
+        <groupId>org.apache.felix</groupId>
+        <artifactId>maven-bundle-plugin</artifactId>
+        <version>2.0.1</version>
+        <inherited>true</inherited>
+        <extensions>true</extensions>
+        <configuration>
+          <instructions>
+            <!-- 
+              Extension used to provide files in that directory to the
+              DataFileProvider
+              -->
+            <Data-Files>${dbpedia.default.index.path}</Data-Files>
+            <!-- 
+              Use a priority lower than 0 to allow providers without a
+              defined ranking to override this default data.
+             -->
+            <Data-Files-Priority>-100</Data-Files-Priority>
+            <!-- 
+              Extension used by the Bundle-Installer to load OSGI 
+              component configuration  
+             -->
+            <Install-Path>${dbpedia.default.config.path}</Install-Path>
+            <_versionpolicy>$${version;===;${@}}</_versionpolicy>
+          </instructions>
+        </configuration>
+      </plugin>
+      <plugin>
+        <!-- 
+          Ant is used to download the precomputed Solr index from the
+          URL configured by the dbpedia.default.index.url property.
+        -->
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-antrun-plugin</artifactId>
+        <executions>
+          <execution>
+            <id>compile</id>
+            <phase>compile</phase>
+            <configuration>
+              <!--
+                TODO: I would like to add an "unless" constraint to the
+                target that prevents execution if Maven operates in offline
+                mode. However I was not able to find out how to obtain this
+                information. ${settings.offline} (as noted by several
+                resources) does not work.
+                Until this is fixed, builds will fail if no Internet
+                connection is available!
+              -->
+              <target>
+                <property name="target.directory" value="${project.basedir}/src/main/resources/${dbpedia.default.index.path}"/>
+                <property name="index.url" value="${dbpedia.default.index.url}"/>
+                                
+                <echo message="copy Solr Index "/>
+                <echo message="  FROM ${index.url} "/>
+                <echo message="  TO ${target.directory}"/>
+
+                <ant antfile="${basedir}/download_index.xml">
+                  <target name="download"/>
+                </ant>
+              </target>
+            </configuration>
+            <goals>
+              <goal>run</goal>
+            </goals>
+          </execution>
+        </executions>
+      </plugin>
+    </plugins>
+  </build>
+
+</project>

Propchange: incubator/stanbol/trunk/data/sites/dbpediadefault/pom.xml
------------------------------------------------------------------------------
    svn:mime-type = text/plain

Added: incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/config/dbpedia_43k.solrindex.ref
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/config/dbpedia_43k.solrindex.ref?rev=1148947&view=auto
==============================================================================
--- incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/config/dbpedia_43k.solrindex.ref (added)
+++ incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/config/dbpedia_43k.solrindex.ref Wed Jul 20 21:49:50 2011
@@ -0,0 +1,3 @@
+Name=SolrIndex for dbpedia
+Description=DBpedia.org 
+Index-Archive=dbpedia_43k.solrindex.zip

Added: incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/config/org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine-dbpedia.config
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/config/org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine-dbpedia.config?rev=1148947&view=auto
==============================================================================
--- incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/config/org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine-dbpedia.config (added)
+++ incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/config/org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine-dbpedia.config Wed Jul 20 21:49:50 2011
@@ -0,0 +1,8 @@
+org.apache.stanbol.enhancer.engines.entitytagging.nameField="rdfs:label"
+org.apache.stanbol.enhancer.engines.entitytagging.personType="dbp-ont:Person"
+org.apache.stanbol.enhancer.engines.entitytagging.personState=B"true"
+org.apache.stanbol.enhancer.engines.entitytagging.referencedSiteId="dbpedia"
+org.apache.stanbol.enhancer.engines.entitytagging.placeState=B"true"
+org.apache.stanbol.enhancer.engines.entitytagging.organisationState=B"true"
+org.apache.stanbol.enhancer.engines.entitytagging.organisationType="dbp-ont:Organisation"
+org.apache.stanbol.enhancer.engines.entitytagging.placeType="dbp-ont:Place"

Added: incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/config/org.apache.stanbol.entityhub.core.site.CacheImpl-dbpedia.config
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/config/org.apache.stanbol.entityhub.core.site.CacheImpl-dbpedia.config?rev=1148947&view=auto
==============================================================================
--- incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/config/org.apache.stanbol.entityhub.core.site.CacheImpl-dbpedia.config (added)
+++ incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/config/org.apache.stanbol.entityhub.core.site.CacheImpl-dbpedia.config Wed Jul 20 21:49:50 2011
@@ -0,0 +1,4 @@
+org.apache.stanbol.entityhub.yard.name="dbpedia\ Cache"
+org.apache.stanbol.entityhub.yard.cacheYardId="dbpediaDefaultdataIndex"
+org.apache.stanbol.entityhub.yard.id="dbpediaDefaultdataIndex"
+org.apache.stanbol.entityhub.yard.description="Cache\ for\ the\ dbpedia\ Referenced\ Site\ using\ the\ dbpediaIndex."

Added: incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/config/org.apache.stanbol.entityhub.site.referencedSite-dbpedia.config
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/config/org.apache.stanbol.entityhub.site.referencedSite-dbpedia.config?rev=1148947&view=auto
==============================================================================
--- incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/config/org.apache.stanbol.entityhub.site.referencedSite-dbpedia.config (added)
+++ incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/config/org.apache.stanbol.entityhub.site.referencedSite-dbpedia.config Wed Jul 20 21:49:50 2011
@@ -0,0 +1,18 @@
+org.apache.stanbol.entityhub.site.attributionUrl="http://wiki.dbpedia.org/About"
+org.apache.stanbol.entityhub.site.cacheId="dbpediaDefaultdataIndex"
+org.apache.stanbol.entityhub.site.name="dbpedia"
+org.apache.stanbol.entityhub.site.dereferencerType="org.apache.stanbol.entityhub.dereferencer.SparqlDereferencer"
+org.apache.stanbol.entityhub.site.defaultMappedEntityState="proposed"
+org.apache.stanbol.entityhub.site.fieldMappings=["dbp-ont:*","dbp-ont:thumbnail\ |\ d\=xsd:anyURI\ >\ foaf:depiction","dbp-prop:latitude\ |\ d\=xsd:decimal\ >\ geo:lat","dbp-prop:longitude\ |\ d\=xsd:decimal\ >\ geo:long","dbp-prop:population\ |\ d\=xsd:integer","dbp-prop:website\ |\ d\=xsd:anyURI\ >\ foaf:homepage"]
+org.apache.stanbol.entityhub.site.licenseName=["Creative\ Commons\ Attribution-ShareAlike\ 3.0","GNU\ Free\ Documentation\ License"]
+org.apache.stanbol.entityhub.site.defaultSymbolState="proposed"
+org.apache.stanbol.entityhub.site.searcherType="org.apache.stanbol.entityhub.searcher.VirtuosoSearcher"
+org.apache.stanbol.entityhub.site.defaultExpireDuration=I"0"
+org.apache.stanbol.entityhub.site.cacheStrategy="all"
+org.apache.stanbol.entityhub.site.attribution="DBpedia.org"
+org.apache.stanbol.entityhub.site.accessUri="http://dbpedia.org/sparql/"
+org.apache.stanbol.entityhub.site.id="dbpedia"
+org.apache.stanbol.entityhub.site.entityPrefix=["http://dbpedia.org/resource/","http://dbpedia.org/ontology/"]
+org.apache.stanbol.entityhub.site.licenseUrl=["http://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License","http://en.wikipedia.org/wiki/Wikipedia:Text_of_the_GNU_Free_Documentation_License"]
+org.apache.stanbol.entityhub.site.queryUri="http://dbpedia.org/sparql"
+org.apache.stanbol.entityhub.site.description="DBpedia.org\ "

Added: incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/config/org.apache.stanbol.entityhub.yard.solr.impl.SolrYard-dbpedia.config
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/config/org.apache.stanbol.entityhub.yard.solr.impl.SolrYard-dbpedia.config?rev=1148947&view=auto
==============================================================================
--- incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/config/org.apache.stanbol.entityhub.yard.solr.impl.SolrYard-dbpedia.config (added)
+++ incubator/stanbol/trunk/data/sites/dbpediadefault/src/main/resources/org/apache/stanbol/data/site/dbpedia/default/config/org.apache.stanbol.entityhub.yard.solr.impl.SolrYard-dbpedia.config Wed Jul 20 21:49:50 2011
@@ -0,0 +1,7 @@
+org.apache.stanbol.entityhub.yard.solr.solrUri="dbpedia_43k"
+org.apache.stanbol.entityhub.yard.name="dbpedia\ default\ data\ index"
+org.apache.stanbol.entityhub.yard.solr.multiYardIndexLayout=B"false"
+org.apache.stanbol.entityhub.yard.solr.useDefaultConfig=B"false"
+org.apache.stanbol.entityhub.yard.solr.documentBoost="http://www.iks-project.eu/ontology/rick/model/entityRank"
+org.apache.stanbol.entityhub.yard.id="dbpediaDefaultdataIndex"
+org.apache.stanbol.entityhub.yard.description="Small\ local\ index\ with\ 43000\ entities\ for\ the\ Referenced\ Site\ \"dbpedia\"."
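
The .config files added above use a typed value notation: plain quoted strings, a one-letter type prefix in front of the quotes (B"true" for a Boolean, I"0" for an Integer), [...] for arrays, and a backslash in front of spaces and '=' characters. The following sketch shows how the scalar values seen in this commit could be decoded. It is an illustration only, not the parser used by Apache Felix: the class and method names are hypothetical, only the B and I prefixes and untyped strings are handled, and arrays are left out.

```java
/** Hypothetical decoder for the scalar values used in the .config files of this commit. */
public class ConfigValueSketch {

    /** Decodes one scalar value such as B"true", I"0" or "dbpedia\ Cache". */
    public static Object decode(String raw) {
        String value = raw.trim();
        if (value.isEmpty()) {
            throw new IllegalArgumentException("empty value");
        }
        char type = 'T'; // assumption: a value without prefix is a String
        if (!value.startsWith("\"")) {
            type = value.charAt(0);
            value = value.substring(1);
        }
        if (value.length() < 2 || !value.startsWith("\"") || !value.endsWith("\"")) {
            throw new IllegalArgumentException("not a quoted scalar: " + raw);
        }
        String text = unescape(value.substring(1, value.length() - 1));
        switch (type) {
            case 'B': return Boolean.valueOf(text);
            case 'I': return Integer.valueOf(text);
            case 'T': return text;
            default: throw new IllegalArgumentException("unsupported type code: " + type);
        }
    }

    /** Drops the backslash in front of each escaped character ("\ " becomes " "). */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                c = s.charAt(++i); // keep the escaped character, skip the backslash
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

With this sketch, the value of org.apache.stanbol.entityhub.yard.solr.useDefaultConfig decodes to a Boolean, while org.apache.stanbol.entityhub.yard.name decodes to a String with the escaped spaces restored.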

Modified: incubator/stanbol/trunk/enhancer/engines/opennlp-ner/pom.xml
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/opennlp-ner/pom.xml?rev=1148947&r1=1148946&r2=1148947&view=diff
==============================================================================
--- incubator/stanbol/trunk/enhancer/engines/opennlp-ner/pom.xml (original)
+++ incubator/stanbol/trunk/enhancer/engines/opennlp-ner/pom.xml Wed Jul 20 21:49:50 2011
@@ -57,11 +57,6 @@
     </dependency>
 
     <dependency>
-      <groupId>org.apache.stanbol</groupId>
-      <artifactId>org.apache.stanbol.defaultdata</artifactId>
-    </dependency>
-
-    <dependency>
         <groupId>org.apache.stanbol</groupId>
         <artifactId>org.apache.stanbol.commons.stanboltools.datafileprovider</artifactId>
     </dependency>
@@ -100,10 +95,22 @@
     <dependency>
       <groupId>junit</groupId>
       <artifactId>junit</artifactId>
+      <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>org.slf4j</groupId>
       <artifactId>slf4j-simple</artifactId>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.stanbol</groupId>
+      <artifactId>org.apache.stanbol.data.opennlp.lang.en</artifactId>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.stanbol</groupId>
+      <artifactId>org.apache.stanbol.data.opennlp.ner.en</artifactId>
+      <scope>test</scope>
     </dependency>
   </dependencies>
 
@@ -121,7 +128,6 @@
 <!--            <Embed-Dependency></Embed-Dependency>
             <Embed-Transitive>true</Embed-Transitive>  -->
             <Import-Package>
-              org.apache.stanbol.defaultdata.opennlp,
               !net.didion.*,
               *
             </Import-Package>

Modified: incubator/stanbol/trunk/enhancer/engines/opennlp-ner/src/test/java/org/apache/stanbol/enhancer/engines/opennlp/impl/ClasspathDataFileProvider.java
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/opennlp-ner/src/test/java/org/apache/stanbol/enhancer/engines/opennlp/impl/ClasspathDataFileProvider.java?rev=1148947&r1=1148946&r2=1148947&view=diff
==============================================================================
--- incubator/stanbol/trunk/enhancer/engines/opennlp-ner/src/test/java/org/apache/stanbol/enhancer/engines/opennlp/impl/ClasspathDataFileProvider.java (original)
+++ incubator/stanbol/trunk/enhancer/engines/opennlp-ner/src/test/java/org/apache/stanbol/enhancer/engines/opennlp/impl/ClasspathDataFileProvider.java Wed Jul 20 21:49:50 2011
@@ -28,7 +28,13 @@ import org.slf4j.LoggerFactory;
 public class ClasspathDataFileProvider implements DataFileProvider {
 
     private final Logger log = LoggerFactory.getLogger(getClass());
-    public static final String RESOURCE_BASE_PATH = "org/apache/stanbol/defaultdata/opennlp/";
+    /*
+     * NOTE: This path needs to be the same as the one used by the
+     *       org.apache.stanbol.data.opennlp.lang.en and the
+     *       org.apache.stanbol.data.opennlp.ner.en bundles to store the 
+     *       OpenNLP models
+     */
+    public static final String RESOURCE_BASE_PATH = "org/apache/stanbol/data/opennlp/";
     
     private final String symbolicName;
     
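The changed RESOURCE_BASE_PATH only resolves if the test classpath contains resources below that path, which is presumably why the two OpenNLP data bundles were added as test-scoped dependencies in the pom.xml of this module. The sketch below illustrates such a lookup. It is not the actual ClasspathDataFileProvider implementation; the class name and the file names in the usage note are hypothetical.

```java
import java.io.InputStream;

/** Hypothetical illustration of a classpath lookup relative to a fixed base path. */
public class ClasspathLookupSketch {

    // Same value as the new RESOURCE_BASE_PATH constant in the diff above
    public static final String RESOURCE_BASE_PATH = "org/apache/stanbol/data/opennlp/";

    /** Builds the classpath location for a requested file name. */
    public static String resourcePath(String fileName) {
        return RESOURCE_BASE_PATH + fileName;
    }

    /** Returns a stream for the file, or null if it is not on the classpath. */
    public static InputStream open(String fileName) {
        return ClasspathLookupSketch.class.getClassLoader()
                .getResourceAsStream(resourcePath(fileName));
    }
}
```

A caller would request a model by its plain file name (for example a hypothetical "en-token.bin") and must handle a null result, which indicates that no bundle on the classpath stores a file under the expected path.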

Modified: incubator/stanbol/trunk/enhancer/engines/taxonomylinking/pom.xml
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/taxonomylinking/pom.xml?rev=1148947&r1=1148946&r2=1148947&view=diff
==============================================================================
--- incubator/stanbol/trunk/enhancer/engines/taxonomylinking/pom.xml (original)
+++ incubator/stanbol/trunk/enhancer/engines/taxonomylinking/pom.xml Wed Jul 20 21:49:50 2011
@@ -106,11 +106,13 @@
 	  <groupId>org.apache.stanbol</groupId>
 	  <artifactId>org.apache.stanbol.commons.opennlp</artifactId>
     </dependency>
+
+<!--
     <dependency>
       <groupId>org.apache.stanbol</groupId>
-      <artifactId>org.apache.stanbol.defaultdata</artifactId>
+      <artifactId>org.apache.stanbol.data.opennlp.lang.en</artifactId>
     </dependency>
-
+-->
     <dependency>
       <groupId>commons-io</groupId>
       <artifactId>commons-io</artifactId>

Modified: incubator/stanbol/trunk/launchers/full/src/main/bundles/list.xml
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/launchers/full/src/main/bundles/list.xml?rev=1148947&r1=1148946&r2=1148947&view=diff
==============================================================================
--- incubator/stanbol/trunk/launchers/full/src/main/bundles/list.xml (original)
+++ incubator/stanbol/trunk/launchers/full/src/main/bundles/list.xml Wed Jul 20 21:49:50 2011
@@ -88,13 +88,11 @@
       <artifactId>org.apache.stanbol.commons.installer.bundleprovider</artifactId>
       <version>0.9.0-incubating-SNAPSHOT</version>
     </bundle>
-    <!--
     <bundle>
       <groupId>org.apache.sling</groupId>
       <artifactId>org.apache.sling.installer.provider.file</artifactId>
       <version>1.0.0</version>
     </bundle>
-    -->
   </startLevel>
 
   <!-- Felix web console and plugins -->
@@ -392,8 +390,18 @@
   <startLevel level="19">
     <bundle>
       <groupId>org.apache.stanbol</groupId>
-      <artifactId>org.apache.stanbol.defaultdata</artifactId>
-      <version>0.0.3</version>
+      <artifactId>org.apache.stanbol.data.opennlp.ner.en</artifactId>
+      <version>1.0.0-incubating</version>
+    </bundle>
+    <bundle>
+      <groupId>org.apache.stanbol</groupId>
+      <artifactId>org.apache.stanbol.data.opennlp.lang.en</artifactId>
+      <version>1.0.0-incubating</version>
+    </bundle>
+    <bundle>
+      <groupId>org.apache.stanbol</groupId>
+      <artifactId>org.apache.stanbol.data.sites.dbpedia.default</artifactId>
+      <version>1.0.0-incubating</version>
     </bundle>
   </startLevel>
 

Modified: incubator/stanbol/trunk/launchers/kres/src/main/bundles/list.xml
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/launchers/kres/src/main/bundles/list.xml?rev=1148947&r1=1148946&r2=1148947&view=diff
==============================================================================
--- incubator/stanbol/trunk/launchers/kres/src/main/bundles/list.xml (original)
+++ incubator/stanbol/trunk/launchers/kres/src/main/bundles/list.xml Wed Jul 20 21:49:50 2011
@@ -429,11 +429,21 @@
 			<artifactId>org.apache.stanbol.enhancer.engines.autotagging</artifactId>
 			<version>0.9.0-incubating-SNAPSHOT</version>
 		</bundle>
-		<bundle>
-			<groupId>org.apache.stanbol</groupId>
-			<artifactId>org.apache.stanbol.defaultdata</artifactId>
-			<version>0.0.3</version>
-		</bundle>
+	    <bundle>
+            <groupId>org.apache.stanbol</groupId>
+		    <artifactId>org.apache.stanbol.data.opennlp.ner.en</artifactId>
+		    <version>1.0.0-incubating</version>
+	    </bundle>
+	    <bundle>
+	        <groupId>org.apache.stanbol</groupId>
+	        <artifactId>org.apache.stanbol.data.opennlp.lang.en</artifactId>
+	        <version>1.0.0-incubating</version>
+	    </bundle>
+	    <bundle>
+	        <groupId>org.apache.stanbol</groupId>
+	        <artifactId>org.apache.stanbol.data.sites.dbpedia.default</artifactId>
+	        <version>1.0.0-incubating</version>
+	    </bundle>
 		<bundle>
 			<groupId>org.apache.stanbol</groupId>
 			<artifactId>org.apache.stanbol.enhancer.engines.opennlp.ner</artifactId>

Modified: incubator/stanbol/trunk/launchers/stable/src/main/bundles/list.xml
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/launchers/stable/src/main/bundles/list.xml?rev=1148947&r1=1148946&r2=1148947&view=diff
==============================================================================
--- incubator/stanbol/trunk/launchers/stable/src/main/bundles/list.xml (original)
+++ incubator/stanbol/trunk/launchers/stable/src/main/bundles/list.xml Wed Jul 20 21:49:50 2011
@@ -88,6 +88,11 @@
       <artifactId>org.apache.stanbol.commons.installer.bundleprovider</artifactId>
       <version>0.9.0-incubating-SNAPSHOT</version>
     </bundle>
+    <bundle>
+      <groupId>org.apache.sling</groupId>
+      <artifactId>org.apache.sling.installer.provider.file</artifactId>
+      <version>1.0.0</version>
+    </bundle>
   </startLevel>
 
   <!-- Felix web console and plugins -->
@@ -375,8 +380,18 @@
   <startLevel level="19">
     <bundle>
       <groupId>org.apache.stanbol</groupId>
-      <artifactId>org.apache.stanbol.defaultdata</artifactId>
-      <version>0.0.3</version>
+      <artifactId>org.apache.stanbol.data.opennlp.ner.en</artifactId>
+      <version>1.0.0-incubating</version>
+    </bundle>
+    <bundle>
+      <groupId>org.apache.stanbol</groupId>
+      <artifactId>org.apache.stanbol.data.opennlp.lang.en</artifactId>
+      <version>1.0.0-incubating</version>
+    </bundle>
+    <bundle>
+      <groupId>org.apache.stanbol</groupId>
+      <artifactId>org.apache.stanbol.data.sites.dbpedia.default</artifactId>
+      <version>1.0.0-incubating</version>
     </bundle>
   </startLevel>
 

Modified: incubator/stanbol/trunk/launchers/stateless/src/main/bundles/list.xml
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/launchers/stateless/src/main/bundles/list.xml?rev=1148947&r1=1148946&r2=1148947&view=diff
==============================================================================
--- incubator/stanbol/trunk/launchers/stateless/src/main/bundles/list.xml (original)
+++ incubator/stanbol/trunk/launchers/stateless/src/main/bundles/list.xml Wed Jul 20 21:49:50 2011
@@ -362,8 +362,18 @@
   <startLevel level="19">
     <bundle>
       <groupId>org.apache.stanbol</groupId>
-      <artifactId>org.apache.stanbol.defaultdata</artifactId>
-      <version>0.0.3</version>
+      <artifactId>org.apache.stanbol.data.opennlp.ner.en</artifactId>
+      <version>1.0.0-incubating</version>
+    </bundle>
+    <bundle>
+      <groupId>org.apache.stanbol</groupId>
+      <artifactId>org.apache.stanbol.data.opennlp.lang.en</artifactId>
+      <version>1.0.0-incubating</version>
+    </bundle>
+    <bundle>
+      <groupId>org.apache.stanbol</groupId>
+      <artifactId>org.apache.stanbol.data.sites.dbpedia.default</artifactId>
+      <version>1.0.0-incubating</version>
     </bundle>
   </startLevel>
 

Modified: incubator/stanbol/trunk/parent/pom.xml
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/parent/pom.xml?rev=1148947&r1=1148946&r2=1148947&view=diff
==============================================================================
--- incubator/stanbol/trunk/parent/pom.xml (original)
+++ incubator/stanbol/trunk/parent/pom.xml Wed Jul 20 21:49:50 2011
@@ -293,10 +293,24 @@
         <version>0.9.0-incubating-SNAPSHOT</version>
         <scope>provided</scope>
       </dependency>
+      
+      <!-- Data Bundles included in the standard Stanbol Distribution -->
+      <dependency>
+        <groupId>org.apache.stanbol</groupId>
+        <artifactId>org.apache.stanbol.data.opennlp.ner.en</artifactId>
+        <version>1.0.0-incubating</version>
+        <scope>provided</scope>
+      </dependency>
+      <dependency>
+        <groupId>org.apache.stanbol</groupId>
+        <artifactId>org.apache.stanbol.data.opennlp.lang.en</artifactId>
+        <version>1.0.0-incubating</version>
+        <scope>provided</scope>
+      </dependency>
       <dependency>
         <groupId>org.apache.stanbol</groupId>
-        <artifactId>org.apache.stanbol.defaultdata</artifactId>
-        <version>0.0.3</version>
+        <artifactId>org.apache.stanbol.data.sites.dbpedia.default</artifactId>
+        <version>1.0.0-incubating</version>
         <scope>provided</scope>
       </dependency>
 
@@ -418,12 +432,6 @@
         <version>0.9.0-incubating-SNAPSHOT</version>
         <scope>provided</scope>
       </dependency>
-      <dependency>
-        <groupId>org.apache.stanbol</groupId>
-        <artifactId>org.apache.stanbol.commons.web.base</artifactId>
-        <version>0.9.0-incubating-SNAPSHOT</version>
-        <scope>provided</scope>
-      </dependency>
       
       <dependency>
         <groupId>org.apache.stanbol</groupId>



What is this instruction?

Posted by florent andré <fl...@4sengines.com>.
Hi,

On 07/20/2011 11:49 PM, rwesten@apache.org wrote:
> <plugins>
> +<plugin>
> +<groupId>org.apache.felix</groupId>
> +<artifactId>maven-bundle-plugin</artifactId>
> +<configuration>
> +<instructions>
> +<_versionpolicy>$${version;===;${@}}</_versionpolicy>

Can you explain what this _versionpolicy tag means, please?
It seems like a cool maven/bundle incantation! :)

I googled around a bit but didn't find a clear explanation.
Thanks

++