Posted to dev@stanbol.apache.org by Gniewosław Rzepka <Gn...@makolab.net> on 2012/10/01 16:57:13 UTC

FW: Problems with Stanbol custom vocabularies

Rupert, thanks for the advice. I decided to try indexing a dataset from the ehealth index source files, namely the file diseasome_dump.nt [9]. I thought this would be a good example.

Referring to your previous message:



>As long as your RDF data do define values for rdfs:label the default config is fine. This is also true if you use SKOS, FOAF, Dublin Core Elements/Terms and some other well known RDF schemas. Changing the name field is good. Just make sure you do not change it to "local" or "entityhub" as those names are protected for the use of the Entityhub.

This file includes rdfs:label values, so I changed only the 'name' property in the indexing.properties [5] config file, setting it to "diseasome".





>Does the path to the indexing folder contain a space (e.g. "/stanbol/my indexes/")? This may cause problems with the Apache CLI library used by the Entityhub indexing too. If this is the case please change the path accordingly.

My indexing-dir doesn't contain any spaces: "D:\tomcat7\bin\stanbol\Indexing".





>Make sure your RDF files are located in the correct directory "{indexing-dir}/indexing/resource/rdfdata" where {indexing-dir} is the working directory (the directory where the "org.apache.stanbol.entityhub.indexing.genericrdf-0.10.1-incubating-SNAPSHOT-jar-with-dependencies.jar" is located.

I think the RDF file is located correctly: ". \stanbol\Indexing\indexing\resources\rdfdata\diseasome.nt".





>You can check if your RDF data are loaded by searching in the log for the file name of your rdf file. You should see the following loggings

>    jenatdb.RdfIndexingSource -  > {path}/{rdf-file}

>    source.ResourceLoader -  > loading '{path}/{rdf-file}' ...

The attached file index.log [4] contains the log from the java -jar org.apache.stanbol.entityhub.indexing.genericrdf-0.10.1-incubating-SNAPSHOT-jar-with-dependencies.jar init process. It includes these lines:

jenatdb.RdfIndexingSource -  > D:\tomcat7\bin\stanbol\Indexing\indexing\resources\rdfdata\diseasome_dump.nt

source.ResourceLoader -  > loading 'D:\tomcat7\bin\stanbol\Indexing\indexing\resources\rdfdata\diseasome_dump.nt' ...



>Indexing Process: The indexing process is started after the line

>     [Indexing: Entity Source Reader Deamon] INFO impl.EntityDataBasedIndexingDaemon - ...start iterating over Entity data

index.log [4] also contains this line.
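
For anyone who wants to repeat this check without reading the log by eye, here is a minimal sketch in Python. It only looks for the three substrings quoted above; the function name and the marker labels are made up for illustration and are not part of Stanbol.

```python
# Substrings copied from the log lines quoted in this thread.
# The dictionary keys are illustrative labels, not Stanbol terminology.
MARKERS = {
    "rdf_source_listed": "jenatdb.RdfIndexingSource",
    "rdf_file_loading": "source.ResourceLoader",
    "indexing_started": "start iterating over Entity data",
}


def check_indexing_log(lines):
    """Return {label: bool} saying which markers occur in the given log lines."""
    found = {label: False for label in MARKERS}
    for line in lines:
        for label, needle in MARKERS.items():
            if needle in line:
                found[label] = True
    return found


if __name__ == "__main__":
    import sys

    # Usage: python check_log.py index.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for label, ok in check_indexing_log(fh).items():
            print(f"{label}: {'found' if ok else 'MISSING'}")
```

A marker reported as missing would point to the step described above that did not run; finding all three only shows that indexing started, not that it finished.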





>After indexing completed you should see two files in   {indexing-dir}/indexing/disc

>1. {name}.solrindex.zip : This is basically a ZIP archive of the Apache Solr Core that contains the indexed data. You will need to copy this in the "datafiles" directory of your Stanbol instance

>2. org.apache.stanbol.data.site.{name}-1.0.0.jar: This is an OSGI Bundle containing the configurations for the ReferencedSite, Cache and SolrYard. You will need to install this Bundle by using the Bundle Tab

>of the Apache Felix Webconsole

> (http://{stanbol-instance}/system/console/bundles). Look for the  [Install/Update...] button. Click it and in the dialog activate the "Start Bundle" option and add the Bundle. The suggested Start Level is fine.

I completed these steps correctly.



>As soon as you complete (2) you should see your referenced Site at http://{stanbol-instance}/entityhub/site/{name} and some seconds after completing (1) the Site should be functional.

This is where I have a problem: I can't see the referenced site at http://{stanbol-instance}/entityhub/site/{name}. After the bundle installation, the Stanbol error.log [2] contains only a warning:



01.10.2012 13:11:40.206 *WARN* [FelixDispatchQueue] org.apache.stanbol.commons.installer.provider.bundle.impl.BundleInstaller  ... no Entries found in path 'org\apache\stanbol\data\site\diseasome' configured for Bundle 'org.apache.stanbol.data.site.diseasome' with Manifest header field 'Install-Path'!
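
One way to narrow this warning down is to look inside the generated bundle and see whether it really has entries under the path named in 'Install-Path'. The sketch below is only a diagnostic aid with hypothetical function names. It relies on the fact that entry names inside a JAR use "/" as separator, while the path in the warning is printed with "\"; whether that difference is what triggers the warning is a guess, not something the log confirms.

```python
import zipfile


def install_path_header(jar):
    """Return the value of the 'Install-Path' manifest header, or None."""
    with zipfile.ZipFile(jar) as zf:
        text = zf.read("META-INF/MANIFEST.MF").decode("utf-8", errors="replace")
    # Manifest lines may be wrapped; continuation lines start with one space.
    text = text.replace("\r\n", "\n").replace("\n ", "")
    for line in text.split("\n"):
        if line.startswith("Install-Path:"):
            return line.split(":", 1)[1].strip()
    return None


def entries_under(jar, install_path):
    """List the file entries of the JAR that live below install_path."""
    # Normalise Windows-style separators, because JAR entry names use '/'.
    prefix = install_path.replace("\\", "/").strip("/") + "/"
    with zipfile.ZipFile(jar) as zf:
        return [
            name
            for name in zf.namelist()
            if name.startswith(prefix) and not name.endswith("/")
        ]
```

Calling entries_under("org.apache.stanbol.data.site.diseasome-1.0.0.jar", r"org\apache\stanbol\data\site\diseasome") would show whether the configuration files are in the bundle at all. If the list is non-empty, the bundle content is probably fine and the lookup on the Stanbol side is the more likely suspect.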



>If your RDF data do define rdfs:label's than using the default configuration should be fine. To reset previous changes you can delete the "{indexing-dir}/indexing" and reinitialize to the default by calling

>    java -jar org.apache.stanbol.entityhub.indexing.genericrdf-0.10.1-incubating-SNAPSHOT-jar-with-dependencies.jar init

At this point, this method throws an exception:

Exception in thread "main" java.lang.IllegalArgumentException: Unable to find configuration file 'indexing.properties'!

        at org.apache.stanbol.entityhub.indexing.core.config.IndexingConfig.loadConfig(IndexingConfig.java:599)

        at org.apache.stanbol.entityhub.indexing.core.config.IndexingConfig.<init>(IndexingConfig.java:280)

        at org.apache.stanbol.entityhub.indexing.core.IndexerFactory.create(IndexerFactory.java:80)

        at org.apache.stanbol.entityhub.indexing.core.IndexerFactory.create(IndexerFactory.java:65)

        at org.apache.stanbol.entityhub.indexing.Main.main(Main.java:66)

The file 'indexing.properties' [5] has to be created manually, as do 'mappings.txt' [8], 'fieldboosts.properties' [3] and 'entityTypes.properties' [1]. I took these files from the Stanbol trunk repository.

The log of this process is init2.log [7].

The log of the process after I created the configuration files manually is init.log [6].
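
To make the "Unable to find configuration file" situation easier to spot before starting the tool, a small check like the following could be run first. It is a sketch only: the function name is hypothetical, the list contains just the four files named above, and the directory to check is passed in by the caller because this message does not state where the tool expects them.

```python
from pathlib import Path

# The four configuration files mentioned in this message.
REQUIRED_FILES = (
    "indexing.properties",
    "mappings.txt",
    "fieldboosts.properties",
    "entityTypes.properties",
)


def missing_config_files(config_dir):
    """Return the names from REQUIRED_FILES that are not files in config_dir."""
    base = Path(config_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]
```

An empty result means all four files are present in the given directory; it says nothing about whether their contents are valid.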



> If you like you can also sent me your RDF file so that I can try to reproduce your issues.

My RDF file is larger than 16 MB:

2012-03-22  10:56        16 997 067 diseasome.nt

But this file is provided in the Stanbol trunk repositories [9], so I am not sending it.



===========================================

Contents of the indexing.7z file:

Directory of E:\Semantic\indexing



[1]. 2012-09-10  17:43     1 095  entityTypes.properties   (index config file)
[2]. 2012-10-01  13:11    19 953  error.log                (Stanbol error log)
[3]. 2012-09-10  17:39     1 976  fieldboosts.properties   (index config file)
[4]. 2012-10-01  12:00    13 017  index.log                (index process log)
[5]. 2012-10-01  11:30     7 611  indexing.properties      (index config file)
[6]. 2012-10-01  11:58     3 893  init.log                 (init process log no. 1)
[7]. 2012-10-01  13:02     1 644  init2.log                (init process log no. 2)
[8]. 2012-09-28  14:08     5 784  mappings.txt             (index config file)

===========================================

[9]. http://dev.iks-project.eu/downloads/stanbol-indices/ehealth/source-files/



I would be very grateful if you could take a look at this.



Best

Gniewoslaw