Posted to dev@stanbol.apache.org by João Pedro Oliveira <jo...@metatheke.com> on 2012/01/06 10:17:27 UTC

Stanbol's Solr

Hello All.
I'm currently using Stanbol to store objects in the Entityhub. My goal is
to use Apache Stanbol's Solr to perform searches.
Everything works fine, but I need to change the field definitions in
Solr's schema.xml file.

The problem is that all the fields are created as dynamic fields (i.e. with
type=text and multiValued=true), but I need to change the type of some fields
to string and set multiValued to false.

I already know all the metafields that will be indexed. So is it possible
to edit Solr's schema.xml and define all the fields, with their types and
values, up front instead of using the dynamic fields?

Regards
João

Re: Stanbol's Solr

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi

On 06.01.2012, at 10:17, João Pedro Oliveira wrote:

> Hello All.
> I'm currently using Stanbol to store objects in the Entityhub. My goal is
> to use Apache Stanbol's Solr to perform searches.
> Everything works fine, but I need to change the field definitions in
> Solr's schema.xml file.
>
> The problem is that all the fields are created as dynamic fields (i.e. with
> type=text and multiValued=true), but I need to change the type of some fields
> to string and set multiValued to false.
>
> I already know all the metafields that will be indexed. So is it possible
> to edit Solr's schema.xml and define all the fields, with their types and
> values, up front instead of using the dynamic fields?
> 

In principle it is possible to adapt the schema.xml used by the Entityhub.
However, the steps to provide this configuration differ depending on whether
you use the indexing tools or manually configure a ReferencedSite.

Based on your request I am assuming that you use the indexing tools,
but I will explain both cases in this response.

## Changing the Solr Configuration when using the Indexing Tools

If you use the indexing utilities you can adapt the Solr configuration they use
by providing it within the "indexing/config" directory. This feature is used by
the dbpedia indexer [1], so you can use that as an example.

The directory with the SolrCore configuration MUST BE located within
"indexing/config". Its name defaults to the value of the "name" property in
"indexing.properties". However, you can change this default by using the
"solrConf" parameter of the "indexingDestination" configuration.

For example, the following line sets the SolrCore configuration name to
"myCoreConfig". It MUST BE defined within the "indexing.properties" file.

    indexingDestination=org.apache.stanbol.entityhub.indexing.destination.solryard.SolrYardIndexingDestination,solrConf:myCoreConfig,boosts:fieldboosts
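
To make the lookup rule above concrete, here is a minimal Python sketch of how the directory name is resolved. It is only an illustration of the rule as described in this mail, not the code the indexing tool actually runs. The function names and the "mySite" value are hypothetical.

```python
from pathlib import PurePosixPath


def parse_indexing_destination(value):
    """Split 'class,key:value,key:value' into (class_name, params)."""
    parts = [p.strip() for p in value.split(",") if p.strip()]
    class_name, params = parts[0], {}
    for part in parts[1:]:
        key, _, val = part.partition(":")
        params[key.strip()] = val.strip()
    return class_name, params


def solr_conf_dir(properties):
    """Return the directory expected to hold the SolrCore configuration."""
    _, params = parse_indexing_destination(properties["indexingDestination"])
    # 'solrConf' overrides the default, which is the value of 'name'.
    conf_name = params.get("solrConf") or properties["name"]
    return PurePosixPath("indexing/config") / conf_name


# Hypothetical content of indexing.properties, already read into a dict.
props = {
    "name": "mySite",  # assumed site name, for illustration only
    "indexingDestination": (
        "org.apache.stanbol.entityhub.indexing.destination.solryard."
        "SolrYardIndexingDestination,solrConf:myCoreConfig,boosts:fieldboosts"
    ),
}
print(solr_conf_dir(props))  # indexing/config/myCoreConfig
```

Without the "solrConf" parameter the same call would resolve to "indexing/config/mySite".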

## Changing the Solr Configuration for a SolrYard

To manually create a Referenced Site that locally stores the entity data
within Solr, one needs to create three OSGi components:

1. Referenced Site
2. Cache
3. SolrYard.

The SolrYard tries to load the configuration based on its ID via the
DataFileProvider infrastructure. So if you want to use a custom
configuration you need to provide it to the Main DataFileProvider
before you create the SolrYard.

Here are the steps you need to follow:

1. Create the custom core configuration. It is best to start from [2] and make
all the necessary adaptations.
2. Create a zip file of the changed configuration and name it "<coreName>.solrindex.zip".
Note that the root directory within the archive MUST HAVE the same name, "<coreName>".
Otherwise the initialization will not work.
3. Copy this archive to the "/sling/datafiles" directory. This is the directory used by
the Main DataFileProvider.
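
Steps 2 and 3 can also be scripted. The following Python sketch is one possible way to do it, assuming the adapted configuration sits in a local directory. The function name `package_core_config` and the paths in the usage note are hypothetical. It writes the archive directly into the datafiles directory and stores every entry under a root directory named after the core.

```python
import zipfile
from pathlib import Path


def package_core_config(config_dir, core_name, datafiles_dir):
    """Zip config_dir as '<core_name>.solrindex.zip' inside datafiles_dir."""
    config_dir = Path(config_dir)
    datafiles_dir = Path(datafiles_dir)
    datafiles_dir.mkdir(parents=True, exist_ok=True)
    archive = datafiles_dir / f"{core_name}.solrindex.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(config_dir.rglob("*")):
            if path.is_file():
                # The root directory inside the archive must be the core name.
                arcname = Path(core_name) / path.relative_to(config_dir)
                zf.write(path, arcname.as_posix())
    return archive
```

A call such as `package_core_config("work/myCore", "myCore", "sling/datafiles")` would then produce "sling/datafiles/myCore.solrindex.zip", with entries like "myCore/conf/schema.xml" if the source directory contains "conf/schema.xml".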

Now you can create the SolrYard. Make sure that the configured Server Location is
set to the "<coreName>" of your custom configuration.

After creating the new SolrYard you should see an INFO-level log message like

    org.apache.stanbol.entityhub.yard.solr.impl.SolrYard  ... initialise new SolrDirectory Index with name <coreName> by using Index Configuration <coreName>

I have also attached a screenshot of the dialog in the Apache 
Felix Web Console and the data files directory.

To use the customized SolrYard with a ReferencedSite you still need to
create/configure a Cache (use the ID of the SolrYard as the Yard for the cache)
and the ReferencedSite (make sure to use the ID of the SolrYard as the "Cache ID"
and set the Cache Strategy to Used or All).

best
Rupert

[1] http://svn.apache.org/repos/asf/incubator/stanbol/trunk/entityhub/indexing/dbpedia/
[2] http://svn.apache.org/repos/asf/incubator/stanbol/trunk/entityhub/yard/solr/src/main/resources/solr/core/default.solrindex.zip


> Regards
> João