Posted to dev@stanbol.apache.org by João Pedro Oliveira <jo...@metatheke.com> on 2011/10/07 11:28:37 UTC

Re: Solr Endpoint

Hi Rupert.
Thanks for the reply.
Stanbol is already configured with the external Solr server. However, I'm
having trouble performing queries on the dynamic fields that were created.
For example, I have a field named "@/DC-ELEMENTS:CREATOR/" that
was created to store the indexed values of the dc:creator field.
Could you please explain how to query this field?

João Oliveira

2011/9/26 Rupert Westenthaler <ru...@gmail.com>

> Hi
>
> On Mon, Sep 26, 2011 at 3:16 PM, Olivier Grisel
> <ol...@ensta.org> wrote:
> > 2011/9/26 João Pedro Oliveira <jo...@metatheke.com>:
> >> Good Afternoon.
> >>
> >> Is there any way to use the Solr Service through Apache Stanbol? I need to
> >> perform a faceted search over my entities stored in the Entityhub. I'm
> >> currently using the Query and Find endpoints from Stanbol, but what I wanted
> >> was a simpler search that just divides my files by category to
> >> get the total number of indexed files in each one.
>
> Currently the only possibility is to configure the Entityhub to use an
> external SolrServer.
>
> You can use a normal SolrServer (version 3.3+). However, you need to
> configure it with a core compatible with the configuration expected by
> the SolrYard.
> If you want to start from scratch you can find the default configuration at
> [1].
> If you want to reuse the current data you can find the currently used
> index under
>
>    {stanbol-root}/sling/entityhub/solrYard/indexes/{index}
>
> Just copy the {index} over to the external SolrServer.
>
> To configure the Entityhub to use an external SolrServer
>
> * go to the configuration tab of the Apache Felix Webconsole
> (http://localhost:8080/system/console/configMgr)
> * search for the "Apache Stanbol Entityhub Yard: Solr Yard Configuration"
> * open the configuration of the correct SolrYard
> * change the value of the "Solr Index/Core" to the external "http://.."
> url.
>
> Before writing queries you need to know how the SolrYard encodes RDF
> properties in fields:
>
> In general:
>
> * All triples with the same subject are added to the same Solr
> document with the "uri":"{subject}"
> * RDF properties are encoded as "{prefix}/{ns-prefix}:{local-name}/"
> where the {prefix} represents the datatype/language of the value
>
> (1) namespace prefix mappings
>
> All {ns-prefix} used within the index are stored in a special document
> within the index.
> This document has the id ("uri" is the field used for ids)
>
>    "urn:eu.iksproject:rick.yard.solr:config.namespacePrefixConfig"
>
> all fields within this document start with "_config/"
>
> (2) field prefixes
>
> The schema.xml gives a good overview of the defined prefixes. This
> file can be found under "{index}/conf/schema.xml"
>
> Short overview:
>
> * "@{lang}" for languages
> * "_!@" contains all text AND string values
> * "bool", "int", "lon", "flo", "dou", "cal", "dur" for primitive datatypes
> * "ref" for references (URI values)
> * "str" for string values of the datatype xsd:string
>
> special fields:
>
> * "uri" document id field
> * "_domain" is used by the SolrYard in cases where more than one
> SolrYard instance uses the same SolrServer/Core
> * "_text" stores all text AND string values of ALL fields (and is the
> default search field)
> * "_ref" stores all URI values of ALL fields (can be used for semantic
> context searches)
>
> I recommend opening an index of a SolrYard within Luke [2] and having
> a look for yourself at how the data are stored.
>
> @João: I know this is a little bit complex ... if you have any
> additional questions feel free to ask. You can also join the #stanbol
> channel on IRC and ask me directly.
>
> >
> > I also think we should make it possible to enable the raw Solr servlet
> > from the SolrYard configuration. I wonder if this would be complicated
> > to implement though.
> >
> > Rupert, any thought?
> >
>
> @Olivier
>
> In my opinion it is a borderline use case, because the way the
> SolrYard encodes fields would make custom queries very complex.
> However, if more users request this feature we definitely need to have a
> look.
>
> In case of an external SolrServer we could simply forward requests. In
> case of an EmbeddedSolrServer I do have access to the SolrCore.
> Hopefully one can initialize the Solr servlet based on that.
>
> best
> Rupert
>
>
> [1]
> http://svn.apache.org/repos/asf/incubator/stanbol/trunk/entityhub/yard/solr/src/main/resources/solr/core/default.solrindex.zip
> [2] http://code.google.com/p/luke/
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>

Re: Solr Endpoint

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi

2011/10/7 João Pedro Oliveira <jo...@metatheke.com>:
> Hi Rupert.
> Thanks for the reply.
> Stanbol is already configured with the external Solr server. However, I'm
> having trouble performing queries on the dynamic fields that were created.
> For example, I have a field named "@/DC-ELEMENTS:CREATOR/" that
> was created to store the indexed values of the dc:creator field.
> Could you please explain how to query this field?
>

I assume the field is actually called "@/dc-elements:creator/", in lowercase letters.

A query to search all documents with the creator "John" would look like

  q=@/dc-elements\:creator/:john

Note that you need to escape the ':' in the field name; "q" is the
parameter for the query
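
As a rough sketch of the escaping step (Python is used only for
illustration; "solr_field" and "term_query" are hypothetical helper
names, not part of Stanbol or Solr), the field name can be composed from
the "{prefix}/{ns-prefix}:{local-name}/" pattern and the ':' escaped
before the query is assembled:

```python
def solr_field(prefix, ns_prefix, local_name):
    """Compose a SolrYard field name following the pattern
    {prefix}/{ns-prefix}:{local-name}/ described in the earlier mail."""
    return "{0}/{1}:{2}/".format(prefix, ns_prefix, local_name)


def term_query(field, value):
    """Build a 'field:value' query, escaping ':' inside the field name.

    Assumption: only ':' is escaped, as in the example above. Values that
    contain other special characters of the query syntax need more care.
    """
    return field.replace(":", "\\:") + ":" + value


field = solr_field("@", "dc-elements", "creator")
print(field)                      # @/dc-elements:creator/
print(term_query(field, "john"))  # @/dc-elements\:creator/:john
```

The separator ':' between field and value is added after escaping, so it
is the only unescaped ':' left in the query.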

However, I would prefer to search the field

    q=_!@dc-elements\:creator/:john

because this includes values regardless of their language, and
even values that are stored as Literals with the type
xsd:string. The "@/dc-elements:creator/" field only contains Literal
values without a data type and without a language.

Please also note that when sending a query to Solr, you need to
URL-encode the parameter values (the '=' and '&' that separate the
parameters stay as they are). So the final request would look like

   start=0&rows=20&fl=*%2Cscore&q=_!%40dc-elements%5C%3Acreator%2F%3Ajohn

standing for

    start=0&rows=20&fl=*,score&q=_!@dc-elements\:creator/:john
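
A minimal sketch of building such a request string, assuming Python's
standard urllib.parse is acceptable for illustration. Only the parameter
names and values are encoded; the '=' and '&' separators stay literal so
that Solr can still split the parameters. Passing safe="*!" is my own
choice here to leave '*' and '!' unencoded; encoding them as well would
be equally valid.

```python
from urllib.parse import urlencode

# Parameters as in the example above; the backslash escapes ':' for Solr.
params = [
    ("start", "0"),
    ("rows", "20"),
    ("fl", "*,score"),
    ("q", "_!@dc-elements\\:creator/:john"),
]

# urlencode() percent-encodes each name and value and joins them with '&'.
query_string = urlencode(params, safe="*!")
print(query_string)
# start=0&rows=20&fl=*%2Cscore&q=_!%40dc-elements%5C%3Acreator%2F%3Ajohn
```

The resulting string is what goes after the '?' of the select URL of
your Solr core.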


If you set the Logging Level for Solr to INFO you should also be able
to see normal requests created by the Entityhub in the logs.

Here is an example of my current logs

    start=0&rows=20&fl=*%2Cscore&q=%28%28%40en%2Frdfs%5C%3Alabel%2F%3Aseating%29%29+OR+%28%28%40en%2Frdfs%5C%3Alabel%2F%3Afabric%29%29

that decodes to

    start=0&rows=20&fl=*,score&q=((@en/rdfs\:label/:seating)) OR ((@en/rdfs\:label/:fabric))
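
If you want to decode such log lines yourself, here is a small sketch
(again Python, standard library only; the variable names are just for
illustration) that turns the logged request back into readable
parameters:

```python
from urllib.parse import parse_qs

# The request exactly as it appears in the log excerpt above.
logged = (
    "start=0&rows=20&fl=*%2Cscore"
    "&q=%28%28%40en%2Frdfs%5C%3Alabel%2F%3Aseating%29%29"
    "+OR+%28%28%40en%2Frdfs%5C%3Alabel%2F%3Afabric%29%29"
)

# parse_qs() splits on '&' and '=', decodes %XX sequences and turns '+'
# into a space. Each value is returned as a list.
params = parse_qs(logged)
print(params["q"][0])
# ((@en/rdfs\:label/:seating)) OR ((@en/rdfs\:label/:fabric))
```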


best
Rupert


-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen