Posted to dev@stanbol.apache.org by Stephane Gamard <st...@salsadev.com> on 2010/12/02 18:01:46 UTC

Search Package in FISE

Hi all, 

First things first: congrats on the work on FISE! It looks great (almost sexy, we could say) compared with my last checkout a few months ago. Since then we have finally implemented our enhancementEngine (patch on the way), and we are now finalizing a "search-stack" very much inspired by the enhancement-stack present in trunk.

As with enhancement, the idea is to provide multiple searchers with the same ease and efficiency as the enhancement engine paradigm. To do so we thought of a simple "searchEngine" interface (attached in mail).

Engines must respond whether or not they can process the Query (which will be of different types: KWQuery, SparqlQuery, ...) and may then execute a search. Based on the implementation of the current Sparql Search, I am puzzled as to what these search engines should return. While a ResultSet object makes the most sense, the Sparql implementation can return a Boolean, a Graph or a ResultSet.

My question is, in simple terms: what would you expect to get as the result of a search in such a scheme?
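
For readers following along, here is a minimal sketch of what such an interface could look like. Every name in it (Query, KeywordQuery, SparqlQuery, SearchEngine, canSearch, InMemoryKeywordEngine) is hypothetical and is not taken from the attached interface or from the FISE code base. The List<String> return type is only a placeholder, since the proper return type is exactly the open question here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical marker type for queries; the concrete names are illustrative only.
interface Query {}

final class KeywordQuery implements Query {
    final String keywords;
    KeywordQuery(String keywords) { this.keywords = keywords; }
}

final class SparqlQuery implements Query {
    final String sparql;
    SparqlQuery(String sparql) { this.sparql = sparql; }
}

// Hypothetical engine contract: first say whether a query is supported, then run it.
interface SearchEngine {
    boolean canSearch(Query query);

    // Placeholder return type (assumption): a list of matching items as strings.
    List<String> search(Query query);
}

// Toy engine: case-insensitive substring match over an in-memory list of documents.
final class InMemoryKeywordEngine implements SearchEngine {
    private final List<String> documents;

    InMemoryKeywordEngine(List<String> documents) { this.documents = documents; }

    @Override
    public boolean canSearch(Query query) {
        return query instanceof KeywordQuery;
    }

    @Override
    public List<String> search(Query query) {
        if (!canSearch(query)) {
            throw new IllegalArgumentException("unsupported query type");
        }
        String needle = ((KeywordQuery) query).keywords.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String doc : documents) {
            if (doc.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

A caller would check canSearch first and only then call search, much as the enhancement engines are asked whether they can enhance a content item before doing so.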

Thank you for your input,

_Stephane




-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-
Stéphane Gamard
CTO, Co-Founder
http://www.salsadev.com
http://www.linkedin.com/in/sgamard
http://twitter.com/sgamard

Mobile:  +41 (0)78 914 8802
Office:  +41 (0)22 884 8302
Skype:  stephane.gamard
Email: stephane.gamard@salsadev.com

Re: Search Package in FISE

Posted by Rupert Westenthaler <rw...@apache.org>.

Hi all,

As requested by Olivier I will try to provide some additional information.

The ReferencedSiteManager provides functionality that is similar to
what you describe as "searchEngine".
It manages all "referenced sites" that are configured for the RICK. A
"referenced site" in RICK is something like an "entity hub": a
service that provides knowledge and content for entities. That is what
linked data endpoints do, e.g. DBpedia provides knowledge and links
to content for entities as described by Wikipedia, geonames.org
provides knowledge for spatial entities, and musicbrainz.org does the
same for albums, bands ... But within an organization there are also a
lot of potential entity hubs providing information about employees,
customers, contacts, projects, tasks ... that one would like to use as
"referenced sites".
So if you want to search all your referenced sites (your whole
knowledge network) via a single access point, then the
ReferencedSiteManager is what you are looking for.

Queries and different Query Types:

When you post a query to the ReferencedSiteManager, it forwards
the query to all the currently active referenced sites and collects
the results. So each "referenced site" acts like a "searchEngine" in
that case.
Currently RICK uses so-called "FieldQueries" to search for entities.
Such queries can select and constrain values of fields. (This allows
requests like "select the name, description, lat and long for all
entities with a name starting with "Par", or the type "city" and with
a population > 10.000".)
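
To make the idea of selecting and constraining fields concrete, here is a small self-contained sketch. It is not the RICK FieldQuery API: FieldQuerySketch, select, constrain and execute are hypothetical names, entities are modelled as plain maps, and the three conditions of the example request are assumed to be combined with AND. The sample entities and their population figures are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in for a field-based query; NOT the RICK FieldQuery API.
final class FieldQuerySketch {
    private final List<String> selectedFields = new ArrayList<>();
    private final Map<String, Predicate<Object>> constraints = new LinkedHashMap<>();

    // Fields to include in each result row.
    FieldQuerySketch select(String... fields) {
        selectedFields.addAll(Arrays.asList(fields));
        return this;
    }

    // One constraint per field; a second call for the same field replaces the first.
    FieldQuerySketch constrain(String field, Predicate<Object> constraint) {
        constraints.put(field, constraint);
        return this;
    }

    // Returns the selected fields of every entity that satisfies all constraints (AND).
    List<Map<String, Object>> execute(List<Map<String, Object>> entities) {
        List<Map<String, Object>> results = new ArrayList<>();
        for (Map<String, Object> entity : entities) {
            boolean matches = true;
            for (Map.Entry<String, Predicate<Object>> c : constraints.entrySet()) {
                Object value = entity.get(c.getKey());
                if (value == null || !c.getValue().test(value)) {
                    matches = false;
                    break;
                }
            }
            if (!matches) {
                continue;
            }
            Map<String, Object> row = new LinkedHashMap<>();
            for (String field : selectedFields) {
                row.put(field, entity.get(field));
            }
            results.add(row);
        }
        return results;
    }
}

final class FieldQueryDemo {
    // Sample entity; the values passed in below are illustrative, not real figures.
    static Map<String, Object> entity(String name, String type, long population) {
        Map<String, Object> e = new LinkedHashMap<>();
        e.put("name", name);
        e.put("type", type);
        e.put("population", population);
        return e;
    }

    static List<Map<String, Object>> run() {
        List<Map<String, Object>> entities = Arrays.asList(
                entity("Paris", "city", 2_000_000L),
                entity("Parndorf", "village", 4_000L),
                entity("Berlin", "city", 3_000_000L));
        return new FieldQuerySketch()
                .select("name", "population")
                .constrain("name", v -> ((String) v).startsWith("Par"))
                .constrain("type", v -> "city".equals(v))
                .constrain("population", v -> ((Number) v).longValue() > 10_000)
                .execute(entities);
    }
}
```

With this sample data only the "Paris" entity passes all three constraints, and the returned row carries just the two selected fields.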
But it was planned from the beginning to add support for multiple query
types (this feature will be supported within the next 2-3 months).
The RICK will define several query types of its own, specialized for
different use cases, but it is also planned to support standard query
languages (especially SPARQL).
Some of these query types will be required for all "referenced sites",
but most will be optional. The ReferencedSiteManager and
ReferencedSites will provide services to ask for the supported query
types.
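
The forwarding behaviour and the "ask for supported query types" service could be sketched roughly as follows. This is an illustration under assumptions, not the ReferencedSiteManager interface: SiteSketch, SiteManagerSketch and FixedSite are hypothetical names, query types are modelled as plain strings, and skipping sites that do not support a query type is an assumption of this sketch rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical view of a referenced site: it advertises query types and answers queries.
interface SiteSketch {
    Set<String> supportedQueryTypes();

    // Returns the IDs (URIs) of matching entities.
    List<String> find(String queryType, String query);
}

// Hypothetical manager: a single access point in front of all registered sites.
final class SiteManagerSketch {
    private final List<SiteSketch> sites = new ArrayList<>();

    void register(SiteSketch site) {
        sites.add(site);
    }

    // Union of the query types supported by at least one registered site.
    Set<String> supportedQueryTypes() {
        Set<String> types = new LinkedHashSet<>();
        for (SiteSketch site : sites) {
            types.addAll(site.supportedQueryTypes());
        }
        return types;
    }

    // Forwards the query to every site that supports its type (assumption) and
    // collects the IDs, dropping duplicates while keeping first-seen order.
    List<String> find(String queryType, String query) {
        Set<String> ids = new LinkedHashSet<>();
        for (SiteSketch site : sites) {
            if (site.supportedQueryTypes().contains(queryType)) {
                ids.addAll(site.find(queryType, query));
            }
        }
        return new ArrayList<>(ids);
    }
}

// Toy site backed by a fixed list of entity URIs; matches by substring.
final class FixedSite implements SiteSketch {
    private final Set<String> types;
    private final List<String> ids;

    FixedSite(Set<String> types, List<String> ids) {
        this.types = types;
        this.ids = ids;
    }

    @Override
    public Set<String> supportedQueryTypes() {
        return types;
    }

    @Override
    public List<String> find(String queryType, String query) {
        List<String> hits = new ArrayList<>();
        for (String id : ids) {
            if (id.contains(query)) {
                hits.add(id);
            }
        }
        return hits;
    }
}
```

A client would first call supportedQueryTypes() to see what is available and then pick a query type accordingly; the required query types would simply be the ones every site reports.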

The "return type" problem:

The RICK currently uses two return types for queries. It is also
planned that, in the future, every query type needs to support these
two types:
 - ID: returns the URIs of matching entities
 - Representation: an interface similar to SolrDocument. Each
Representation represents a selected entity and includes fields
(properties) with any number of values.
So when passing a query, you can always ask to return the results
as one of these two types. To give an example: passing a SPARQL SELECT
query with the return type "ID" will ignore all selected fields other
than the resources and just return a list of Strings; passing a SPARQL
CONSTRUCT query with return type "Representation" will create a
Representation for each Resource in the resulting graph and provide
all outgoing relations as fields. If the implementation cannot parse
the necessary information from the query it will throw an Exception.
For all other "return types" the idea is to use InputStream or Object
in the Java API and Accept headers (content negotiation) for the
RESTful services. But that also means that the receiver of the results
needs to know how to parse them.
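
As an illustration of the two return types, the sketch below maps a small result graph either to a list of IDs or to one multi-valued record per subject. All names (RepresentationSketch, Triple, ResultMapper, asIds, asRepresentations) are hypothetical and are not the RICK interfaces; the graph is modelled as a plain list of triples rather than a real RDF structure.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical multi-valued record, loosely in the spirit of a SolrDocument.
final class RepresentationSketch {
    private final String id;
    private final Map<String, List<Object>> fields = new LinkedHashMap<>();

    RepresentationSketch(String id) {
        this.id = id;
    }

    String getId() {
        return id;
    }

    // A field may hold any number of values.
    void add(String field, Object value) {
        fields.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
    }

    List<Object> get(String field) {
        return fields.getOrDefault(field, Collections.emptyList());
    }
}

// One edge of a result graph: subject --predicate--> object.
final class Triple {
    final String subject;
    final String predicate;
    final Object object;

    Triple(String subject, String predicate, Object object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

final class ResultMapper {
    // "Representation" return type: one record per subject, with all of its
    // outgoing relations exposed as fields.
    static List<RepresentationSketch> asRepresentations(List<Triple> graph) {
        Map<String, RepresentationSketch> bySubject = new LinkedHashMap<>();
        for (Triple t : graph) {
            bySubject.computeIfAbsent(t.subject, RepresentationSketch::new)
                     .add(t.predicate, t.object);
        }
        return new ArrayList<>(bySubject.values());
    }

    // "ID" return type: only the distinct subject URIs, everything else is ignored.
    static List<String> asIds(List<Triple> graph) {
        List<String> ids = new ArrayList<>();
        for (RepresentationSketch r : asRepresentations(graph)) {
            ids.add(r.getId());
        }
        return ids;
    }
}
```

Both views are derived from the same underlying result, which is why a caller can ask for either one regardless of the query type.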

Finally I would like to mention that Andreas Gruber and myself have
done some work (maybe one should better say: some thinking) about how
to provide Semantic Search by using LATCH principles (see
http://wiki.iks-project.eu/index.php/Latch-annotation for details).
Currently I have more important stuff to work on in Stanbol, but I
am still planning to start working on a semantic search service like
that within Apache Stanbol.
The Solr-based storage implementation for the RICK (SolrYard) was
already implemented with these things in mind and will be the starting
point for this feature.

Finally, some links to the mentioned components:

- ReferencedSiteManager: Sorry no real documentation yet
    http://svn.apache.org/repos/asf/incubator/stanbol/trunk/rick/generic/servicesapi/src/main/java/eu/iksproject/rick/servicesapi/site/ReferencedSiteManager.java
    However you might have a look at the RESTful services
    http://wiki.iks-project.eu/index.php/RickRESTfullApi#SITES_Service_Endpoint_.22.2Fsites.22
- ReferencedSite: There is a wiki page that explains how to configure
    referenced sites
    http://wiki.iks-project.eu/index.php/ReferencedSiteConfiguration
    To integrate a site one needs to provide an implementation of the
    EntityDereferencer and the EntitySearcher interface. As an
    alternative one can also use a local cache that holds all
    information of that site (e.g. based on an RDF dump)
- FieldQuery: There is no documentation for the Query API, because that will
    be extended a lot in the near future.
    http://svn.apache.org/repos/asf/incubator/stanbol/trunk/rick/generic/servicesapi/src/main/java/eu/iksproject/rick/servicesapi/query/FieldQuery.java
- SolrYard: The "Yard" interface describes the storage component in RICK. Yards
    are used for local caches as well as data stored by the RICK. The SolrYard
    is a Solr Server based implementation of that interface.
    This wiki page describes the configuration of this yard.
    http://wiki.iks-project.eu/index.php/SolrYardConfiguration

best
Rupert Westenthaler

-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Search Package in FISE

Posted by Olivier Grisel <ol...@ensta.org>.

2010/12/2 Stephane Gamard <st...@salsadev.com>:
> Hi all,
> First things first: congrats on the work on FISE! It looks great (almost
> sexy, we could say) compared with my last checkout a few months ago. Since
> then we have finally implemented our enhancementEngine (patch on the way),
> and we are now finalizing a "search-stack" very much inspired by the
> enhancement-stack present in trunk.
> As with enhancement, the idea is to provide multiple searchers with the
> same ease and efficiency as the enhancement engine paradigm. To do so we
> thought of a simple "searchEngine" interface (attached in mail).
> Engines must respond whether or not they can process the Query (which will
> be of different types: KWQuery, SparqlQuery, ...) and may then execute a
> search. Based on the implementation of the current Sparql Search, I am
> puzzled as to what these search engines should return. While a ResultSet
> object makes the most sense, the Sparql implementation can return a
> Boolean, a Graph or a ResultSet.
> My question is, in simple terms: what would you expect to get as the
> result of a search in such a scheme?

Have you seen the content of the rick subproject? This is pretty much
what you are describing:

  http://svn.apache.org/repos/asf/incubator/stanbol/trunk/rick/

Right now the documentation is still on the old site but is scheduled
to move to the apache server soon:

  http://wiki.iks-project.eu/index.php/RICK

I'll let Rupert give a more detailed answer on the search query result API.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel