Posted to users@jena.apache.org by Jean-Marc Vanel <je...@gmail.com> on 2016/11/01 09:01:28 UTC

Re: completion with Lucene: desirable from SPARQL

It's too bad that the * wildcard feature, and other details of the SPARQL to
Lucene query translation, are not documented on the Jena text search page.

Anyway, it works for my use case: I now have on my laptop a (kind of)
replacement for the dbPedia lookup service.

To experiment with the original dbPedia lookup service, you can go to
the semantic_forms sandbox:
http://163.172.179.125:9111/create?uri=&uri=http%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2FPerson
and type a few letters in the dct:subject field.

I don't need the full original literal value, because the URI results of
the query are labelled in the application: a foaf:Person is labelled by
given and family names, etc.

BUT, there is a "but": the results of the dbPedia lookup service are
appropriately ordered by "notoriety".
Instead, this is what I currently get with
http://localhost:9000/lookup?q=*Pari*
on my TDB that mirrors dbPedia:

<ArrayOfResult>
        <Result>
          <Label>Université Pierre-et-Marie-Curie</Label>
          <URI>http://dbpedia.org/resource/Pierre_and_Marie_Curie_University</URI>
        </Result><Result>
          <Label>Guillaume Le Gentil</Label>
          <URI>http://dbpedia.org/resource/Guillaume_Le_Gentil</URI>
        </Result><Result>
          <Label>1 E1 m</Label>
          <URI>http://dbpedia.org/resource/1_decametre</URI>
        </Result><Result>
          <Label>1 E4 m</Label>
          <URI>http://dbpedia.org/resource/1_myriametre</URI>
        </Result><Result>
          <Label>Nadia Boulanger</Label>
          <URI>http://dbpedia.org/resource/Nadia_Boulanger</URI>
        </Result><Result>
          <Label>Luis Mariano</Label>
          <URI>http://dbpedia.org/resource/Luis_Mariano</URI>
        </Result><Result>
          <Label>Paul Chemetov</Label>
          <URI>http://dbpedia.org/resource/Paul_Chemetov</URI>
        </Result><Result>
          <Label>Marc Boegner</Label>
          <URI>http://dbpedia.org/resource/Marc_Boegner</URI>
        </Result><Result>
          <Label>Cassandre (graphiste)</Label>
          <URI>http://dbpedia.org/resource/Cassandre_(artist)</URI>
        </Result><Result>
          <Label>La Norville</Label>
          <URI>http://dbpedia.org/resource/La_Norville</URI>
        </Result>
    </ArrayOfResult>

My understanding is that I need to set a weight on URIs in Lucene to
reflect their "notoriety".
I see 2 ways:

   1. easy to implement: just count the triples from and to the URI
   2. also take into account the URIs consulted by users in my
   application (but currently I don't record that information); there is
   also the issue of combining weights 1) and 2)
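
A minimal sketch of way 1, plus one possible way to combine it with way 2.
It assumes the triples are available as (subject, predicate, object) string
tuples and that URIs can be recognised by their http(s) scheme. The names
is_uri, degree_weights and combined_weight, the alpha parameter and the
example.org data are all made up for illustration; the log-damped weighted
sum is only one choice among many, not something taken from dbPedia lookup.

```python
import math
from collections import Counter


def is_uri(term):
    # Assumption for this sketch: URIs are plain strings with an http(s)
    # scheme; anything else (e.g. a label) is treated as a literal.
    return term.startswith(("http://", "https://"))


def degree_weights(triples):
    """Way 1: count the triples going from and to each URI."""
    counts = Counter()
    for subject, _predicate, obj in triples:
        counts[subject] += 1      # triple going *from* the URI
        if is_uri(obj):
            counts[obj] += 1      # triple going *to* the URI
    return counts


def combined_weight(degree, visits, alpha=0.5):
    """Hypothetical combination of ways 1 and 2: a weighted sum of
    log-damped counts, so that a few very large counts do not dominate."""
    return alpha * math.log1p(degree) + (1 - alpha) * math.log1p(visits)


# Made-up example data (not taken from dbPedia).
triples = [
    ("http://example.org/Paris",
     "http://www.w3.org/2000/01/rdf-schema#label", "Paris"),
    ("http://example.org/a",
     "http://example.org/birthPlace", "http://example.org/Paris"),
    ("http://example.org/b",
     "http://example.org/birthPlace", "http://example.org/Paris"),
]
weights = degree_weights(triples)
```

Here visits stands for the per-URI consultation count of way 2, which is not
recorded yet; passing 0 reduces the score to the triple count alone.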

Google search does both weightings.

So, in the short term I have to figure out how to add weights to the Lucene
- Jena index.
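
Until weights are in the index itself, a possible stopgap is to re-rank the
lookup results after the text query has run. This is a sketch under
assumptions, not a jena-text or Lucene feature: rerank, boost and the
example.org data are hypothetical, and results is assumed to be a list of
(URI, text score) pairs such as the ?s and ?score bindings of a text:query.

```python
def rerank(results, weights, boost=1.0):
    """Sort (uri, text_score) pairs by text score plus a boosted
    notoriety weight; URIs without a known weight get 0."""
    def combined(result):
        uri, text_score = result
        return text_score + boost * weights.get(uri, 0.0)
    return sorted(results, key=combined, reverse=True)


# Made-up scores and weights, for illustration only.
results = [("http://example.org/minor", 1.2),
           ("http://example.org/major", 1.0)]
weights = {"http://example.org/major": 3.0,
           "http://example.org/minor": 0.5}
ranked = rerank(results, weights)
```

With boost=0 the original text-score order is kept, which makes it easy to
compare both orderings on the same query.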

Then I have to read what dbPedia lookup does, and other background material.



2016-10-31 16:42 GMT+01:00 Osma Suominen <os...@helsinki.fi>:

> Hi Jean-Marc,
>
> Depending on what exactly you want from such a service, this may be
> already possible with jena-text.
>
> I'm assuming that you want to perform a prefix search such as "édu*" and
> get possible completions for that, such as "éducation".
>
> You can of course already do a prefix search with jena-text. What you will
> get back will be the RDF resources which have labels that contain this
> prefix. If the text index is configured to store literal values, you can
> ask for the actual values as well.
>
> E.g. with this data:
>
> ex:cse rdfs:label "Conseil supérieur de l'éducation"@fr .
>
> and a suitably configured jena-text index, you can perform this query:
>
> (?s ?score ?literal) text:query (rdfs:label "édu*") .
>
> and get back these bindings:
>
> ?s=ex:cse ?literal="Conseil supérieur de l'éducation"@fr
>
> However, you will get the full original literal value, not just the
> individual word that matched ("éducation"). If you want just the matched
> word, you will need special support that jena-text doesn't currently have.
>
> -Osma
>
> On 17/10/16 11:37, Jean-Marc Vanel wrote:
>
>> Hi
>>
>> I'm implementing an equivalent of the dbPedia lookup service [1] in
>> semantic_forms, leveraging the Lucene integration in TDB, and a dbPedia
>> mirror with TDB [2].
>>
>> The dbPedia lookup service is really nice but:
>>
>>     - the hosted service is often down
>>     - completion is in English only
>>
>> A lookup service with TDB and Lucene would overcome these 2 problems.
>>
>> So I would need completion with Lucene from SPARQL.
>> According to the Jena doc., this does not seem to be implemented:
>> https://jena.apache.org/documentation/query/text-query.html#query-with-sparql
>>
>> There are plenty of pages when searching for
>> lucene completion
>>
>> From these pages there is a code snippet here
>> http://stackoverflow.com/questions/120180/how-to-do-query-auto-completion-suggestions-in-lucene
>> but a regular Lucene API may exist.
>>
>> [1] https://github.com/dbpedia/lookup
>> [2] https://github.com/jmvanel/semantic_forms/blob/master/doc/en/administration.md#populating-with-dbpedia-mirroring-dbpedia
>>
>>
>
> --
> Osma Suominen
> D.Sc. (Tech), Information Systems Specialist
> National Library of Finland
> P.O. Box 26 (Kaikukatu 4)
> 00014 HELSINGIN YLIOPISTO
> Tel. +358 50 3199529
> osma.suominen@helsinki.fi
> http://www.nationallibrary.fi
>



-- 
Jean-Marc Vanel
Profile:
http://163.172.179.125:9111/display?displayuri=http%3A%2F%2Fjmvanel.free.fr%2Fjmv.rdf%23me
Déductions SARL - Consulting, services, training,
Rule-based programming, Semantic Web
+33 (0)6 89 16 29 52
Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui