Posted to users@jena.apache.org by Jean-Marc Vanel <je...@gmail.com> on 2016/11/01 09:01:28 UTC

Re: completion with Lucene: desirable from SPARQL

It's too bad that the * wildcard feature, and other details of the SPARQL to
Lucene query translation, are not documented on the Jena text search page.

Anyway, it works for my use case, I now have on my laptop a (kind of)
replacement of dbPedia lookup service.

To experiment with the original dbPedia lookup service, you can go to
semantic_forms sandbox:
http://163.172.179.125:9111/create?uri=&uri=http%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2FPerson
and type a few letters in the dct:subject field.

I don't need the full original literal value, because the URI results of
the query are labelled in the application: a foaf:Person is labelled by
given and family names, etc.

BUT, there is a "but": the results of the dbPedia lookup service are
appropriately ordered by "notoriety".
Instead, here is what I currently get with http://localhost:9000/lookup?q=*Pari*
on my TDB that mirrors dbPedia:

<ArrayOfResult>
        <Result>
          <Label>Université Pierre-et-Marie-Curie</Label>
          <URI>http://dbpedia.org/resource/Pierre_and_Marie_Curie_University</URI>
        </Result><Result>
          <Label>Guillaume Le Gentil</Label>
          <URI>http://dbpedia.org/resource/Guillaume_Le_Gentil</URI>
        </Result><Result>
          <Label>1 E1 m</Label>
          <URI>http://dbpedia.org/resource/1_decametre</URI>
        </Result><Result>
          <Label>1 E4 m</Label>
          <URI>http://dbpedia.org/resource/1_myriametre</URI>
        </Result><Result>
          <Label>Nadia Boulanger</Label>
          <URI>http://dbpedia.org/resource/Nadia_Boulanger</URI>
        </Result><Result>
          <Label>Luis Mariano</Label>
          <URI>http://dbpedia.org/resource/Luis_Mariano</URI>
        </Result><Result>
          <Label>Paul Chemetov</Label>
          <URI>http://dbpedia.org/resource/Paul_Chemetov</URI>
        </Result><Result>
          <Label>Marc Boegner</Label>
          <URI>http://dbpedia.org/resource/Marc_Boegner</URI>
        </Result><Result>
          <Label>Cassandre (graphiste)</Label>
          <URI>http://dbpedia.org/resource/Cassandre_(artist)</URI>
        </Result><Result>
          <Label>La Norville</Label>
          <URI>http://dbpedia.org/resource/La_Norville</URI>
        </Result>
    </ArrayOfResult>

My understanding is that I need to set a weight on URIs in Lucene to
reflect their "notoriety".
I see 2 ways:

   1. easy to implement: just count the triples from and to the URI
   2. also take into account the URIs consulted by users in my
   application (but currently I don't record that information); there is
   also the issue of combining weights 1) and 2)
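
The first option can be sketched directly in SPARQL. This is only a minimal
sketch, assuming that "triples from and to the URI" means outgoing plus
incoming triples in the default graph; dbpedia:Paris is used purely as an
illustrative resource.

```sparql
# Count the triples "from and to" one URI:
# outgoing (URI as subject) plus incoming (URI as object).
# The resource below is only an illustration.
SELECT (COUNT(*) AS ?links)
WHERE {
  { <http://dbpedia.org/resource/Paris> ?p ?o }
  UNION
  { ?s ?p <http://dbpedia.org/resource/Paris> }
}
```

A triple that has the URI as both subject and object is counted twice by
this sketch, which is probably harmless for a rough "notoriety" weight.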

Google search does both weightings.

So, in the short term I have to figure out how to add weights to the Lucene
- Jena index.

Then I have to read what dbPedia lookup does, and other background material.



2016-10-31 16:42 GMT+01:00 Osma Suominen <os...@helsinki.fi>:

> Hi Jean-Marc,
>
> Depending on what exactly you want from such a service, this may be
> already possible with jena-text.
>
> I'm assuming that you want to perform a prefix search such as "édu*" and
> get possible completions for that, such as "éducation".
>
> You can of course already do a prefix search with jena-text. What you will
> get back will be the RDF resources which have labels that contain this
> prefix. If the text index is configured to store literal values, you can
> ask for the actual values as well.
>
> E.g. with this data:
>
> ex:cse rdfs:label "Conseil supérieur de l'éducation"@fr .
>
> and a suitably configured jena-text index, you can perform this query:
>
> (?s ?score ?literal) text:query (rdfs:label "édu*") .
>
> and get back these bindings:
>
> ?s=ex:cse ?literal="Conseil supérieur de l'éducation"@fr
>
> However, you will get the full original literal value, not just the
> individual word that matched ("éducation"). If you want just the matched
> word, you will need special support that jena-text doesn't currently have.
>
> -Osma
>
> On 17/10/16 11:37, Jean-Marc Vanel wrote:
>
>> Hi
>>
>> I'm implementing an equivalent of the dbPedia lookup service [1] in
>> semantic_forms, leveraging the Lucene integration in TDB, and a dbPedia
>> mirror with TDB [2].
>>
>> The dbPedia lookup service is really nice but:
>>
>>     - the hosted service is often down
>>     - completion is in English only
>>
>> A lookup service with TDB and Lucene would overcome these 2 problems.
>>
>> So I would need completion with Lucene from SPARQL.
>> According to the Jena doc., this does not seem to be implemented:
>> https://jena.apache.org/documentation/query/text-query.html#query-with-sparql
>>
>> There are plenty of pages when searching for
>> lucene completion
>>
>> From these pages there is a code snippet here
>> http://stackoverflow.com/questions/120180/how-to-do-query-auto-completion-suggestions-in-lucene
>> but a regular Lucene API may exist.
>>
>> [1] https://github.com/dbpedia/lookup
>> [2]
>> https://github.com/jmvanel/semantic_forms/blob/master/doc/en/administration.md#populating-with-dbpedia-mirroring-dbpedia
>>
>>
>
> --
> Osma Suominen
> D.Sc. (Tech), Information Systems Specialist
> National Library of Finland
> P.O. Box 26 (Kaikukatu 4)
> 00014 HELSINGIN YLIOPISTO
> Tel. +358 50 3199529
> osma.suominen@helsinki.fi
> http://www.nationallibrary.fi
>



-- 
Jean-Marc Vanel
Profil:
http://163.172.179.125:9111/display?displayuri=http%3A%2F%2Fjmvanel.free.fr%2Fjmv.rdf%23me
Déductions SARL - Consulting, services, training,
Rule-based programming, Semantic Web
+33 (0)6 89 16 29 52
Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui

Re: completion with Lucene: desirable from SPARQL

Posted by "Lorenz B." <bu...@informatik.uni-leipzig.de>.
Hi Jean-Marc,


> Guten Tag Lorenz !
Good job! German is a very difficult language.
>
> I don't know what is "IR" .
IR = Information Retrieval, which is what Lucene is basically made for.
>
> And reusing Lucene is the plan.
> The current code is here (as I mentioned earlier in this thread):
> https://github.com/jmvanel/semantic_forms/blob/master/scala/forms/src/main/scala/deductions/runtime/jena/lucene/TextIndexerWeight.scala
>
> I don't know how to combine TF-IDF with ranking based on links.
> I'm not even sure that, in an RDF world, term frequency is bringing much
> useful information.
> If you have some synthesis articles to recommend on search in RDF world, or
> in general, that would help.
There has been some discussion about how to combine ranking metrics like
pagerank with the standard Lucene score, e.g. [1], [2].
I think this can be done via boosting during indexing or by some
user-defined sort.
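
The "user-defined sort" variant can be sketched at query time in SPARQL.
This is only a sketch under assumptions: the ex: namespace and the
ex:tripleCount property are hypothetical (a link count stored beforehand as
a triple), and multiplying the two values is just one arbitrary way of
combining them. It is not boosting at indexing time.

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/vocab#>   # hypothetical namespace

SELECT ?s ?score ?count
WHERE {
  # ?score is the Lucene score returned by jena-text for each hit
  (?s ?score) text:query (rdfs:label "pari*") .
  # hypothetical property holding a precomputed link count
  ?s ex:tripleCount ?count .
}
# Combine both signals; the product is an arbitrary illustrative choice
ORDER BY DESC(?score * ?count)
LIMIT 10
```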

There has been a lot of research regarding entity ranking; among
others, you can have a look at [3].

[1]
http://blog.trifork.com/2011/11/16/apache-lucene-flexiblescoring-with-indexdocvalues/
[2]
http://stackoverflow.com/questions/22473498/solr-boost-score-based-on-wikipedia-pagerank-and-solr-score
[3] http://ceur-ws.org/Vol-1586/know2.pdf
>
>
> I put the ranking into the search on the sandbox (counting the links, à la
> Google's PageRank), so my FOAF profile now comes first, due to its many
> cco:expertise links:
> http://163.172.179.125:9111/wordsearch?q=Jean-Marc
> In good company with Jean Sablon, Jean Moulin, and pope JP 2.
>
> The TDB was populated with dbpedia with these scripts :
> https://github.com/jmvanel/semantic_forms/blob/master/scala/forms_play/scripts/download-dbpedia.sh
> https://github.com/jmvanel/semantic_forms/blob/master/scala/forms_play/scripts/populate_with_dbpedia.sh
>
-- 
Lorenz Bühmann
AKSW group, University of Leipzig
Group: http://aksw.org - semantic web research center


Re: completion with Lucene: desirable from SPARQL

Posted by Jean-Marc Vanel <je...@gmail.com>.
Guten Tag Lorenz !

I don't know what "IR" is.

And reusing Lucene is the plan.
The current code is here (as I mentioned earlier in this thread):
https://github.com/jmvanel/semantic_forms/blob/master/scala/forms/src/main/scala/deductions/runtime/jena/lucene/TextIndexerWeight.scala

I don't know how to combine TF-IDF with ranking based on links.
I'm not even sure that, in an RDF world, term frequency brings much
useful information.
If you have some survey articles to recommend on search in the RDF world, or
in general, that would help.


I put the ranking into the search on the sandbox (counting the links, à la
Google's PageRank), so my FOAF profile now comes first, due to its many
cco:expertise links:
http://163.172.179.125:9111/wordsearch?q=Jean-Marc
In good company with Jean Sablon, Jean Moulin, and pope JP 2.

The TDB was populated with dbpedia with these scripts :
https://github.com/jmvanel/semantic_forms/blob/master/scala/forms_play/scripts/download-dbpedia.sh
https://github.com/jmvanel/semantic_forms/blob/master/scala/forms_play/scripts/populate_with_dbpedia.sh


2016-11-04 10:05 GMT+01:00 Lorenz B. <bu...@informatik.uni-leipzig.de>:

> Hello Jean-Marc,
>
> I think adding something like a pagerank score would improve the
> results. Lucene itself just uses more or less the standard IR measure
> TF/IDF.
>
>
>
> Cheers,
> Lorenz
>


-- 
Jean-Marc Vanel
Profil:
http://163.172.179.125:9111/display?displayuri=http%3A%2F%2Fjmvanel.free.fr%2Fjmv.rdf%23me
Déductions SARL - Consulting, services, training,
Rule-based programming, Semantic Web
+33 (0)6 89 16 29 52
Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui

Re: completion with Lucene: desirable from SPARQL

Posted by "Lorenz B." <bu...@informatik.uni-leipzig.de>.
Hello Jean-Marc,

I think adding something like a pagerank score would improve the
results. Lucene itself just uses more or less the standard IR measure
TF/IDF.



Cheers,
Lorenz

-- 
Lorenz Bühmann
AKSW group, University of Leipzig
Group: http://aksw.org - semantic web research center


Re: completion with Lucene: desirable from SPARQL

Posted by Jean-Marc Vanel <je...@gmail.com>.
Osma,

That makes sense,
and the first tests are not bad.

Although I'm surprised that "par*" does not get dbpedia:Paris in the first
10;
but "pari*" does get dbpedia:Paris in the first position:

"count" "s"
"3090"^^http://www.w3.org/2001/XMLSchema#integer http://dbpedia.org/resource/Paris
"2676"^^http://www.w3.org/2001/XMLSchema#integer http://dbpedia.org/resource/London
"72"^^http://www.w3.org/2001/XMLSchema#integer http://dbpedia.org/resource/Émile_Durkheim
"68"^^http://www.w3.org/2001/XMLSchema#integer http://dbpedia.org/resource/Henri_Bergson
"66"^^http://www.w3.org/2001/XMLSchema#integer http://dbpedia.org/resource/20th_arrondissement_of_Paris
"64"^^http://www.w3.org/2001/XMLSchema#integer http://dbpedia.org/resource/Cornelius_Castoriadis
"64"^^http://www.w3.org/2001/XMLSchema#integer http://dbpedia.org/resource/Jacques_Derrida
"63"^^http://www.w3.org/2001/XMLSchema#integer http://dbpedia.org/resource/Michel_Foucault
"62"^^http://www.w3.org/2001/XMLSchema#integer http://dbpedia.org/resource/Louis,_Grand_Condé
"60"^^http://www.w3.org/2001/XMLSchema#integer http://dbpedia.org/resource/Jean-Jacques_Rousseau


I'll add that SPARQL in my sandbox as a replacement of dbpedia lookup
service,
and tell you how it goes.
But I foresee that using the Lucene implementation after adding the weights
will be more efficient. But that demands more work...


2016-11-03 14:30 GMT+01:00 Osma Suominen <os...@helsinki.fi>:

> Hi Jean-Marc!
>
>> AFAIK using the weights to order results is intimately linked to the text
>> index querying.
>> If I want the top 10 results, the search must have the weights beforehand
>> otherwise I must get all the results to filter later.
>> This is the reason for using AnalyzingInfixSuggester.
>> Lucene 4_9_1
>> https://lucene.apache.org/core/4_9_1/suggest/org/apache/lucene/search/suggest/analyzing/AnalyzingInfixSuggester.html
>> Lucene 6_2_1
>> https://lucene.apache.org/core/6_2_1/suggest/org/apache/lucene/search/suggest/analyzing/AnalyzingInfixSuggester.html
>>
>> I guess this is what you call "performance reasons" .
>>
>
> I don't see why you couldn't, in principle, do something like this:
>
> SELECT ?s (COUNT(*) as ?count)
> WHERE {
>   ?s text:query "édu*" .
>   ?s ?p ?o .
> }
> GROUP BY ?s
> ORDER BY DESC(?count)
> LIMIT 10
>
> (note: untested query)
>
> I'm sure it will get slow if the number of hits from the text index is
> more than a few dozen. But for a small number of results at a time, it
> might work.
>
>> As I wrote in the original post, "I'll have to implement also the callback
>> for updates
>> like class TextDocProducerTriples in Jena-text." .
>> http://jena.apache.org/documentation/javadoc/text/org/apache/jena/query/text/TextDocProducerTriples.html
>>
>
> Isn't that called only when the indexed triple changes (e.g. the one with
> rdfs:label or skos:prefLabel or whatever property you are indexing), but
> not when other data related to the same subject changes? So if new triples
> are added for the same subject, but its label is unchanged, then the text
> index won't see the update and thus the count of references/triples won't
> be updated either.
>
> I may be wrong here, I'm not sure how the update tracking works.
>
> -Osma
>
>
>
> --
> Osma Suominen
> D.Sc. (Tech), Information Systems Specialist
> National Library of Finland
> P.O. Box 26 (Kaikukatu 4)
> 00014 HELSINGIN YLIOPISTO
> Tel. +358 50 3199529
> osma.suominen@helsinki.fi
> http://www.nationallibrary.fi
>



-- 
Jean-Marc Vanel
Profil:
http://163.172.179.125:9111/display?displayuri=http%3A%2F%2Fjmvanel.free.fr%2Fjmv.rdf%23me
Déductions SARL - Consulting, services, training,
Rule-based programming, Semantic Web
+33 (0)6 89 16 29 52
Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui

Re: completion with Lucene: desirable from SPARQL

Posted by Osma Suominen <os...@helsinki.fi>.
Bonjour Jean-Marc!

04.11.2016, 09:27, Jean-Marc Vanel wrote:
> Looking for Pari* with your SPARQL on dbPedia takes 4 seconds on my
> supposedly efficient laptop CPU:
[...]
> I should try with SSD.
> I don't know whether TDB can exploit multi-core CPU.
> Also I don't know whether I can pre-compile the query with a parameter for
> runtime.

The obvious problem here is that the query has to count all the triples 
with the same subject. I don't think SSD or a CPU with more cores would 
help, at least not much.

What could help is to use a narrower query pattern, for example if you 
could look at only a specific property (or a few, expressed using 
VALUES) instead of every possible property.
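
A narrower pattern of that kind might look like the sketch below. The two
properties listed in VALUES (dbo:wikiPageWikiLink and dct:subject) are only
an assumption about which links are worth counting in a DBpedia mirror; they
are not taken from this thread, and the query is as untested as the original.

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX dct:  <http://purl.org/dc/terms/>

SELECT ?s (COUNT(*) AS ?count)
WHERE {
  ?s text:query (rdfs:label "pari*") .
  # Count only a few chosen properties instead of every ?p
  # (illustrative choice of properties)
  VALUES ?p { dbo:wikiPageWikiLink dct:subject }
  ?s ?p ?o .
}
GROUP BY ?s
ORDER BY DESC(?count)
LIMIT 10
```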

> Anyway, I'll implement the ordering by triple count in Semantic_forms.
> Maybe later can it be helpful within Jena-text.

Although you probably can store the triple count in the jena-text index, 
there is also an alternative way that doesn't need any new code; namely, 
to precompute the counts (I'm assuming that your DBpedia data doesn't 
change very often) and store them as triples, which could be done using 
a single SPARQL Update query. Then you could just look up that count in 
the same SPARQL query where you use jena-text and rank the results by 
the count. It should be a lot faster than the query I gave you.
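
As a rough sketch of this precomputation, assuming a hypothetical property
ex:tripleCount in a made-up namespace (neither is part of Jena or DBpedia),
the counts could be stored with one SPARQL Update request:

```sparql
PREFIX ex: <http://example.org/vocab#>   # hypothetical namespace

# Store, for every subject, the number of triples it has as subject.
INSERT { ?s ex:tripleCount ?count }
WHERE {
  {
    SELECT ?s (COUNT(*) AS ?count)
    WHERE { ?s ?p ?o }
    GROUP BY ?s
  }
}
```

and then looked up in the same query that uses jena-text:

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/vocab#>   # hypothetical namespace

SELECT ?s ?count
WHERE {
  ?s text:query (rdfs:label "pari*") .
  ?s ex:tripleCount ?count .             # precomputed above
}
ORDER BY DESC(?count)
LIMIT 10
```

If the update is run again after the data changes, the old ex:tripleCount
triples would need to be deleted first, otherwise a subject could end up
with several counts.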

-Osma

PS. Are you still interested in completing the Lucene upgrade? I wrote a 
comment in JENA-1250 about what to do so that we could at least get the 
update to version 5 merged into Jena. At the very minimum, making a PR 
against Jena would indicate (from a legal perspective) that you wish to 
contribute the work to Apache Jena, so that others can make use of it.

-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi

Re: completion with Lucene: desirable from SPARQL

Posted by Jean-Marc Vanel <je...@gmail.com>.
Looking for Pari* with your SPARQL on dbPedia takes 4 seconds on my
supposedly efficient laptop CPU:

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 94
Model name:            Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
Stepping:              3
CPU MHz:               2644.789
CPU max MHz:           3500.0000
CPU min MHz:           800.0000
BogoMIPS:              5181.67

I should try with SSD.
I don't know whether TDB can exploit multi-core CPU.
Also I don't know whether I can pre-compile the query with a parameter for
runtime.

Anyway, I'll implement the ordering by triple count in Semantic_forms.
Maybe later it can be helpful within Jena-text.





-- 
Jean-Marc Vanel
Profil:
http://163.172.179.125:9111/display?displayuri=http%3A%2F%2Fjmvanel.free.fr%2Fjmv.rdf%23me
Déductions SARL - Consulting, services, training,
Rule-based programming, Semantic Web
+33 (0)6 89 16 29 52
Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui

Re: completion with Lucene: desirable from SPARQL

Posted by Osma Suominen <os...@helsinki.fi>.
Hi Jean-Marc!

> AFAIK using the weights to order results is intimately linked to the text
> index querying.
> If I want the top 10 results, the search must have the weights beforehand
> otherwise I must get all the results to filter later.
> This is the reason for using AnalyzingInfixSuggester.
> Lucene 4_9_1
> https://lucene.apache.org/core/4_9_1/suggest/org/apache/lucene/search/suggest/analyzing/AnalyzingInfixSuggester.html
> Lucene 6_2_1
> https://lucene.apache.org/core/6_2_1/suggest/org/apache/lucene/search/suggest/analyzing/AnalyzingInfixSuggester.html
>
> I guess this is what you call "performance reasons" .

I don't see why you couldn't, in principle, do something like this:

PREFIX text: <http://jena.apache.org/text#>

SELECT ?s (COUNT(*) AS ?count)
WHERE {
   ?s text:query "édu*" .
   ?s ?p ?o .
}
GROUP BY ?s
ORDER BY DESC(?count)
LIMIT 10

(note: untested query)

I'm sure it will get slow if the number of hits from the text index is 
more than a few dozen. But for a small number of results at a time, it 
might work.

> As I wrote in the original post, "I'll have to implement also the callback
> for updates
> like class TextDocProducerTriples in Jena-text." .
> http://jena.apache.org/documentation/javadoc/text/org/apache/jena/query/text/TextDocProducerTriples.html

Isn't that called only when the indexed triple changes (e.g. the one 
with rdfs:label or skos:prefLabel or whatever property you are 
indexing), but not when other data related to the same subject changes? 
So if new triples are added for the same subject, but its label is 
unchanged, then the text index won't see the update and thus the count 
of references/triples won't be updated either.

I may be wrong here, I'm not sure how the update tracking works.

-Osma


-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi

Re: completion with Lucene: desirable from SPARQL

Posted by Jean-Marc Vanel <je...@gmail.com>.
2016-11-03 13:34 GMT+01:00 Osma Suominen <os...@helsinki.fi>:

> Hi Jean-Marc,
>
> I'm not sure I understand why you need to put the weights inside the
> Lucene index. Is it done for performance reasons?
>

AFAIK, using the weights to order results is intimately linked to querying
the text index.
If I want the top 10 results, the index must already hold the weights;
otherwise I have to fetch all the results and rank them afterwards.
This is the reason for using AnalyzingInfixSuggester.
Lucene 4.9.1:
https://lucene.apache.org/core/4_9_1/suggest/org/apache/lucene/search/suggest/analyzing/AnalyzingInfixSuggester.html
Lucene 6.2.1:
https://lucene.apache.org/core/6_2_1/suggest/org/apache/lucene/search/suggest/analyzing/AnalyzingInfixSuggester.html

I guess this is what you call "performance reasons".


> What if the data changes? I mean, not the indexed subject itself, but for
> example additional triples get added to the dataset using the same subject.
> Surely the Lucene index will get out of date?
>

As I wrote in the original post, "I'll also have to implement the callback
for updates, like the class TextDocProducerTriples in Jena-text":
http://jena.apache.org/documentation/javadoc/text/org/apache/jena/query/text/TextDocProducerTriples.html




> -Osma
>
>
> 03.11.2016, 13:51, Jean-Marc Vanel kirjoitti:
>
>> Hi Osma
>>
>> First I will implement the weight by counting the triples from and to each
>> URI being indexed in Lucene by Jena-text.
>> This will give users a first ordering in results, hopefully satisfying.
>> This is quite similar to the Google page rank, except that instead of
>> counting the <a href="XXX"> , it will count the triples.
>>
>> I sketched some code here with most of the plumbing:
>> https://github.com/jmvanel/semantic_forms/blob/master/scala/forms/src/main/scala/deductions/runtime/jena/lucene/TextIndexerWeight.scala
>>
>> Comments welcome. It's in Scala, but it should be understandable.
>> Note that I have one more library dependency :
>> libraryDependencies += "org.apache.lucene" % "lucene-suggest" % "4.9.1"
>>
>> This is code for batch primary indexing or re-indexing.
>> If this works well, I'll have to implement also the callback for updates
>> like class TextDocProducerTriples in Jena-text.
>>
>>
>>
>> 2016-11-01 13:59 GMT+01:00 Osma Suominen <os...@helsinki.fi>:
>>
>> Hi Jean-Marc,
>>>
>>> The wildcard queries etc. are basic Lucene features, part of Lucene query
>>> syntax, so probably that's why they are not documented on the jena-text page.
>>> The query string is simply passed to the Lucene query parser by jena-text
>>> and should support any features of Lucene, see:
>>> http://lucene.apache.org/core/6_2_1/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package.description
>>>
>>> Glad you were able to get your lookup service working!
>>>
>>> Regarding the saving of weights: I think you could simply save them as
>>> triples (perhaps in a separate graph), outside the Lucene index. Then
>>> combine the results of the text:query with the weights from triples using
>>> SPARQL.
>>>
>>> The jena-text query also returns score values. I'm not sure how useful
>>> they are in your use case, but they could potentially be used as a factor
>>> in the overall "notoriety" calculation. Though if you are searching just
>>> for single words or prefixes, chances are that the score values will be
>>> the
>>> same for all results.
>>>
>>> Thanks for all the work on the Lucene 5 and 6 upgrade (JENA-1250)! I hope
>>> we can finish that work and get it merged soon after the 3.1.1 release.
>>> In
>>> any case the newer Lucene version should perform better and be easier to
>>> maintain moving forward.
>>>
>>> -Osma
>>>
>>> On 01/11/16 11:01, Jean-Marc Vanel wrote:
>>>
>>> It's too bad that the * wildcard feature, and other details of the SPARQL to
>>>> Lucene query translation, are not documented on the Jena text search
>>>> page.
>>>>
>>>> Anyway, it works for my use case, I now have on my laptop a (kind of)
>>>> replacement of dbPedia lookup service.
>>>>
>>>> To experiment with the original dbPedia lookup service, you can go to
>>>> semantic_forms sandbox:
>>>> http://163.172.179.125:9111/create?uri=&uri=http%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2FPerson
>>>> and type a few letters in the dct:subject field.
>>>>
>>>> I don't need the full original literal value, because the URI results of
>>>> the query are labelled in the application: a foaf:Person is labelled by
>>>> given and family names, etc.
>>>>
>>>> BUT, there is a "but": the dbPedia lookup service results are appropriately
>>>> ordered by "notoriety".
>>>> Instead, with http://localhost:9000/lookup?q=*Pari* on my TDB that mirrors
>>>> dbPedia, I currently get:
>>>>
>>>> <ArrayOfResult>
>>>>          <Result>
>>>>            <Label>Université Pierre-et-Marie-Curie</Label>
>>>>            <URI>http://dbpedia.org/resource/Pierre_and_Marie_Curie_University</URI>
>>>>          </Result><Result>
>>>>            <Label>Guillaume Le Gentil</Label>
>>>>            <URI>http://dbpedia.org/resource/Guillaume_Le_Gentil</URI>
>>>>          </Result><Result>
>>>>            <Label>1 E1 m</Label>
>>>>            <URI>http://dbpedia.org/resource/1_decametre</URI>
>>>>          </Result><Result>
>>>>            <Label>1 E4 m</Label>
>>>>            <URI>http://dbpedia.org/resource/1_myriametre</URI>
>>>>          </Result><Result>
>>>>            <Label>Nadia Boulanger</Label>
>>>>            <URI>http://dbpedia.org/resource/Nadia_Boulanger</URI>
>>>>          </Result><Result>
>>>>            <Label>Luis Mariano</Label>
>>>>            <URI>http://dbpedia.org/resource/Luis_Mariano</URI>
>>>>          </Result><Result>
>>>>            <Label>Paul Chemetov</Label>
>>>>            <URI>http://dbpedia.org/resource/Paul_Chemetov</URI>
>>>>          </Result><Result>
>>>>            <Label>Marc Boegner</Label>
>>>>            <URI>http://dbpedia.org/resource/Marc_Boegner</URI>
>>>>          </Result><Result>
>>>>            <Label>Cassandre (graphiste)</Label>
>>>>            <URI>http://dbpedia.org/resource/Cassandre_(artist)</URI>
>>>>          </Result><Result>
>>>>            <Label>La Norville</Label>
>>>>            <URI>http://dbpedia.org/resource/La_Norville</URI>
>>>>          </Result>
>>>>      </ArrayOfResult>
>>>>
>>>> My understanding is that I need to set a weight on URI's in Lucene to
>>>> reflect their "notoriety".
>>>> I see 2 ways:
>>>>
>>>>     1. easy to implement: just count the triples from and to the URI
>>>>     2. also take into account the URIs consulted by users in my
>>>>     application (but currently I don't record that information); there
>>>>     is also the issue of combining weights 1) and 2)
>>>>
>>>> Google search does both weightings.
>>>>
>>>> So, in the short term I have to figure out how to add weights to the
>>>> Lucene
>>>> - Jena index.
>>>>
>>>> Then I have to read what dbPedia lookup does, and other background
>>>> material.
>>>>
>>>>
>>>>
>>>> 2016-10-31 16:42 GMT+01:00 Osma Suominen <os...@helsinki.fi>:
>>>>
>>>> Hi Jean-Marc,
>>>>
>>>>>
>>>>> Depending on what exactly you want from such a service, this may be
>>>>> already possible with jena-text.
>>>>>
>>>>> I'm assuming that you want to perform a prefix search such as "édu*"
>>>>> and
>>>>> get possible completions for that, such as "éducation".
>>>>>
>>>>> You can of course already do a prefix search with jena-text. What you
>>>>> will
>>>>> get back will be the RDF resources which have labels that contain this
>>>>> prefix. If the text index is configured to store literal values, you
>>>>> can
>>>>> ask for the actual values as well.
>>>>>
>>>>> E.g. with this data:
>>>>>
>>>>> ex:cse rdfs:label "Conseil supérieur de l'éducation"@fr .
>>>>>
>>>>> and a suitably configured jena-text index, you can perform this query:
>>>>>
>>>>> (?s ?score ?literal) text:query (rdfs:label "édu*") .
>>>>>
>>>>> and get back these bindings:
>>>>>
>>>>> ?s=ex:cse ?literal="Conseil supérieur de l'éducation"@fr
>>>>>
>>>>> However, you will get the full original literal value, not just the
>>>>> individual word that matched ("éducation"). If you want just the
>>>>> matched
>>>>> word, you will need special support that jena-text doesn't currently
>>>>> have.
>>>>>
>>>>> -Osma
>>>>>
>>>>> On 17/10/16 11:37, Jean-Marc Vanel wrote:
>>>>>
>>>>> Hi
>>>>>
>>>>>>
>>>>>> I'm implementing an equivalent of dbPedia lookup service [1] in
>>>>>> semantic_forms, leveraging on Lucene integration in TDB, and dbPedia
>>>>>> mirror
>>>>>> with TDB [2] .
>>>>>>
>>>>>> The dbPedia lookup service is really nice but:
>>>>>>
>>>>>>      - the hosted service is often down
>>>>>>      - completion is in english only
>>>>>>
>>>>>> A lookup service with TDB and Lucene would overcome these 2 problems.
>>>>>>
>>>>>> So I would need completion with Lucene from SPARQL.
>>>>>> According to the Jena doc., this does not seem to be implemented:
>>>>>> https://jena.apache.org/documentation/query/text-query.html#query-with-sparql
>>>>>>
>>>>>> There are plenty of pages when searching for
>>>>>> lucene completion
>>>>>>
>>>>>>   From these pages there is a code snippet here
>>>>>> http://stackoverflow.com/questions/120180/how-to-do-query-auto-completion-suggestions-in-lucene
>>>>>> but a regular Lucene API may exist.
>>>>>>
>>>>>> [1] https://github.com/dbpedia/lookup
>>>>>> [2]
>>>>>> https://github.com/jmvanel/semantic_forms/blob/master/doc/en/administration.md#populating-with-dbpedia-mirroring-dbpedia
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>



-- 
Jean-Marc Vanel
Profil:
http://163.172.179.125:9111/display?displayuri=http%3A%2F%2Fjmvanel.free.fr%2Fjmv.rdf%23me
Déductions SARL - Consulting, services, training,
Rule-based programming, Semantic Web
+33 (0)6 89 16 29 52
Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui

Re: completion with Lucene: desirable from SPARQL

Posted by Osma Suominen <os...@helsinki.fi>.
Hi Jean-Marc,

I'm not sure I understand why you need to put the weights inside the 
Lucene index. Is it done for performance reasons?

What if the data changes? I mean, not the indexed subject itself, but 
for example additional triples get added to the dataset using the same 
subject. Surely the Lucene index will get out of date?

-Osma

03.11.2016, 13:51, Jean-Marc Vanel kirjoitti:
> Hi Osma
>
> First I will implement the weight by counting the triples from and to each
> URI being indexed in Lucene by Jena-text.
> This will give users a first ordering in results, hopefully satisfying.
> This is quite similar to the Google page rank, except that instead of
> counting the <a href="XXX"> , it will count the triples.
>
> I sketched some code here with most of the plumbing:
> https://github.com/jmvanel/semantic_forms/blob/master/scala/forms/src/main/scala/deductions/runtime/jena/lucene/TextIndexerWeight.scala
>
> Comments welcome. It's in Scala, but it should be understandable.
> Note that I have one more library dependency :
> libraryDependencies += "org.apache.lucene" % "lucene-suggest" % "4.9.1"
>
> This is code for batch primary indexing or re-indexing.
> If this works well, I'll have to implement also the callback for updates
> like class TextDocProducerTriples in Jena-text.
>
>
>
> 2016-11-01 13:59 GMT+01:00 Osma Suominen <os...@helsinki.fi>:
>
>> Hi Jean-Marc,
>>
>> The wildcard queries etc. are basic Lucene features, part of Lucene query
>> syntax, so probably that's why they are not documented on the jena-text page.
>> The query string is simply passed to the Lucene query parser by jena-text
>> and should support any features of Lucene, see:
>> http://lucene.apache.org/core/6_2_1/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package.description
>>
>> Glad you were able to get your lookup service working!
>>
>> Regarding the saving of weights: I think you could simply save them as
>> triples (perhaps in a separate graph), outside the Lucene index. Then
>> combine the results of the text:query with the weights from triples using
>> SPARQL.
>>
>> The jena-text query also returns score values. I'm not sure how useful
>> they are in your use case, but they could potentially be used as a factor
>> in the overall "notoriety" calculation. Though if you are searching just
>> for single words or prefixes, chances are that the score values will be the
>> same for all results.
>>
>> Thanks for all the work on the Lucene 5 and 6 upgrade (JENA-1250)! I hope
>> we can finish that work and get it merged soon after the 3.1.1 release. In
>> any case the newer Lucene version should perform better and be easier to
>> maintain moving forward.
>>
>> -Osma
>>
>> On 01/11/16 11:01, Jean-Marc Vanel wrote:
>>
>>> It's too bad that the * wildcard feature, and other details of the SPARQL to
>>> Lucene query translation, are not documented on the Jena text search page.
>>>
>>> Anyway, it works for my use case, I now have on my laptop a (kind of)
>>> replacement of dbPedia lookup service.
>>>
>>> To experiment with the original dbPedia lookup service, you can go to
>>> semantic_forms sandbox:
>>> http://163.172.179.125:9111/create?uri=&uri=http%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2FPerson
>>> and type a few letters in the dct:subject field.
>>>
>>> I don't need the full original literal value, because the URI results of
>>> the query are labelled in the application: a foaf:Person is labelled by
>>> given and family names, etc.
>>>
>>> BUT, there is a "but": the dbPedia lookup service results are appropriately
>>> ordered by "notoriety".
>>> Instead, with http://localhost:9000/lookup?q=*Pari* on my TDB that mirrors
>>> dbPedia, I currently get:
>>>
>>> <ArrayOfResult>
>>>          <Result>
>>>            <Label>Université Pierre-et-Marie-Curie</Label>
>>>            <URI>http://dbpedia.org/resource/Pierre_and_Marie_Curie_University</URI>
>>>          </Result><Result>
>>>            <Label>Guillaume Le Gentil</Label>
>>>            <URI>http://dbpedia.org/resource/Guillaume_Le_Gentil</URI>
>>>          </Result><Result>
>>>            <Label>1 E1 m</Label>
>>>            <URI>http://dbpedia.org/resource/1_decametre</URI>
>>>          </Result><Result>
>>>            <Label>1 E4 m</Label>
>>>            <URI>http://dbpedia.org/resource/1_myriametre</URI>
>>>          </Result><Result>
>>>            <Label>Nadia Boulanger</Label>
>>>            <URI>http://dbpedia.org/resource/Nadia_Boulanger</URI>
>>>          </Result><Result>
>>>            <Label>Luis Mariano</Label>
>>>            <URI>http://dbpedia.org/resource/Luis_Mariano</URI>
>>>          </Result><Result>
>>>            <Label>Paul Chemetov</Label>
>>>            <URI>http://dbpedia.org/resource/Paul_Chemetov</URI>
>>>          </Result><Result>
>>>            <Label>Marc Boegner</Label>
>>>            <URI>http://dbpedia.org/resource/Marc_Boegner</URI>
>>>          </Result><Result>
>>>            <Label>Cassandre (graphiste)</Label>
>>>            <URI>http://dbpedia.org/resource/Cassandre_(artist)</URI>
>>>          </Result><Result>
>>>            <Label>La Norville</Label>
>>>            <URI>http://dbpedia.org/resource/La_Norville</URI>
>>>          </Result>
>>>      </ArrayOfResult>
>>>
>>> My understanding is that I need to set a weight on URI's in Lucene to
>>> reflect their "notoriety".
>>> I see 2 ways:
>>>
>>>     1. easy to implement: just count the triples from and to the URI
>>>     2. also take into account the URIs consulted by users in my
>>>     application (but currently I don't record that information); there is
>>>     also the issue of combining weights 1) and 2)
>>>
>>> Google search does both weightings.
>>>
>>> So, in the short term I have to figure out how to add weights to the
>>> Lucene
>>> - Jena index.
>>>
>>> Then I have to read what dbPedia lookup does, and other background
>>> material.
>>>
>>>
>>>
>>> 2016-10-31 16:42 GMT+01:00 Osma Suominen <os...@helsinki.fi>:
>>>
>>> Hi Jean-Marc,
>>>>
>>>> Depending on what exactly you want from such a service, this may be
>>>> already possible with jena-text.
>>>>
>>>> I'm assuming that you want to perform a prefix search such as "édu*" and
>>>> get possible completions for that, such as "éducation".
>>>>
>>>> You can of course already do a prefix search with jena-text. What you
>>>> will
>>>> get back will be the RDF resources which have labels that contain this
>>>> prefix. If the text index is configured to store literal values, you can
>>>> ask for the actual values as well.
>>>>
>>>> E.g. with this data:
>>>>
>>>> ex:cse rdfs:label "Conseil supérieur de l'éducation"@fr .
>>>>
>>>> and a suitably configured jena-text index, you can perform this query:
>>>>
>>>> (?s ?score ?literal) text:query (rdfs:label "édu*") .
>>>>
>>>> and get back these bindings:
>>>>
>>>> ?s=ex:cse ?literal="Conseil supérieur de l'éducation"@fr
>>>>
>>>> However, you will get the full original literal value, not just the
>>>> individual word that matched ("éducation"). If you want just the matched
>>>> word, you will need special support that jena-text doesn't currently
>>>> have.
>>>>
>>>> -Osma
>>>>
>>>> On 17/10/16 11:37, Jean-Marc Vanel wrote:
>>>>
>>>> Hi
>>>>>
>>>>> I'm implementing an equivalent of dbPedia lookup service [1] in
>>>>> semantic_forms, leveraging on Lucene integration in TDB, and dbPedia
>>>>> mirror
>>>>> with TDB [2] .
>>>>>
>>>>> The dbPedia lookup service is really nice but:
>>>>>
>>>>>      - the hosted service is often down
>>>>>      - completion is in english only
>>>>>
>>>>> A lookup service with TDB and Lucene would overcome these 2 problems.
>>>>>
>>>>> So I would need completion with Lucene from SPARQL.
>>>>> According to the Jena doc., this does not seem to be implemented:
>>>>> https://jena.apache.org/documentation/query/text-query.html#query-with-sparql
>>>>>
>>>>> There are plenty of pages when searching for
>>>>> lucene completion
>>>>>
>>>>>   From these pages there is a code snippet here
>>>>> http://stackoverflow.com/questions/120180/how-to-do-query-auto-completion-suggestions-in-lucene
>>>>> but a regular Lucene API may exist.
>>>>>
>>>>> [1] https://github.com/dbpedia/lookup
>>>>> [2]
>>>>> https://github.com/jmvanel/semantic_forms/blob/master/doc/en/administration.md#populating-with-dbpedia-mirroring-dbpedia
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
>


-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi

Re: completion with Lucene: desirable from SPARQL

Posted by Jean-Marc Vanel <je...@gmail.com>.
Hi Osma

First I will implement the weight by counting the triples from and to each
URI being indexed in Lucene by Jena-text.
This will give users a first ordering of results, hopefully a satisfying one.
This is quite similar to Google's PageRank, except that instead of
counting the <a href="XXX"> links, it will count the triples.
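
To make the counting concrete, here is a minimal SPARQL Update sketch of the
same idea, separate from the Scala code linked below. It counts, for each IRI
in the default graph, the triples in which it appears as subject or object,
and stores the total as a triple. The ex: namespace, the ex:weights graph and
the ex:weight property are hypothetical names chosen for illustration, and the
update is untested; on a full dbPedia mirror it may well be too slow to run in
one go.

```sparql
PREFIX ex: <http://example.org/ns#>

# Store one weight per IRI in a separate named graph (hypothetical names).
INSERT { GRAPH ex:weights { ?uri ex:weight ?n } }
WHERE {
  SELECT ?uri (COUNT(*) AS ?n)
  WHERE {
    # triples "from" the URI (outgoing) and "to" the URI (incoming);
    # a triple with the same IRI as subject and object is counted twice
    { ?uri ?p ?o } UNION { ?s ?p ?uri }
    FILTER(isIRI(?uri))
  }
  GROUP BY ?uri
}
```

Keeping the counts in a named graph keeps them out of the default-graph counts
on a later run (unless the dataset uses a union default graph). Old ex:weight
triples would need to be deleted before recomputing, since the update only
adds new ones.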

I sketched some code here with most of the plumbing:
https://github.com/jmvanel/semantic_forms/blob/master/scala/forms/src/main/scala/deductions/runtime/jena/lucene/TextIndexerWeight.scala

Comments welcome. It's in Scala, but it should be understandable.
Note that I have one more library dependency:
libraryDependencies += "org.apache.lucene" % "lucene-suggest" % "4.9.1"

This is code for the initial batch indexing or for re-indexing.
If this works well, I'll also have to implement the callback for updates,
like the class TextDocProducerTriples in Jena-text.



2016-11-01 13:59 GMT+01:00 Osma Suominen <os...@helsinki.fi>:

> Hi Jean-Marc,
>
> The wildcard queries etc. are basic Lucene features, part of Lucene query
> syntax, so probably that's why they are not documented on the jena-text page.
> The query string is simply passed to the Lucene query parser by jena-text
> and should support any features of Lucene, see:
> http://lucene.apache.org/core/6_2_1/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package.description
>
> Glad you were able to get your lookup service working!
>
> Regarding the saving of weights: I think you could simply save them as
> triples (perhaps in a separate graph), outside the Lucene index. Then
> combine the results of the text:query with the weights from triples using
> SPARQL.
>
> The jena-text query also returns score values. I'm not sure how useful
> they are in your use case, but they could potentially be used as a factor
> in the overall "notoriety" calculation. Though if you are searching just
> for single words or prefixes, chances are that the score values will be the
> same for all results.
>
> Thanks for all the work on the Lucene 5 and 6 upgrade (JENA-1250)! I hope
> we can finish that work and get it merged soon after the 3.1.1 release. In
> any case the newer Lucene version should perform better and be easier to
> maintain moving forward.
>
> -Osma
>
> On 01/11/16 11:01, Jean-Marc Vanel wrote:
>
>> It's too bad that the * wildcard feature, and other details of the SPARQL to
>> Lucene query translation, are not documented on the Jena text search page.
>>
>> Anyway, it works for my use case, I now have on my laptop a (kind of)
>> replacement of dbPedia lookup service.
>>
>> To experiment with the original dbPedia lookup service, you can go to
>> semantic_forms sandbox:
>> http://163.172.179.125:9111/create?uri=&uri=http%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2FPerson
>> and type a few letters in the dct:subject field.
>>
>> I don't need the full original literal value, because the URI results of
>> the query are labelled in the application: a foaf:Person is labelled by
>> given and family names, etc.
>>
>> BUT, there is a "but": the dbPedia lookup service results are appropriately
>> ordered by "notoriety".
>> Instead, with http://localhost:9000/lookup?q=*Pari* on my TDB that mirrors
>> dbPedia, I currently get:
>>
>> <ArrayOfResult>
>>          <Result>
>>            <Label>Université Pierre-et-Marie-Curie</Label>
>>            <URI>http://dbpedia.org/resource/Pierre_and_Marie_Curie_University</URI>
>>          </Result><Result>
>>            <Label>Guillaume Le Gentil</Label>
>>            <URI>http://dbpedia.org/resource/Guillaume_Le_Gentil</URI>
>>          </Result><Result>
>>            <Label>1 E1 m</Label>
>>            <URI>http://dbpedia.org/resource/1_decametre</URI>
>>          </Result><Result>
>>            <Label>1 E4 m</Label>
>>            <URI>http://dbpedia.org/resource/1_myriametre</URI>
>>          </Result><Result>
>>            <Label>Nadia Boulanger</Label>
>>            <URI>http://dbpedia.org/resource/Nadia_Boulanger</URI>
>>          </Result><Result>
>>            <Label>Luis Mariano</Label>
>>            <URI>http://dbpedia.org/resource/Luis_Mariano</URI>
>>          </Result><Result>
>>            <Label>Paul Chemetov</Label>
>>            <URI>http://dbpedia.org/resource/Paul_Chemetov</URI>
>>          </Result><Result>
>>            <Label>Marc Boegner</Label>
>>            <URI>http://dbpedia.org/resource/Marc_Boegner</URI>
>>          </Result><Result>
>>            <Label>Cassandre (graphiste)</Label>
>>            <URI>http://dbpedia.org/resource/Cassandre_(artist)</URI>
>>          </Result><Result>
>>            <Label>La Norville</Label>
>>            <URI>http://dbpedia.org/resource/La_Norville</URI>
>>          </Result>
>>      </ArrayOfResult>
>>
>> My understanding is that I need to set a weight on URI's in Lucene to
>> reflect their "notoriety".
>> I see 2 ways:
>>
>>     1. easy to implement: just count the triples from and to the URI
>>     2. also take into account the URIs consulted by users in my
>>     application (but currently I don't record that information); there is
>>     also the issue of combining weights 1) and 2)
>>
>> Google search does both weightings.
>>
>> So, in the short term I have to figure out how to add weights to the
>> Lucene
>> - Jena index.
>>
>> Then I have to read what dbPedia lookup does, and other background
>> material.
>>
>>
>>
>> 2016-10-31 16:42 GMT+01:00 Osma Suominen <os...@helsinki.fi>:
>>
>> Hi Jean-Marc,
>>>
>>> Depending on what exactly you want from such a service, this may be
>>> already possible with jena-text.
>>>
>>> I'm assuming that you want to perform a prefix search such as "édu*" and
>>> get possible completions for that, such as "éducation".
>>>
>>> You can of course already do a prefix search with jena-text. What you
>>> will
>>> get back will be the RDF resources which have labels that contain this
>>> prefix. If the text index is configured to store literal values, you can
>>> ask for the actual values as well.
>>>
>>> E.g. with this data:
>>>
>>> ex:cse rdfs:label "Conseil supérieur de l'éducation"@fr .
>>>
>>> and a suitably configured jena-text index, you can perform this query:
>>>
>>> (?s ?score ?literal) text:query (rdfs:label "édu*") .
>>>
>>> and get back these bindings:
>>>
>>> ?s=ex:cse ?literal="Conseil supérieur de l'éducation"@fr
>>>
>>> However, you will get the full original literal value, not just the
>>> individual word that matched ("éducation"). If you want just the matched
>>> word, you will need special support that jena-text doesn't currently
>>> have.
>>>
>>> -Osma
>>>
>>> On 17/10/16 11:37, Jean-Marc Vanel wrote:
>>>
>>> Hi
>>>>
>>>> I'm implementing an equivalent of dbPedia lookup service [1] in
>>>> semantic_forms, leveraging on Lucene integration in TDB, and dbPedia
>>>> mirror
>>>> with TDB [2] .
>>>>
>>>> The dbPedia lookup service is really nice but:
>>>>
>>>>      - the hosted service is often down
>>>>      - completion is in english only
>>>>
>>>> A lookup service with TDB and Lucene would overcome these 2 problems.
>>>>
>>>> So I would need completion with Lucene from SPARQL.
>>>> According to the Jena doc., this does not seem to be implemented:
>>>> https://jena.apache.org/documentation/query/text-query.html#query-with-sparql
>>>>
>>>> There are plenty of pages when searching for
>>>> lucene completion
>>>>
>>>>   From these pages there is a code snippet here
>>>> http://stackoverflow.com/questions/120180/how-to-do-query-auto-completion-suggestions-in-lucene
>>>> but a regular Lucene API may exist.
>>>>
>>>> [1] https://github.com/dbpedia/lookup
>>>> [2]
>>>> https://github.com/jmvanel/semantic_forms/blob/master/doc/en/administration.md#populating-with-dbpedia-mirroring-dbpedia
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>



-- 
Jean-Marc Vanel
Profil:
http://163.172.179.125:9111/display?displayuri=http%3A%2F%2Fjmvanel.free.fr%2Fjmv.rdf%23me
Déductions SARL - Consulting, services, training,
Rule-based programming, Semantic Web
+33 (0)6 89 16 29 52
Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui

Re: completion with Lucene: desirable from SPARQL

Posted by Osma Suominen <os...@helsinki.fi>.
Hi Jean-Marc,

The wildcard queries etc. are basic Lucene features, part of the Lucene 
query syntax, which is probably why they are not documented on the 
jena-text page. The query string is simply passed to the Lucene query 
parser by jena-text, so it should support any feature of that syntax; see: 
http://lucene.apache.org/core/6_2_1/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package.description
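
For instance, assuming rdfs:label is one of the indexed properties in the
jena-text configuration, the following untested sketch shows a few forms of
the Lucene classic query syntax used inside text:query. The search terms are
only examples, and whether a leading wildcard such as *Pari* is accepted
depends on how the query parser is configured.

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?s
WHERE {
  # prefix wildcard: terms starting with "pari"
  { ?s text:query (rdfs:label "pari*") }
  UNION
  # fuzzy match: terms close to "paris"
  { ?s text:query (rdfs:label "paris~") }
  UNION
  # boolean combination of a plain term and a wildcard term
  { ?s text:query (rdfs:label "paris AND universit*") }
}
```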

Glad you were able to get your lookup service working!

Regarding the saving of weights: I think you could simply save them as 
triples (perhaps in a separate graph), outside the Lucene index. Then 
combine the results of the text:query with the weights from triples 
using SPARQL.
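
A minimal sketch of that combination, assuming the weights have been stored
as ex:weight triples in a named graph ex:weights (both names are hypothetical)
and that rdfs:label is indexed by jena-text; the query is untested:

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/ns#>

SELECT ?s ?weight
WHERE {
  # hits from the Lucene index
  ?s text:query (rdfs:label "pari*") .
  # precomputed weight, if any, kept outside the Lucene index
  OPTIONAL { GRAPH ex:weights { ?s ex:weight ?weight } }
}
# resources without a stored weight sort last
ORDER BY DESC(COALESCE(?weight, 0))
LIMIT 10
```

The ordering only happens after all text index hits have been joined with
their weights, so this is likely to be practical only when the number of hits
is small.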

The jena-text query also returns score values. I'm not sure how useful 
they are in your use case, but they could potentially be used as a 
factor in the overall "notoriety" calculation. Though if you are 
searching just for single words or prefixes, chances are that the score 
values will be the same for all results.
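
As a sketch of how to look at those score values (untested, and again
assuming rdfs:label is an indexed property), the subject and its score can be
bound together, with an optional limit on the number of hits passed as the
last argument:

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?s ?score
WHERE {
  # ask the text index for at most 10 hits, with their Lucene scores
  (?s ?score) text:query (rdfs:label "pari*" 10) .
}
ORDER BY DESC(?score)
```

If the scores do turn out to differ, ?score could be multiplied by whatever
separate weight is available to form the overall ranking value.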

Thanks for all the work on the Lucene 5 and 6 upgrade (JENA-1250)! I 
hope we can finish that work and get it merged soon after the 3.1.1 
release. In any case the newer Lucene version should perform better and 
be easier to maintain moving forward.

-Osma

On 01/11/16 11:01, Jean-Marc Vanel wrote:
> It's too bad that the * wildcard feature, and other details of the SPARQL to
> Lucene query translation, are not documented on the Jena text search page.
>
> Anyway, it works for my use case, I now have on my laptop a (kind of)
> replacement of dbPedia lookup service.
>
> To experiment with the original dbPedia lookup service, you can go to
> semantic_forms sandbox:
> http://163.172.179.125:9111/create?uri=&uri=http%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2FPerson
> and type a few letters in the dct:subject field.
>
> I don't need the full original literal value, because the URI results of
> the query are labelled in the application: a foaf:Person is labelled by
> given and family names, etc.
>
> BUT, there is a "but": the dbPedia lookup service results are appropriately
> ordered by "notoriety".
> Instead, with http://localhost:9000/lookup?q=*Pari* on my TDB that mirrors
> dbPedia, I currently get:
>
> <ArrayOfResult>
>          <Result>
>            <Label>Université Pierre-et-Marie-Curie</Label>
>            <URI>http://dbpedia.org/resource/Pierre_and_Marie_Curie_University</URI>
>          </Result><Result>
>            <Label>Guillaume Le Gentil</Label>
>            <URI>http://dbpedia.org/resource/Guillaume_Le_Gentil</URI>
>          </Result><Result>
>            <Label>1 E1 m</Label>
>            <URI>http://dbpedia.org/resource/1_decametre</URI>
>          </Result><Result>
>            <Label>1 E4 m</Label>
>            <URI>http://dbpedia.org/resource/1_myriametre</URI>
>          </Result><Result>
>            <Label>Nadia Boulanger</Label>
>            <URI>http://dbpedia.org/resource/Nadia_Boulanger</URI>
>          </Result><Result>
>            <Label>Luis Mariano</Label>
>            <URI>http://dbpedia.org/resource/Luis_Mariano</URI>
>          </Result><Result>
>            <Label>Paul Chemetov</Label>
>            <URI>http://dbpedia.org/resource/Paul_Chemetov</URI>
>          </Result><Result>
>            <Label>Marc Boegner</Label>
>            <URI>http://dbpedia.org/resource/Marc_Boegner</URI>
>          </Result><Result>
>            <Label>Cassandre (graphiste)</Label>
>            <URI>http://dbpedia.org/resource/Cassandre_(artist)</URI>
>          </Result><Result>
>            <Label>La Norville</Label>
>            <URI>http://dbpedia.org/resource/La_Norville</URI>
>          </Result>
>      </ArrayOfResult>
>
> My understanding is that I need to set a weight on URIs in Lucene to
> reflect their "notoriety".
> I see 2 ways:
>
>     1. easy to implement: just count the triples from and to the URI
>     2. also take into account the URIs consulted by users in my
>     application (but currently I don't record that information); there is
>     also the issue of combining weights 1) and 2)
>
> Google search does both weightings.
>
> So, in the short term I have to figure out how to add weights to the
> Lucene-Jena index.
>
> Then I have to read what dbPedia lookup does, and other background material.
>
>
>
> 2016-10-31 16:42 GMT+01:00 Osma Suominen <os...@helsinki.fi>:
>
>> Hi Jean-Marc,
>>
>> Depending on what exactly you want from such a service, this may be
>> already possible with jena-text.
>>
>> I'm assuming that you want to perform a prefix search such as "édu*" and
>> get possible completions for that, such as "éducation".
>>
>> You can of course already do a prefix search with jena-text. What you will
>> get back will be the RDF resources which have labels that contain this
>> prefix. If the text index is configured to store literal values, you can
>> ask for the actual values as well.
>>
>> E.g. with this data:
>>
>> ex:cse rdfs:label "Conseil supérieur de l'éducation"@fr .
>>
>> and a suitably configured jena-text index, you can perform this query:
>>
>> (?s ?score ?literal) text:query (rdfs:label "édu*") .
>>
>> and get back these bindings:
>>
>> ?s=ex:cse ?literal="Conseil supérieur de l'éducation"@fr
>>
>> However, you will get the full original literal value, not just the
>> individual word that matched ("éducation"). If you want just the matched
>> word, you will need special support that jena-text doesn't currently have.
>>
>> -Osma
>>
>> On 17/10/16 11:37, Jean-Marc Vanel wrote:
>>
>>> Hi
>>>
>>> I'm implementing an equivalent of the dbPedia lookup service [1] in
>>> semantic_forms, leveraging the Lucene integration in TDB and a dbPedia
>>> mirror with TDB [2].
>>>
>>> The dbPedia lookup service is really nice but:
>>>
>>>      - the hosted service is often down
>>>      - completion is in English only
>>>
>>> A lookup service with TDB and Lucene would overcome these 2 problems.
>>>
>>> So I would need completion with Lucene from SPARQL.
>>> According to the Jena documentation, this does not seem to be implemented:
>>> https://jena.apache.org/documentation/query/text-query.html#query-with-sparql
>>>
>>> There are plenty of pages when searching for
>>> lucene completion
>>>
>>> Among these pages there is a code snippet here:
>>> http://stackoverflow.com/questions/120180/how-to-do-query-auto-completion-suggestions-in-lucene
>>> but a regular Lucene API may exist.
>>>
>>> [1] https://github.com/dbpedia/lookup
>>> [2] https://github.com/jmvanel/semantic_forms/blob/master/doc/en/administration.md#populating-with-dbpedia-mirroring-dbpedia
>>>
>>>
>>
>> --
>> Osma Suominen
>> D.Sc. (Tech), Information Systems Specialist
>> National Library of Finland
>> P.O. Box 26 (Kaikukatu 4)
>> 00014 HELSINGIN YLIOPISTO
>> Tel. +358 50 3199529
>> osma.suominen@helsinki.fi
>> http://www.nationallibrary.fi
>>
>
>
>


-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi