Posted to users@jena.apache.org by Mikael Pesonen <mi...@lingsoft.fi> on 2017/11/22 13:44:16 UTC
Similar results with full text search
Are there any plans on implementing similar text search for Jena?
Until similarity is implemented, is it possible to query similar texts
using Lucene directly, bypassing Jena, but with the same data set?
Br,
--
Lingsoft - 30 years of Leading Language Management
www.lingsoft.fi
Speech Applications - Language Management - Translation - Reader's and Writer's Tools - Text Tools - E-books and M-books
Mikael Pesonen
System Engineer
e-mail: mikael.pesonen@lingsoft.fi
Tel. +358 2 279 3300
Time zone: GMT+2
Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND
Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND
Re: Similar results with full text search
Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Thanks for the tips! I'll check what ES can do.
Br,
Mikael
On 23.11.2017 13:36, Osma Suominen wrote:
> Hi Mikael!
>
> Not sure how jena-text could help here if the documents are in another
> index. But maybe you could look at using the Elasticsearch backend of
> jena-text. It stores the index in ES, so it can also be queried
> outside Jena. If you had the documents + jena-text indexed metadata in
> ES, you could use ES facilities for similarity search and still do
> some things in SPARQL.
>
> -Osma
Re: Similar results with full text search
Posted by Osma Suominen <os...@helsinki.fi>.
Hi Mikael!
Not sure how jena-text could help here if the documents are in another
index. But maybe you could look at using the Elasticsearch backend of
jena-text. It stores the index in ES, so it can also be queried outside
Jena. If you had the documents + jena-text indexed metadata in ES, you
could use ES facilities for similarity search and still do some things
in SPARQL.
-Osma
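
A minimal sketch of what such an Elasticsearch similarity request could look like, written as a Python helper that only builds the request body for the ES more_like_this query. The index name, field names and document id are hypothetical, and it is an assumption that the documents and the jena-text metadata live in an index that can be queried this way. Older ES versions may also expect a "_type" entry in the "like" item.

```python
import json


def more_like_this_body(index, doc_id, fields, size=10):
    """Build a search body asking ES for documents similar to one indexed document.

    index, doc_id and fields are placeholders supplied by the caller;
    nothing here is specific to how jena-text lays out its ES index.
    """
    return {
        "size": size,
        "query": {
            "more_like_this": {
                # Fields whose text is compared for similarity.
                "fields": list(fields),
                # Reference the source document by index and id.
                "like": [{"_index": index, "_id": doc_id}],
                # Tuning parameters; the values are illustrative only.
                "min_term_freq": 1,
                "max_query_terms": 25,
            }
        },
    }


if __name__ == "__main__":
    body = more_like_this_body("documents", "doc-42", ["content", "title"])
    print(json.dumps(body, indent=2))
```

The resulting JSON would be sent to the index's _search endpoint. The ids it returns could then be used in a SPARQL query against Jena to fetch the metadata.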
Mikael Pesonen wrote on 23.11.2017 at 12:59:
>
> Hi Osma!
>
> We have a set of documents and their metadata. The metadata is stored in
> Jena and the texts in a separate database (RDF id, content).
>
> The first use case would be to search documents by their content and
> list their metadata using SPARQL. I'm not sure if even this is possible.
>
> Second, a similarity search would return the IDs of documents similar to
> a given document, based on metadata and content.
>
>
> We have already set this up with a separate Lucene installation: we first
> query documents from the Lucene index, then filter the result sets on
> additional metadata fields using Jena. This setup is quite complicated,
> so I was hoping a tighter integration with Jena would make things easier.
>
> Br,
> Mikael
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi
Re: Similar results with full text search
Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Hi Osma!
We have a set of documents and their metadata. The metadata is stored in
Jena and the texts in a separate database (RDF id, content).
The first use case would be to search documents by their content and
list their metadata using SPARQL. I'm not sure if even this is possible.
Second, a similarity search would return the IDs of documents similar to
a given document, based on metadata and content.
We have already set this up with a separate Lucene installation: we first
query documents from the Lucene index, then filter the result sets on
additional metadata fields using Jena. This setup is quite complicated,
so I was hoping a tighter integration with Jena would make things easier.
Br,
Mikael
On 22.11.2017 22:40, Osma Suominen wrote:
> Hi Mikael!
>
> Sorry, I probably misunderstood - I somehow read "similar" as meaning
> "fuzzy" but they are of course not the same thing. So if you mean
> "give me documents similar to document X", that's called MoreLikeThis
> in Lucene, and it's currently not supported by jena-text. What's your
> use case? How would you like to use it if it existed?
>
> -Osma
Re: Similar results with full text search
Posted by Osma Suominen <os...@helsinki.fi>.
Hi Mikael!
Sorry, I probably misunderstood - I somehow read "similar" as meaning
"fuzzy" but they are of course not the same thing. So if you mean "give
me documents similar to document X", that's called MoreLikeThis in
Lucene, and it's currently not supported by jena-text. What's your use
case? How would you like to use it if it existed?
-Osma
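
To illustrate the difference between fuzzy matching and "documents similar to document X", here is a toy Python sketch of the general idea behind a MoreLikeThis-style search: pick the most distinctive terms of the source document, then rank the other documents by how many of those terms they contain. This is a simplified illustration, not Lucene's actual implementation. The function names and the scoring formula are assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


def interesting_terms(doc, corpus, max_terms=3):
    """Return the terms of doc that are frequent in doc but rare in corpus."""
    n = len(corpus)
    df = Counter()
    for other in corpus:
        df.update(set(tokenize(other)))  # document frequency per term
    tf = Counter(tokenize(doc))  # term frequency in the source document
    scores = {t: f * math.log(1 + n / (1 + df[t])) for t, f in tf.items()}
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:max_terms]


def more_like_this(doc, corpus, max_terms=3):
    """Return indexes of corpus entries sharing interesting terms with doc."""
    terms = set(interesting_terms(doc, corpus, max_terms))
    scored = []
    for i, other in enumerate(corpus):
        if other == doc:
            continue  # skip the source document itself
        overlap = len(terms & set(tokenize(other)))
        if overlap:
            scored.append((overlap, i))
    return [i for overlap, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


if __name__ == "__main__":
    corpus = [
        "jena text index lucene",
        "lucene index search",
        "cooking pasta recipes",
    ]
    print(more_like_this(corpus[0], corpus))
```

A fuzzy query, by contrast, only tolerates small spelling differences in the query terms themselves. It does not compare whole documents.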
Osma Suominen wrote on 22.11.2017 at 22:37:
> Hi Mikael!
>
> Fuzzy search is a basic Lucene feature, just like prefix searches. You
> should be able to use it directly via jena-text using a query like
> ?s text:query "word~"
> or
> ?s text:query "word~1"
>
> There is AFAICT nothing to implement on the jena-text side as this
> already works right now.
>
> -Osma
Re: Similar results with full text search
Posted by Osma Suominen <os...@helsinki.fi>.
Hi Mikael!
Fuzzy search is a basic Lucene feature, just like prefix searches. You
should be able to use it directly via jena-text using a query like
?s text:query "word~"
or
?s text:query "word~1"
There is AFAICT nothing to implement on the jena-text side as this
already works right now.
-Osma
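
For reference, a minimal sketch of the triple patterns above wrapped in a complete SPARQL query, built by a small Python helper. It assumes a dataset with jena-text configured and a search word that is a single plain token, with no quotes or Lucene special characters. The helper names are made up for the example. In Lucene query syntax, the number after "~" is the maximum edit distance, and Lucene accepts values from 0 to 2.

```python
TEXT_NS = "http://jena.apache.org/text#"  # namespace of the jena-text vocabulary


def fuzzy_term(word, max_edits=None):
    """Return a Lucene fuzzy term such as 'word~' or 'word~1'."""
    if max_edits is None:
        return f"{word}~"  # let Lucene use its default edit distance
    if max_edits not in (0, 1, 2):
        raise ValueError("Lucene fuzzy edit distance must be 0, 1 or 2")
    return f"{word}~{max_edits}"


def build_query(word, max_edits=None, limit=10):
    """Build a SPARQL query string that runs a fuzzy text:query search."""
    lucene = fuzzy_term(word, max_edits)
    return (
        f"PREFIX text: <{TEXT_NS}>\n"
        "SELECT ?s WHERE {\n"
        f'  ?s text:query "{lucene}" .\n'
        "}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(build_query("word", max_edits=1))
```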
Mikael Pesonen wrote on 22.11.2017 at 15:44:
>
> Are there any plans on implementing similar text search for Jena?
>
> Until similarity is implemented, is it possible to query similar texts
> using Lucene directly, bypassing Jena, but with the same data set?
>
> Br,
>