Posted to users@jena.apache.org by Mikael Pesonen <mi...@lingsoft.fi> on 2019/02/14 11:23:23 UTC

Using content with meta on text index

Hi,

Our system stores documents through a separate REST API, and the document IDs
are stored, along with document metadata, in a Jena database. We would like to
make text queries that target both the document contents and the metadata.

Is there a recommended/supported way to make this happen on Jena and Lucene?

-- 
Lingsoft - 30 years of Leading Language Management

www.lingsoft.fi

Speech Applications - Language Management - Translation - Reader's and Writer's Tools - Text Tools - E-books and M-books

Mikael Pesonen
System Engineer

e-mail: mikael.pesonen@lingsoft.fi
Tel. +358 2 279 3300

Time zone: GMT+2

Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND

Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND

Re: Using content with meta on text index

Posted by Mikael Pesonen <mi...@lingsoft.fi>.

One option, of course, would be to store all document content as triples
in Jena, but that might lead to other trouble, since Jena is not meant to
be used that way.

On 26/02/2019 20:29, ajs6f wrote:
> I'm not sure there are any widely-known best practices for that pattern, but I defer to Osma and Chris.
>
> My limited understanding of Lucene implies that only one JVM at a time can lock an index, but the last time I looked at that question was years ago, so take that with a bucket of salt.
>
> ajs6f

Re: Using content with meta on text index

Posted by ajs6f <aj...@apache.org>.

I'm not sure there are any widely-known best practices for that pattern, but I defer to Osma and Chris. 

My limited understanding of Lucene implies that only one JVM at a time can lock an index, but the last time I looked at that question was years ago, so take that with a bucket of salt.

ajs6f

> On Feb 21, 2019, at 9:02 AM, Mikael Pesonen <mi...@lingsoft.fi> wrote:
> 
> 
> The reason I'm asking is that now, with an external document index, we can't do any paging, which results in very slow and heavy queries on the document index. We have to read all results from the external Lucene index, because we need to filter the results by metadata fields that are stored in Jena.
> 
> For example, we have a query like:
> content matches "language AND technology" & metadata matches dcterms:created > "2019-01-01"

Re: Using content with meta on text index

Posted by Mikael Pesonen <mi...@lingsoft.fi>.

The reason I'm asking is that now, with an external document index, we
can't do any paging, which results in very slow and heavy queries on the
document index. We have to read all results from the external Lucene
index, because we need to filter the results by metadata fields that
are stored in Jena.

For example, we have a query like:
content matches "language AND technology" & metadata matches 
dcterms:created > "2019-01-01"
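
The paging problem described above can be illustrated with a toy model. This is only a sketch under stated assumptions: it uses plain in-memory Python data rather than Lucene or Jena, and the names TEXT_HITS, CREATED, page_with_post_filter and page_with_filter_in_index are hypothetical. It compares how many text hits have to be read to fill one result page when the metadata filter runs after the text search versus alongside it.

```python
from datetime import date

# Hypothetical stand-in for an external full-text index: 1000 ranked hits.
TEXT_HITS = [(f"doc{i}", 1.0 / (i + 1)) for i in range(1000)]

# Hypothetical stand-in for metadata kept in the triple store:
# every tenth document was created in 2019, the rest in 2018.
CREATED = {
    f"doc{i}": date(2019, 2, 1) if i % 10 == 0 else date(2018, 1, 1)
    for i in range(1000)
}


def page_with_post_filter(offset, limit, cutoff):
    """Fetch every text hit first, then filter by metadata, then slice."""
    hits_read = 0
    matching = []
    for doc_id, _score in TEXT_HITS:
        hits_read += 1
        if CREATED[doc_id] > cutoff:
            matching.append(doc_id)
    return matching[offset:offset + limit], hits_read


def page_with_filter_in_index(offset, limit, cutoff):
    """Apply the metadata filter while walking the hits; stop when the page is full."""
    hits_read = 0
    matching = []
    for doc_id, _score in TEXT_HITS:
        hits_read += 1
        if CREATED[doc_id] > cutoff:
            matching.append(doc_id)
            if len(matching) >= offset + limit:
                break
    return matching[offset:offset + limit], hits_read


if __name__ == "__main__":
    cutoff = date(2019, 1, 1)
    print(page_with_post_filter(0, 10, cutoff)[1])      # hits read with post-filtering
    print(page_with_filter_in_index(0, 10, cutoff)[1])  # hits read with in-index filtering
```

Both functions return the same page; the difference is only in how many hits are touched, which is the cost that grows when the text index and the metadata live in separate systems.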



On 20/02/2019 10:46, Mikael Pesonen wrote:
>
> Not sure. The Jena text documentation states that external
> document contents can be added to the Jena text index.
>
> I'm just not sure how this should be done in practice: how to handle
> concurrency, and how exactly to add documents so that we could make
> SPARQL queries that target content and metadata at the same time,
> preferably with some weights.
>
> But it's fine for us to use a single Lucene index for all data.
>
> Br

Re: Using content with meta on text index

Posted by Mikael Pesonen <mi...@lingsoft.fi>.

Not sure. The Jena text documentation states that external
document contents can be added to the Jena text index.

I'm just not sure how this should be done in practice: how to handle
concurrency, and how exactly to add documents so that we could make SPARQL
queries that target content and metadata at the same time, preferably with
some weights.

But it's fine for us to use a single Lucene index for all data.

Br
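
As a rough illustration of what a combined query might look like, the sketch below builds a SPARQL string that uses the jena-text `text:query` property function together with a `dcterms:created` filter and LIMIT/OFFSET paging. It rests on assumptions not confirmed in this thread: that the document content has been added to the Jena text index under a hypothetical property `ex:content` (with `ex:` standing for a made-up namespace), and that creation dates are stored as `xsd:date` literals. The helper names `escape_literal` and `build_query` are likewise hypothetical.

```python
from string import Template

# SPARQL template; ex:content is a hypothetical indexed property.
QUERY_TEMPLATE = Template("""\
PREFIX text:    <http://jena.apache.org/text#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>
PREFIX ex:      <http://example.org/>

SELECT ?doc ?score WHERE {
  (?doc ?score) text:query (ex:content "$text") .
  ?doc dcterms:created ?created .
  FILTER (?created > "$created_after"^^xsd:date)
}
ORDER BY DESC(?score)
LIMIT $limit OFFSET $offset
""")


def escape_literal(value):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_query(text, created_after, limit=10, offset=0):
    """Return a SPARQL query combining a text match with a date filter."""
    return QUERY_TEMPLATE.substitute(
        text=escape_literal(text),
        created_after=escape_literal(created_after),
        limit=int(limit),
        offset=int(offset),
    )


if __name__ == "__main__":
    print(build_query("language AND technology", "2019-01-01"))
```

Whether such a query pages efficiently depends on how the text index and dataset are configured; the sketch only shows the shape of the query, not a tested setup.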


On 19.2.2019 17.44, ajs6f wrote:
> Are you asking how to use an extant Lucene index with your text documents in it for Jena's text index as well?
>
> ajs6f

Re: Using content with meta on text index

Posted by ajs6f <aj...@apache.org>.

Are you asking how to use an extant Lucene index with your text documents in it for Jena's text index as well?

ajs6f

> On Feb 14, 2019, at 6:23 AM, Mikael Pesonen <mi...@lingsoft.fi> wrote:
> 
> 
> Hi,
> 
> Our system stores documents through a separate REST API, and the document IDs are stored, along with document metadata, in a Jena database. We would like to make text queries that target both the document contents and the metadata.
> 
> Is there a recommended/supported way to make this happen on Jena and Lucene?