Posted to users@jena.apache.org by Mikael Pesonen <mi...@lingsoft.fi> on 2019/05/02 10:56:39 UTC

Trouble with querying with language in Jena text

I'm using Jena 3.11 (the full server, run as a jar) with the following text 
index config:

<#indexLucene> a text:TextIndexLucene ;
      text:directory <jena_text_index>  ;
      text:entityMap <#entMap> ;
      text:storeValues true ;
      text:analyzer [ a text:StandardAnalyzer ] ;
      text:queryAnalyzer [ a text:KeywordAnalyzer ] ;
      text:queryParser text:AnalyzingQueryParser ;
      text:multilingualSupport true ;
   .

<#entMap> a text:EntityMap ;
      text:defaultField     "prefLabel" ;
      text:entityField      "uri" ;
      text:uidField         "uid" ;
      text:langField        "lang" ;
      text:graphField       "graph" ;
      text:map (
           [ text:field "prefLabel" ; text:predicate skos:prefLabel ]
           [ text:field "altLabel"  ; text:predicate skos:altLabel ]
           [ text:field "content"  ; text:predicate lsrm:content ]
           ) .


When I insert long text into lsrm:content, search usually works only 
without a language. For example, I inserted

<https://example.com/someid> lsrm:content "long ... text ... here"@en

and a query like this works:

(?s ?score ?content) text:query (lsrm:content "text" ) .

but this one returns an empty result:

(?s ?score ?content) text:query (lsrm:content "text" "lang:en") .

On some occasions, though, the language search does work on lsrm:content, 
and I can't see what causes the difference.

Any ideas?

-- 
Lingsoft - 30 years of Leading Language Management

www.lingsoft.fi

Speech Applications - Language Management - Translation - Reader's and Writer's Tools - Text Tools - E-books and M-books

Mikael Pesonen
System Engineer

e-mail: mikael.pesonen@lingsoft.fi
Tel. +358 2 279 3300

Time zone: GMT+2

Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND

Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND


Re: Trouble with querying with language in Jena text

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
That was the reason: the query was returning all of the large text content, 
which slowed things down more than 20-fold.
Filtering out the content like this

select ?s ?p ?o where
{
   graph <https://resource.lingsoft.fi/416c3258-4974-4ab1-ae3d-115961923010/>
   {
      (?s ?score ?content) text:query (lsrm:content "some search" ) .
      ?s ?p ?o FILTER ( ?p != lsrm:content )
   }
}

makes the query as fast as returning just ?s in the first example. I wonder 
why collecting large texts slows things down so much?
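
A variation on the filter above, as a sketch only: instead of excluding lsrm:content, list the predicates that are actually wanted. The choice of skos:prefLabel and skos:altLabel is an assumption taken from the index config in this thread, and the lsrm: namespace URI is a placeholder because the real one is not shown here. The subject list is shortened to (?s ?score) so that the literal is not bound at all. Whether this performs better than the FILTER version was not measured in this thread.

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
# Placeholder: the real lsrm: namespace is not given in this thread.
PREFIX lsrm: <http://example.org/lsrm#>

SELECT ?s ?p ?o
WHERE {
  GRAPH <https://resource.lingsoft.fi/416c3258-4974-4ab1-ae3d-115961923010/> {
    # Text search on the content field; only subject and score are bound.
    (?s ?score) text:query (lsrm:content "some search") .

    # Allow-list of predicates to return (assumed; adjust to your data).
    VALUES ?p { skos:prefLabel skos:altLabel }
    ?s ?p ?o
  }
}
```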



On 02/05/2019 18:29, Chris Tomlinson wrote:
> Hi Mikael,
>
> Regarding this issue, is it possible that the 20 sec query is returning a lot more data than the < 1 sec query?
>
> I usually find that the underlying Jena query time is easily dominated by data transfer costs to a browser.
>
> Chris


Re: Trouble with querying with language in Jena text

Posted by Chris Tomlinson <ch...@gmail.com>.
Hi Mikael,

Regarding this issue, is it possible that the 20 sec query is returning a lot more data than the < 1 sec query?

I usually find that the underlying Jena query time is easily dominated by data transfer costs to a browser.

Chris
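
One way to check how much data the slow query would return, as a rough sketch rather than a definitive diagnostic: count the result rows and add up the character length of every ?o value. The lsrm: namespace URI below is a placeholder, since the real one does not appear in this thread. Blank-node objects are counted as length 0 because STR() is not defined for them.

```sparql
PREFIX text: <http://jena.apache.org/text#>
# Placeholder: the real lsrm: namespace is not given in this thread.
PREFIX lsrm: <http://example.org/lsrm#>

SELECT (COUNT(*) AS ?rows)
       (SUM(IF(isBlank(?o), 0, STRLEN(STR(?o)))) AS ?chars)
WHERE {
  GRAPH <https://resource.lingsoft.fi/416c3258-4974-4ab1-ae3d-115961923010/> {
    # Same pattern as the 20-second query, but aggregated on the server
    # so that only two numbers are transferred.
    (?s ?score ?content) text:query (lsrm:content "some search") .
    ?s ?p ?o
  }
}
```

If this returns quickly while ?chars is large, that would point toward transfer and serialization cost rather than query evaluation.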

> On May 2, 2019, at 9:29 AM, Mikael Pesonen <mi...@lingsoft.fi> wrote:
> 
> 
> Another issue in the same setup. The following query is fast (less than a second), as expected:
> 
> select ?s where
> {
> graph <https://resource.lingsoft.fi/416c3258-4974-4ab1-ae3d-115961923010/>
> {
>      (?s ?score ?content) text:query (lsrm:content "some search" ) .
>     ?s ?p ?o
>   }
> }
> 
> But this takes about 20 seconds
> 
> select ?s ?p ?o where
> {
> graph <https://resource.lingsoft.fi/416c3258-4974-4ab1-ae3d-115961923010/>
> {
>      (?s ?score ?content) text:query (lsrm:content "some search" ) .
>     ?s ?p ?o
>   }
> }
> 
> Is the first query optimized so that ?p and ?o aren't actually collected at all? What would be the correct way to write this query?
> 
> The number of documents is about 3000, and the number of triples per document is 10.


Re: Trouble with querying with language in Jena text

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Another issue in the same setup. The following query is fast (less than a 
second), as expected:

select ?s where
{
   graph <https://resource.lingsoft.fi/416c3258-4974-4ab1-ae3d-115961923010/>
   {
      (?s ?score ?content) text:query (lsrm:content "some search" ) .
      ?s ?p ?o
   }
}

But this one takes about 20 seconds:

select ?s ?p ?o where
{
   graph <https://resource.lingsoft.fi/416c3258-4974-4ab1-ae3d-115961923010/>
   {
      (?s ?score ?content) text:query (lsrm:content "some search" ) .
      ?s ?p ?o
   }
}

Is the first query optimized so that ?p and ?o aren't actually collected at 
all? What would be the correct way to write this query?

The number of documents is about 3000, and the number of triples per document 
is 10.


Re: Trouble with querying with language in Jena text

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Removing those two lines from the config fixed the language search; both 
"text"@en and "lang:en" work now. Thanks!

On 02/05/2019 17:47, Chris Tomlinson wrote:
> Hi Mikael,
>
> try removing:
>
>>       text:queryAnalyzer [ a text:KeywordAnalyzer ] ;
>>       text:queryParser text:AnalyzingQueryParser ;
>
> Also the following should work as well as using "lang:en":
>
>        (?s ?score ?content) text:query (lsrm:content "text"@en)
>
> but I doubt that will make a difference.
>
> I’m still on 3.10 but there’ve been no changes in the jena-text for 3.11 that should be in play for your issue.
>
> Chris


Re: Trouble with querying with language in Jena text

Posted by Chris Tomlinson <ch...@gmail.com>.
Hi Mikael,

try removing:

>      text:queryAnalyzer [ a text:KeywordAnalyzer ] ;
>      text:queryParser text:AnalyzingQueryParser ;

Also the following should work as well as using "lang:en":

      (?s ?score ?content) text:query (lsrm:content "text"@en)

but I doubt that will make a difference.

I’m still on 3.10 but there’ve been no changes in the jena-text for 3.11 that should be in play for your issue.

Chris
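
For anyone who wants to try the two language forms mentioned above, here is a sketch of them inside a complete query. The text: prefix is the jena-text namespace; the lsrm: namespace URI is a placeholder, because the real one is not shown anywhere in this thread. Note the plain ASCII double quotes around the query string and the language argument.

```sparql
PREFIX text: <http://jena.apache.org/text#>
# Placeholder: the real lsrm: namespace is not given in this thread.
PREFIX lsrm: <http://example.org/lsrm#>

SELECT ?s ?score ?content
WHERE {
  # Form 1: language passed as a separate "lang:xx" argument.
  (?s ?score ?content) text:query (lsrm:content "text" "lang:en") .

  # Form 2 (use instead of the line above): language tag on the query string.
  # (?s ?score ?content) text:query (lsrm:content "text"@en) .
}
```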

