Posted to dev@jena.apache.org by Chris Tomlinson <ch...@gmail.com> on 2017/10/20 19:00:04 UTC

jena-text highlighting support

Hi,

I’m interested in looking into whether and how it might be possible to incorporate Lucene highlighting into jena-text. I don’t see any other work, but perhaps others have dealt with the topic already. I was thinking of some sort of a 4th return parameter in the PF.

While examining the code I found an oddity. I was interested in seeing the final Lucene query created in TextIndexLucene via the log.debug at line 417, but it never fires, even though the log.debug at line 249 in TextQueryPF produces the expected output. I’m guessing that the use of the lambda in getOrFill of the cache at line 267 of TextQueryPF somehow obscures the logging properties. Maybe that’s why there’s the commented-out alternative that avoids the cache?

Thanks,
Chris


Re: TextQueryPF, TextIndexLucene logging [was Re: jena-text highlighting support]

Posted by Chris Tomlinson <ch...@gmail.com>.
Hello Andy,

Mea culpa!! I finally tracked down my problem. The testbed was running a slightly older version than I had thought. Now that I’m using an up-to-date build, I’m seeing the debug info that was added on 19 Aug.

Sorry for wasting b/w,
Chris
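
One way to confirm which build is actually running is to ask the JVM where a class was loaded from. The sketch below is illustrative only: LoadedFromSketch and locationOf are hypothetical names, and in a real deployment one would pass a class such as TextIndexLucene.class rather than the classes used here.

```java
import java.security.CodeSource;

class LoadedFromSketch {
    /** Returns the jar or directory a class was loaded from, if the JVM knows it. */
    static String locationOf(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        // Classes from the bootstrap loader (e.g. java.lang.String) have no code source.
        return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
    }

    public static void main(String[] args) {
        System.out.println("this class: " + locationOf(LoadedFromSketch.class));
        System.out.println("String:     " + locationOf(String.class));
    }
}
```

If the printed location points at an older jar than expected, a missing log statement may simply not exist in the code that is running.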

> On Oct 23, 2017, at 4:58 PM, Chris Tomlinson <ch...@gmail.com> wrote:
> 
> Hi Andy,
> 
> The "Lucene query: {} ({})" message is not coming out at all. I tried a variety of queries and other parameter settings, and since the cache key is such that any change in query, limit, language and such leads to another key, I should see many such log statements.
> 
> In any event, I’ll look further into the matter since I’ll now fork jena and see what I can do regarding the highlighting.
> 
> Thanks,
> Chris
> 
> 
> 
>> On Oct 23, 2017, at 3:31 PM, Andy Seaborne <andy@apache.org> wrote:
>> 
>> That all looks right.
>> 
>> Is the message not coming out at all or coming out once only?  I ask because it is in a "getOrFill" so the Callable is only called when there is a cache miss.
>> 
>>    Andy
>> 
>> On 22/10/17 18:22, Chris Tomlinson wrote:
>>> Hello Andy,
>>> Thank you for the reply. I’m apparently not facile with log4j.properties. I've tried several configurations and I'm only able to get the TextQueryPF log.debug() to fire:
>>>> [2017-10-22 16:47:52] Fuseki     INFO  [1] POST http://localhost:13180/fuseki/bdrcrw/query
>>>> [2017-10-22 16:47:52] Fuseki     INFO  [1] POST /bdrcrw :: 'query' :: [application/x-www-form-urlencoded charset=UTF-8] ?
>>>> [2017-10-22 16:47:52] Fuseki     INFO  [1] Query = PREFIX : <http://purl.bdrc.io/ontology/root#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX adm: <http://purl.bdrc.io/ontology/admin/> PREFIX bdo: <http://purl.bdrc.io/ontology/core/> PREFIX bdr: <http://purl.bdrc.io/resource/> PREFIX text: <http://jena.apache.org/text#> PREFIX skos: <http://www.w3.org/2004/02/skos/core#> PREFIX lang: <http://ontologi.es/lang/core#> PREFIX ad: <http://schemas.talis.com/2005/address/schema#>  select ?s ?n ?sc where {   (?s1 ?sc ?lit) text:query (bdo:chunkContents "\"དགའ་རབ་རྡོ་རྗེ་\"") .   ?s bdo:eTextHasChunk ?s1 .   ?s1 bdo:seqNum ?n . } limit 10
>>>> [2017-10-22 16:47:52] TextQueryPF DEBUG Text query: "དགའ་རབ་རྡོ་རྗེ་" (-1)
>>>> [2017-10-22 16:47:52] Fuseki     INFO  [1] 200 OK (423 ms)
>>> I’ve tried a variety of configurations along the following lines:
>>>> log4j.rootLogger=INFO, jena.plainstdout
>>>> log4j.logger.org.apache.jena=INFO
>>>> log4j.logger.org.apache.jena.fuseki=INFO
>>>> log4j.logger.org.apache.jena.query.text=DEBUG
>>>> log4j.logger.org.apache.jena.query.text.TextQueryPF=DEBUG
>>>> log4j.logger.org.apache.jena.query.text.TextIndexLucene=DEBUG
>>> As I understand, setting  log4j.logger.org.apache.jena=DEBUG should be sufficient (if too verbose) and setting log4j.logger.org.apache.jena.query.text=DEBUG should have enabled debug for any class in the package and I really thought log4j.logger.org.apache.jena.query.text.TextIndexLucene=DEBUG should have enabled debug level in TextIndexLucene specifically. Setting log4j.logger.org.apache.jena.query.text=DEBUG and log4j.logger.org.apache.jena.query.text.TextQueryPF=INFO indeed turns off the log.debug in TextQueryPF as I expected.
>>> The code region containing the log.debug() in TextIndexLucene is:
>>>> 
>>>>         String queryString = textClause ;
>>>>         if (langClause != null)
>>>>             queryString = "(" + queryString + ") AND " + langClause ;
>>>>         if (graphClause != null)
>>>>             queryString = "(" + queryString + ") AND " + graphClause ;
>>>> 
>>>>         if ( log.isDebugEnabled())
>>>>             log.debug("Lucene query: {} ({})", queryString,limit) ;
>>>> 
>>>>         IndexSearcher indexSearcher = new IndexSearcher(indexReader) ;
>>>>         Query query = parseQuery(queryString, queryAnalyzer) ;
>>>>         if ( limit <= 0 )
>>>>             limit = MAX_N ;
>>>>         ScoreDoc[] sDocs = indexSearcher.search(query, limit).scoreDocs ;
>>>> 
>>>>         List<TextHit> results = new ArrayList<>() ;
>>> And I know that execution passes the log.isDebugEnabled and reaches the parseQuery() since I can inject a queryString with an error and see the stack trace where the ParseException is caught in the getOrFill().
>>> As far as I can tell the log is setup in the same manner in TextQueryPF and TextIndexLucene:
>>>> import org.slf4j.Logger ;
>>>> import org.slf4j.LoggerFactory ;
>>>> 
>>>> /** property function that accesses a text index */
>>>> public class TextQueryPF extends PropertyFunctionBase {
>>>>     private static Logger log = LoggerFactory.getLogger(TextQueryPF.class) ;
>>> and
>>>> import org.slf4j.Logger ;
>>>> import org.slf4j.LoggerFactory ;
>>>> 
>>>> public class TextIndexLucene implements TextIndex {
>>>>     private static Logger          log      = LoggerFactory.getLogger(TextIndexLucene.class) ;
>>> What am I missing?
>>> Thanks,
>>> Chris
>>>> On Oct 21, 2017, at 3:33 PM, Andy Seaborne <andy@apache.org> wrote:
>>>> 
>>>> 
>>>> 
>>>> On 20/10/17 20:00, Chris Tomlinson wrote:
>>>>> Hi,
>>>>> I’m interested in looking into whether and how it might be possible to incorporate Lucene highlighting into jena-text. I don’t see any other work, but perhaps others have dealt with the topic already. I was thinking of some sort of a 4th return parameter in the PF.
>>>>> While examining the code I found an oddity. I was interested in seeing the final Lucene query created in TextIndexLucene via the log.debug at line 417 but it never fires even though the log.debug at line 249 in TextQueryPF produces expected output. I’m guessing that somehow the use of the lambda in getOrFill of the cache at line 267 of TextQueryPF somehow obscures the logging properties. Maybe that’s why there’s the commented out alternative that avoids the cache?
>>>> 
>>>> Because they are different loggers? As statics, they are as compiled, no closure related.
>>>> 
>>>> TextIndexLucene.class and TextQueryPF.class respectively.
>>>> 
>>>> (I see the "no cache" code as a debugging remains - there was a cache issue recently)
>>>> 
>>>>    Andy
>>>> 
>>>>> Thanks,
>>>>> Chris
> 


Re: TextQueryPF, TextIndexLucene logging [was Re: jena-text highlighting support]

Posted by Chris Tomlinson <ch...@gmail.com>.
Hi Andy,

The "Lucene query: {} ({})" message is not coming out at all. I tried a variety of queries and other parameter settings, and since the cache key is such that any change in query, limit, language and such leads to another key, I should see many such log statements.

In any event, I’ll look further into the matter since I’ll now fork jena and see what I can do regarding the highlighting.

Thanks,
Chris



> On Oct 23, 2017, at 3:31 PM, Andy Seaborne <an...@apache.org> wrote:
> 
> That all looks right.
> 
> Is the message not coming out at all or coming out once only?  I ask because it is in a "getOrFill" so the Callable is only called when there is a cache miss.
> 
>    Andy
> 
> On 22/10/17 18:22, Chris Tomlinson wrote:
>> Hello Andy,
>> Thank you for the reply. I’m apparently not facile with log4j.properties. I've tried several configurations and I'm only able to get the TextQueryPF log.debug() to fire:
>>> [2017-10-22 16:47:52] Fuseki     INFO  [1] POST http://localhost:13180/fuseki/bdrcrw/query
>>> [2017-10-22 16:47:52] Fuseki     INFO  [1] POST /bdrcrw :: 'query' :: [application/x-www-form-urlencoded charset=UTF-8] ?
>>> [2017-10-22 16:47:52] Fuseki     INFO  [1] Query = PREFIX : <http://purl.bdrc.io/ontology/root#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX adm: <http://purl.bdrc.io/ontology/admin/> PREFIX bdo: <http://purl.bdrc.io/ontology/core/> PREFIX bdr: <http://purl.bdrc.io/resource/> PREFIX text: <http://jena.apache.org/text#> PREFIX skos: <http://www.w3.org/2004/02/skos/core#> PREFIX lang: <http://ontologi.es/lang/core#> PREFIX ad: <http://schemas.talis.com/2005/address/schema#>  select ?s ?n ?sc where {   (?s1 ?sc ?lit) text:query (bdo:chunkContents "\"དགའ་རབ་རྡོ་རྗེ་\"") .   ?s bdo:eTextHasChunk ?s1 .   ?s1 bdo:seqNum ?n . } limit 10
>>> [2017-10-22 16:47:52] TextQueryPF DEBUG Text query: "དགའ་རབ་རྡོ་རྗེ་" (-1)
>>> [2017-10-22 16:47:52] Fuseki     INFO  [1] 200 OK (423 ms)
>> I’ve tried a variety of configurations along the following lines:
>>> log4j.rootLogger=INFO, jena.plainstdout
>>> log4j.logger.org.apache.jena=INFO
>>> log4j.logger.org.apache.jena.fuseki=INFO
>>> log4j.logger.org.apache.jena.query.text=DEBUG
>>> log4j.logger.org.apache.jena.query.text.TextQueryPF=DEBUG
>>> log4j.logger.org.apache.jena.query.text.TextIndexLucene=DEBUG
>> As I understand, setting  log4j.logger.org.apache.jena=DEBUG should be sufficient (if too verbose) and setting log4j.logger.org.apache.jena.query.text=DEBUG should have enabled debug for any class in the package and I really thought log4j.logger.org.apache.jena.query.text.TextIndexLucene=DEBUG should have enabled debug level in TextIndexLucene specifically. Setting log4j.logger.org.apache.jena.query.text=DEBUG and log4j.logger.org.apache.jena.query.text.TextQueryPF=INFO indeed turns off the log.debug in TextQueryPF as I expected.
>> The code region containing the log.debug() in TextIndexLucene is:
>>> 
>>>         String queryString = textClause ;
>>>         if (langClause != null)
>>>             queryString = "(" + queryString + ") AND " + langClause ;
>>>         if (graphClause != null)
>>>             queryString = "(" + queryString + ") AND " + graphClause ;
>>> 
>>>         if ( log.isDebugEnabled())
>>>             log.debug("Lucene query: {} ({})", queryString,limit) ;
>>> 
>>>         IndexSearcher indexSearcher = new IndexSearcher(indexReader) ;
>>>         Query query = parseQuery(queryString, queryAnalyzer) ;
>>>         if ( limit <= 0 )
>>>             limit = MAX_N ;
>>>         ScoreDoc[] sDocs = indexSearcher.search(query, limit).scoreDocs ;
>>> 
>>>         List<TextHit> results = new ArrayList<>() ;
>> And I know that execution passes the log.isDebugEnabled and reaches the parseQuery() since I can inject a queryString with an error and see the stack trace where the ParseException is caught in the getOrFill().
>> As far as I can tell the log is setup in the same manner in TextQueryPF and TextIndexLucene:
>>> import org.slf4j.Logger ;
>>> import org.slf4j.LoggerFactory ;
>>> 
>>> /** property function that accesses a text index */
>>> public class TextQueryPF extends PropertyFunctionBase {
>>>     private static Logger log = LoggerFactory.getLogger(TextQueryPF.class) ;
>> and
>>> import org.slf4j.Logger ;
>>> import org.slf4j.LoggerFactory ;
>>> 
>>> public class TextIndexLucene implements TextIndex {
>>>     private static Logger          log      = LoggerFactory.getLogger(TextIndexLucene.class) ;
>> What am I missing?
>> Thanks,
>> Chris
>>> On Oct 21, 2017, at 3:33 PM, Andy Seaborne <an...@apache.org> wrote:
>>> 
>>> 
>>> 
>>> On 20/10/17 20:00, Chris Tomlinson wrote:
>>>> Hi,
>>>> I’m interested in looking into whether and how it might be possible to incorporate Lucene highlighting into jena-text. I don’t see any other work, but perhaps others have dealt with the topic already. I was thinking of some sort of a 4th return parameter in the PF.
>>>> While examining the code I found an oddity. I was interested in seeing the final Lucene query created in TextIndexLucene via the log.debug at line 417 but it never fires even though the log.debug at line 249 in TextQueryPF produces expected output. I’m guessing that somehow the use of the lambda in getOrFill of the cache at line 267 of TextQueryPF somehow obscures the logging properties. Maybe that’s why there’s the commented out alternative that avoids the cache?
>>> 
>>> Because they are different loggers? As statics, they are as compiled, no closure related.
>>> 
>>> TextIndexLucene.class and TextQueryPF.class respectively.
>>> 
>>> (I see the "no cache" code as a debugging remains - there was a cache issue recently)
>>> 
>>>    Andy
>>> 
>>>> Thanks,
>>>> Chris


Re: TextQueryPF, TextIndexLucene logging [was Re: jena-text highlighting support]

Posted by Andy Seaborne <an...@apache.org>.
That all looks right.

Is the message not coming out at all or coming out once only?  I ask 
because it is in a "getOrFill" so the Callable is only called when there 
is a cache miss.

     Andy
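
To illustrate the cache-miss point, here is a minimal sketch of a getOrFill-style cache. GetOrFillSketch is a hypothetical stand-in backed by a HashMap, not Jena's cache class, and the key format is made up for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical stand-in for a cache with a getOrFill operation; not Jena's class.
class GetOrFillSketch {
    private final Map<String, String> store = new HashMap<>();
    int fillCalls = 0; // counts how often the filler actually ran

    String getOrFill(String key, Callable<String> filler) {
        String cached = store.get(key);
        if (cached != null)
            return cached; // cache hit: the filler is skipped entirely
        String value;
        try {
            value = filler.call(); // cache miss: run the filler once
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        fillCalls++;
        store.put(key, value);
        return value;
    }

    public static void main(String[] args) {
        GetOrFillSketch cache = new GetOrFillSketch();
        Callable<String> filler = () -> {
            System.out.println("filler ran (a log.debug inside it would fire now)");
            return "results";
        };
        cache.getOrFill("query|10|en", filler); // miss
        cache.getOrFill("query|10|en", filler); // hit, filler not called
        cache.getOrFill("query|20|en", filler); // different key, miss again
        System.out.println("filler calls: " + cache.fillCalls);
    }
}
```

Any log statement inside the filler appears once per distinct key, so repeating an identical query would show the message only the first time.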

On 22/10/17 18:22, Chris Tomlinson wrote:
> Hello Andy,
> 
> Thank you for the reply. I’m apparently not facile with log4j.properties. I've tried several configurations and I'm only able to get the TextQueryPF log.debug() to fire:
> 
>> [2017-10-22 16:47:52] Fuseki     INFO  [1] POST http://localhost:13180/fuseki/bdrcrw/query
>> [2017-10-22 16:47:52] Fuseki     INFO  [1] POST /bdrcrw :: 'query' :: [application/x-www-form-urlencoded charset=UTF-8] ?
>> [2017-10-22 16:47:52] Fuseki     INFO  [1] Query = PREFIX : <http://purl.bdrc.io/ontology/root#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX adm: <http://purl.bdrc.io/ontology/admin/> PREFIX bdo: <http://purl.bdrc.io/ontology/core/> PREFIX bdr: <http://purl.bdrc.io/resource/> PREFIX text: <http://jena.apache.org/text#> PREFIX skos: <http://www.w3.org/2004/02/skos/core#> PREFIX lang: <http://ontologi.es/lang/core#> PREFIX ad: <http://schemas.talis.com/2005/address/schema#>  select ?s ?n ?sc where {   (?s1 ?sc ?lit) text:query (bdo:chunkContents "\"དགའ་རབ་རྡོ་རྗེ་\"") .   ?s bdo:eTextHasChunk ?s1 .   ?s1 bdo:seqNum ?n . } limit 10
>> [2017-10-22 16:47:52] TextQueryPF DEBUG Text query: "དགའ་རབ་རྡོ་རྗེ་" (-1)
>> [2017-10-22 16:47:52] Fuseki     INFO  [1] 200 OK (423 ms)
> 
> I’ve tried a variety of configurations along the following lines:
> 
>> log4j.rootLogger=INFO, jena.plainstdout
>> log4j.logger.org.apache.jena=INFO
>> log4j.logger.org.apache.jena.fuseki=INFO
>> log4j.logger.org.apache.jena.query.text=DEBUG
>> log4j.logger.org.apache.jena.query.text.TextQueryPF=DEBUG
>> log4j.logger.org.apache.jena.query.text.TextIndexLucene=DEBUG
> 
> 
> As I understand, setting  log4j.logger.org.apache.jena=DEBUG should be sufficient (if too verbose) and setting log4j.logger.org.apache.jena.query.text=DEBUG should have enabled debug for any class in the package and I really thought log4j.logger.org.apache.jena.query.text.TextIndexLucene=DEBUG should have enabled debug level in TextIndexLucene specifically. Setting log4j.logger.org.apache.jena.query.text=DEBUG and log4j.logger.org.apache.jena.query.text.TextQueryPF=INFO indeed turns off the log.debug in TextQueryPF as I expected.
> 
> The code region containing the log.debug() in TextIndexLucene is:
> 
>>
>>          String queryString = textClause ;
>>          if (langClause != null)
>>              queryString = "(" + queryString + ") AND " + langClause ;
>>          if (graphClause != null)
>>              queryString = "(" + queryString + ") AND " + graphClause ;
>>
>>          if ( log.isDebugEnabled())
>>              log.debug("Lucene query: {} ({})", queryString,limit) ;
>>
>>          IndexSearcher indexSearcher = new IndexSearcher(indexReader) ;
>>          Query query = parseQuery(queryString, queryAnalyzer) ;
>>          if ( limit <= 0 )
>>              limit = MAX_N ;
>>          ScoreDoc[] sDocs = indexSearcher.search(query, limit).scoreDocs ;
>>
>>          List<TextHit> results = new ArrayList<>() ;
> 
> 
> And I know that execution passes the log.isDebugEnabled and reaches the parseQuery() since I can inject a queryString with an error and see the stack trace where the ParseException is caught in the getOrFill().
> 
> As far as I can tell the log is setup in the same manner in TextQueryPF and TextIndexLucene:
> 
>> import org.slf4j.Logger ;
>> import org.slf4j.LoggerFactory ;
>>
>> /** property function that accesses a text index */
>> public class TextQueryPF extends PropertyFunctionBase {
>>      private static Logger log = LoggerFactory.getLogger(TextQueryPF.class) ;
> 
> 
> and
> 
>> import org.slf4j.Logger ;
>> import org.slf4j.LoggerFactory ;
>>
>> public class TextIndexLucene implements TextIndex {
>>      private static Logger          log      = LoggerFactory.getLogger(TextIndexLucene.class) ;
> 
> 
> What am I missing?
> 
> Thanks,
> Chris
> 
> 
> 
> 
>> On Oct 21, 2017, at 3:33 PM, Andy Seaborne <an...@apache.org> wrote:
>>
>>
>>
>> On 20/10/17 20:00, Chris Tomlinson wrote:
>>> Hi,
>>> I’m interested in looking into whether and how it might be possible to incorporate Lucene highlighting into jena-text. I don’t see any other work, but perhaps others have dealt with the topic already. I was thinking of some sort of a 4th return parameter in the PF.
>>> While examining the code I found an oddity. I was interested in seeing the final Lucene query created in TextIndexLucene via the log.debug at line 417 but it never fires even though the log.debug at line 249 in TextQueryPF produces expected output. I’m guessing that somehow the use of the lambda in getOrFill of the cache at line 267 of TextQueryPF somehow obscures the logging properties. Maybe that’s why there’s the commented out alternative that avoids the cache?
>>
>> Because they are different loggers? As statics, they are as compiled, no closure related.
>>
>> TextIndexLucene.class and TextQueryPF.class respectively.
>>
>> (I see the "no cache" code as a debugging remains - there was a cache issue recently)
>>
>>     Andy
>>
>>> Thanks,
>>> Chris
> 
> 

Re: TextQueryPF, TextIndexLucene logging [was Re: jena-text highlighting support]

Posted by Chris Tomlinson <ch...@gmail.com>.
Hello Andy,

Thank you for the reply. I’m apparently not facile with log4j.properties. I've tried several configurations and I'm only able to get the TextQueryPF log.debug() to fire:

> [2017-10-22 16:47:52] Fuseki     INFO  [1] POST http://localhost:13180/fuseki/bdrcrw/query
> [2017-10-22 16:47:52] Fuseki     INFO  [1] POST /bdrcrw :: 'query' :: [application/x-www-form-urlencoded charset=UTF-8] ? 
> [2017-10-22 16:47:52] Fuseki     INFO  [1] Query = PREFIX : <http://purl.bdrc.io/ontology/root#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX adm: <http://purl.bdrc.io/ontology/admin/> PREFIX bdo: <http://purl.bdrc.io/ontology/core/> PREFIX bdr: <http://purl.bdrc.io/resource/> PREFIX text: <http://jena.apache.org/text#> PREFIX skos: <http://www.w3.org/2004/02/skos/core#> PREFIX lang: <http://ontologi.es/lang/core#> PREFIX ad: <http://schemas.talis.com/2005/address/schema#>  select ?s ?n ?sc where {   (?s1 ?sc ?lit) text:query (bdo:chunkContents "\"དགའ་རབ་རྡོ་རྗེ་\"") .   ?s bdo:eTextHasChunk ?s1 .   ?s1 bdo:seqNum ?n . } limit 10 
> [2017-10-22 16:47:52] TextQueryPF DEBUG Text query: "དགའ་རབ་རྡོ་རྗེ་" (-1)
> [2017-10-22 16:47:52] Fuseki     INFO  [1] 200 OK (423 ms)

I’ve tried a variety of configurations along the following lines:

> log4j.rootLogger=INFO, jena.plainstdout
> log4j.logger.org.apache.jena=INFO
> log4j.logger.org.apache.jena.fuseki=INFO
> log4j.logger.org.apache.jena.query.text=DEBUG
> log4j.logger.org.apache.jena.query.text.TextQueryPF=DEBUG
> log4j.logger.org.apache.jena.query.text.TextIndexLucene=DEBUG


As I understand it, setting log4j.logger.org.apache.jena=DEBUG should be sufficient (if too verbose), and setting log4j.logger.org.apache.jena.query.text=DEBUG should have enabled debug for any class in the package. I really thought log4j.logger.org.apache.jena.query.text.TextIndexLucene=DEBUG should have enabled debug level in TextIndexLucene specifically. Setting log4j.logger.org.apache.jena.query.text=DEBUG and log4j.logger.org.apache.jena.query.text.TextQueryPF=INFO indeed turns off the log.debug in TextQueryPF, as I expected.
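
The level inheritance described above can be sketched with java.util.logging, used here only as a self-contained stand-in for log4j (the thread itself uses log4j behind SLF4J). Level.FINE plays the role of DEBUG, and LoggerHierarchySketch and configure are hypothetical names.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class LoggerHierarchySketch {
    // Strong references: java.util.logging may garbage-collect unreferenced loggers.
    static final Logger PKG = Logger.getLogger("org.apache.jena.query.text");
    static final Logger PF = Logger.getLogger("org.apache.jena.query.text.TextQueryPF");
    static final Logger INDEX = Logger.getLogger("org.apache.jena.query.text.TextIndexLucene");

    /** Package logger at debug, TextQueryPF overridden to INFO, TextIndexLucene inheriting. */
    static void configure() {
        PKG.setLevel(Level.FINE);  // FINE stands in for DEBUG
        PF.setLevel(Level.INFO);   // explicit override on one child logger
        INDEX.setLevel(null);      // null means "inherit from the parent logger"
    }

    public static void main(String[] args) {
        configure();
        System.out.println("TextQueryPF debug enabled:     " + PF.isLoggable(Level.FINE));
        System.out.println("TextIndexLucene debug enabled: " + INDEX.isLoggable(Level.FINE));
    }
}
```

With these settings the child with an explicit INFO level has debug switched off, while the child without its own level inherits debug from the package logger.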

The code region containing the log.debug() in TextIndexLucene is:

> 
>         String queryString = textClause ;
>         if (langClause != null)
>             queryString = "(" + queryString + ") AND " + langClause ;
>         if (graphClause != null)
>             queryString = "(" + queryString + ") AND " + graphClause ;
> 
>         if ( log.isDebugEnabled())
>             log.debug("Lucene query: {} ({})", queryString,limit) ;
> 
>         IndexSearcher indexSearcher = new IndexSearcher(indexReader) ;
>         Query query = parseQuery(queryString, queryAnalyzer) ;
>         if ( limit <= 0 )
>             limit = MAX_N ;
>         ScoreDoc[] sDocs = indexSearcher.search(query, limit).scoreDocs ;
> 
>         List<TextHit> results = new ArrayList<>() ;
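
For reference, the string that the quoted log.debug would print can be reproduced in isolation. This is only a sketch of the composition shown above; QueryStringSketch is a hypothetical name, and the field names "label", "lang" and "graph" in the sample clauses are made up.

```java
class QueryStringSketch {
    /** Mirrors the quoted composition: AND the optional clauses onto the text clause. */
    static String compose(String textClause, String langClause, String graphClause) {
        String queryString = textClause;
        if (langClause != null)
            queryString = "(" + queryString + ") AND " + langClause;
        if (graphClause != null)
            queryString = "(" + queryString + ") AND " + graphClause;
        return queryString;
    }

    public static void main(String[] args) {
        // Sample clauses are illustrative, not taken from a real index definition.
        System.out.println(compose("label:rdf", null, null));
        System.out.println(compose("label:rdf", "lang:en", null));
        System.out.println(compose("label:rdf", "lang:en", "graph:g1"));
    }
}
```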


And I know that execution passes the log.isDebugEnabled and reaches the parseQuery() since I can inject a queryString with an error and see the stack trace where the ParseException is caught in the getOrFill().

As far as I can tell, the log is set up in the same manner in TextQueryPF and TextIndexLucene:

> import org.slf4j.Logger ;
> import org.slf4j.LoggerFactory ;
> 
> /** property function that accesses a text index */
> public class TextQueryPF extends PropertyFunctionBase {
>     private static Logger log = LoggerFactory.getLogger(TextQueryPF.class) ;


and

> import org.slf4j.Logger ;
> import org.slf4j.LoggerFactory ;
> 
> public class TextIndexLucene implements TextIndex {
>     private static Logger          log      = LoggerFactory.getLogger(TextIndexLucene.class) ;


What am I missing?

Thanks,
Chris




> On Oct 21, 2017, at 3:33 PM, Andy Seaborne <an...@apache.org> wrote:
> 
> 
> 
> On 20/10/17 20:00, Chris Tomlinson wrote:
>> Hi,
>> I’m interested in looking into whether and how it might be possible to incorporate Lucene highlighting into jena-text. I don’t see any other work, but perhaps others have dealt with the topic already. I was thinking of some sort of a 4th return parameter in the PF.
>> While examining the code I found an oddity. I was interested in seeing the final Lucene query created in TextIndexLucene via the log.debug at line 417 but it never fires even though the log.debug at line 249 in TextQueryPF produces expected output. I’m guessing that somehow the use of the lambda in getOrFill of the cache at line 267 of TextQueryPF somehow obscures the logging properties. Maybe that’s why there’s the commented out alternative that avoids the cache?
> 
> Because they are different loggers? As statics, they are as compiled, no closure related.
> 
> TextIndexLucene.class and TextQueryPF.class respectively.
> 
> (I see the "no cache" code as a debugging remains - there was a cache issue recently)
> 
>    Andy
> 
>> Thanks,
>> Chris


TextQueryPF, TextIndexLucene logging [was Re: jena-text highlighting support]

Posted by Andy Seaborne <an...@apache.org>.

On 20/10/17 20:00, Chris Tomlinson wrote:
> Hi,
> 
> I’m interested in looking into whether and how it might be possible to incorporate Lucene highlighting into jena-text. I don’t see any other work, but perhaps others have dealt with the topic already. I was thinking of some sort of a 4th return parameter in the PF.
> 
> While examining the code I found an oddity. I was interested in seeing the final Lucene query created in TextIndexLucene via the log.debug at line 417 but it never fires even though the log.debug at line 249 in TextQueryPF produces expected output. I’m guessing that somehow the use of the lambda in getOrFill of the cache at line 267 of TextQueryPF somehow obscures the logging properties. Maybe that’s why there’s the commented out alternative that avoids the cache?

Because they are different loggers? As statics, they are bound as 
compiled; nothing closure-related is involved.

TextIndexLucene.class and TextQueryPF.class respectively.

(I see the "no cache" code as debugging remains - there was a cache 
issue recently)

     Andy
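
A small sketch of the point about statics: a logger held in a static field is tied to its own class, and calling that class from inside a lambda in another class does not change which logger is used. java.util.logging is used only to keep the example self-contained, and the nested classes Index and Caller are hypothetical stand-ins for TextIndexLucene and TextQueryPF.

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

class StaticLoggerSketch {
    static class Index {
        private static final Logger log = Logger.getLogger(Index.class.getName());
        /** Reports the name of the logger this class would log through. */
        static String loggerUsed() { return log.getName(); }
    }

    static class Caller {
        private static final Logger log = Logger.getLogger(Caller.class.getName());
        static String own() { return log.getName(); }
        /** Calls into Index from a lambda, as a cache filler might. */
        static String viaLambda() {
            Supplier<String> filler = () -> Index.loggerUsed();
            return filler.get();
        }
    }

    public static void main(String[] args) {
        System.out.println("Caller logs through:           " + Caller.own());
        System.out.println("Index, called from the lambda: " + Caller.viaLambda());
    }
}
```

The two names differ, so each logger is enabled or disabled by its own configuration entry regardless of the calling context.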

> 
> Thanks,
> Chris
> 

Re: jena-text highlighting support

Posted by Osma Suominen <os...@helsinki.fi>.
Andy Seaborne wrote on 23.10.2017 at 22:59:

>> With the extraneous part of the original email having been split-off, 
>> I'm interested to know if and how other users have provided for 
>> highlighting of Lucene search matches when using jena-text.

While I'm relying on jena-text in my application (Skosmos), there hasn't 
been any need for highlighting since the matched strings are typically 
very short (labels of SKOS concepts) and in most cases prefix matching 
is used so the start of the match is also the start of the label.

>> If the literals are relatively short then perhaps highlighting where a 
>> match was found by Lucene is not considered necessary; however, if the 
>> literals are several hundreds of code-points then it becomes more 
>> useful to help with ultimately displaying results to users. It seems 
>> like it might be feasible to provide a 4th return parameter that could 
>> identify the start and end of each match in the returned literal.
> 
> That's good especially if when not using it has no impact on performance.

+1

Willing to review PRs.

-Osma

PS. Targeting only the dev@ list, for some reason the previous two 
messages in this thread had both users@ and dev@ in To:/Cc: fields, with 
Reply-To set to users@.

-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi

Re: jena-text highlighting support

Posted by Andy Seaborne <an...@apache.org>.

On 22/10/17 18:50, Chris Tomlinson wrote:
> Hello,
> 
> With the extraneous part of the original email having been split-off, I'm interested to know if and how other users have provided for highlighting of Lucene search matches when using jena-text.
> 
> If the literals are relatively short then perhaps highlighting where a match was found by Lucene is not considered necessary; however, if the literals are several hundreds of code-points then it becomes more useful to help with ultimately displaying results to users. It seems like it might be feasible to provide a 4th return parameter that could identify the start and end of each match in the returned literal.

That's good, especially if not using it has no impact on performance.

     Andy

> 
> Thanks,
> Chris
> 
> 
>> On Oct 20, 2017, at 2:00 PM, Chris Tomlinson <ch...@gmail.com> wrote:
>>
>> Hi,
>>
>> I’m interested in looking into whether and how it might be possible to incorporate Lucene highlighting into jena-text. I don’t see any other work, but perhaps others have dealt with the topic already. I was thinking of some sort of a 4th return parameter in the PF.

Re: jena-text highlighting support

Posted by Chris Tomlinson <ch...@gmail.com>.
Hello,

With the extraneous part of the original email having been split-off, I'm interested to know if and how other users have provided for highlighting of Lucene search matches when using jena-text. 

If the literals are relatively short then perhaps highlighting where a match was found by Lucene is not considered necessary; however, if the literals are several hundreds of code-points then it becomes more useful to help with ultimately displaying results to users. It seems like it might be feasible to provide a 4th return parameter that could identify the start and end of each match in the returned literal.
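
As a rough illustration of what "start and end of each match" could look like, the sketch below finds match offsets with a plain substring search. It is not Lucene's highlighter and not how jena-text works; MatchOffsetsSketch, offsets and highlight are hypothetical names. Offsets are Java char indexes, which differ from code-point positions only for characters outside the Basic Multilingual Plane.

```java
import java.util.ArrayList;
import java.util.List;

class MatchOffsetsSketch {
    /** Returns {start, end} char offsets (end exclusive) of each non-overlapping occurrence. */
    static List<int[]> offsets(String literal, String term) {
        List<int[]> spans = new ArrayList<>();
        if (term.isEmpty())
            return spans; // an empty term would otherwise match everywhere
        int from = 0;
        while (true) {
            int start = literal.indexOf(term, from);
            if (start < 0)
                break;
            int end = start + term.length();
            spans.add(new int[] { start, end });
            from = end; // continue after this match
        }
        return spans;
    }

    /** Wraps each matched span in the given markers, e.g. for display to a user. */
    static String highlight(String literal, String term, String open, String close) {
        StringBuilder sb = new StringBuilder();
        int last = 0;
        for (int[] span : offsets(literal, term)) {
            sb.append(literal, last, span[0]).append(open)
              .append(literal, span[0], span[1]).append(close);
            last = span[1];
        }
        return sb.append(literal.substring(last)).toString();
    }

    public static void main(String[] args) {
        String literal = "a cat and a cat";
        for (int[] span : offsets(literal, "cat"))
            System.out.println(span[0] + ".." + span[1]);
        System.out.println(highlight(literal, "cat", "[", "]"));
    }
}
```

Returning offsets rather than pre-marked text would leave the choice of markup to the client, which is one possible reading of the "4th return parameter" idea.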

Thanks,
Chris


> On Oct 20, 2017, at 2:00 PM, Chris Tomlinson <ch...@gmail.com> wrote:
> 
> Hi,
> 
> I’m interested in looking into whether and how it might be possible to incorporate Lucene highlighting into jena-text. I don’t see any other work, but perhaps others have dealt with the topic already. I was thinking of some sort of a 4th return parameter in the PF.
