
Posted to users@jena.apache.org by Andrea Dessi <an...@gmail.com> on 2013/10/05 09:45:33 UTC

Text Query vs Filter Regex

Hi,

I've got jena-text working to do Lucene indexing, and I'd like to know the
technical differences between text:query and filter regex.

Thanks for any help,

Andrea

Re: Text Query vs Filter Regex

Posted by Rob Vesse <rv...@dotnetrdf.org>.

Both are true: no part of the SPARQL specification in any way requires
Lucene.

Text query is an ARQ-specific extension to SPARQL that uses Lucene.

Rob


Re: Text Query vs Filter Regex

Posted by Andrea Dessi <an...@gmail.com>.

Hi Rob,

OK, thanks for your quick reply.

In that case, when you say

"REGEX has absolutely no relation to Lucene,"

do you perhaps mean "FILTER has absolutely no relation to Lucene"?



-- 
Andrea Dessi

Re: Text Query vs Filter Regex

Posted by Andrea Dessi <an...@gmail.com>.

Me too. Very good point!

And I don't know why it is not possible to use two text:query patterns at the
same time?

Re: Text Query vs Filter Regex

Posted by Rob Vesse <rv...@dotnetrdf.org>.

Good point. I'm honestly not sure exactly how Lucene would interpret the
example query string.

Rob


Re: Text Query vs Filter Regex

Posted by Chris Dollin <ch...@epimorphics.com>.

On Monday, October 07, 2013 02:31:35 PM Rob Vesse wrote:

> In the example given your FILTER clause is superfluous since the text
> query ensures that only subjects which satisfy the text query are matched
> so having the FILTER as well is doing unnecessary work over the possible
> solutions produced by the text query.
> 
> Rob

I thought the string ('Daily newspaper') fed to text:query was taken
as a statement in the Lucene query language, in which, if I am
recalling correctly, it will be taken as "containing 'Daily' or 'newspaper'":
OR rather than AND or in-sequence. The filter is needed to validate
candidates.

If I'm wrong I have some Elda documentation to fix ...

> >{
> >  ?s text:query (dbpprop:type 'Daily newspaper') .
> >  ?s dbpprop:type ?v1 .
> >  FILTER ( REGEX(STR(?v1), "Daily newspaper", "i") )
> >}

Chris

-- 
"You're down as expendable. You don't get a weapon."    /Dark Lord of Derkholm/

Epimorphics Ltd, http://www.epimorphics.com
Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT
Epimorphics Ltd. is a limited company registered in England (number 7016688)
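
A minimal sketch of one way to settle this inside the text query itself, assuming the string is handed to Lucene's classic query parser (where space-separated terms are combined with OR by default and double quotes mark a phrase). The prefix IRIs are assumptions (the usual jena-text and DBpedia property namespaces), and the outcome still depends on the analyzer configured for the index:

```sparql
PREFIX text:    <http://jena.apache.org/text#>
PREFIX dbpprop: <http://dbpedia.org/property/>

SELECT DISTINCT ?s
WHERE
{
  # The inner double quotes ask Lucene for the two words as a phrase,
  # rather than for documents containing either word.
  ?s text:query (dbpprop:type '"Daily newspaper"') .
}
```

Writing 'Daily AND newspaper' instead would require both words without requiring them to be adjacent. If neither form is exact enough, keeping the FILTER to validate the candidates remains an option.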

Re: Text Query vs Filter Regex

Posted by Rob Vesse <rv...@dotnetrdf.org>.

No, as Andy explained, they are two completely different mechanisms.

REGEX has absolutely no relation to Lucene. It is part of standard
SPARQL and causes the query engine to evaluate the regular
expression for every possible solution returned by the inner portion of
the query to decide whether to retain that solution or not.

Text querying uses a Lucene index, so it can pick out only the solutions that
satisfy some Lucene query, which is substantially more performant.

In the example given, your FILTER clause is superfluous since the text
query ensures that only subjects which satisfy the text query are matched,
so having the FILTER as well is doing unnecessary work over the possible
solutions produced by the text query.

Rob
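
For illustration, a sketch of the query with the FILTER removed, so that the text index alone selects the subjects. The prefix IRIs are assumptions (the usual jena-text and DBpedia property namespaces), and it presumes dbpprop:type is one of the properties covered by the text index:

```sparql
PREFIX text:    <http://jena.apache.org/text#>
PREFIX dbpprop: <http://dbpedia.org/property/>

SELECT DISTINCT ?s
WHERE
{
  # The Lucene index proposes the matching subjects directly;
  # no regular expression is evaluated per solution.
  ?s text:query (dbpprop:type 'Daily newspaper') .
}
```

Whether this returns the same subjects as the filtered version depends on how Lucene interprets the query string 'Daily newspaper'.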


Re: Text Query vs Filter Regex

Posted by Andrea Dessi <an...@gmail.com>.

Thank you, Andy, for your previous reply.

My doubts arise, for example, with this query:


SELECT DISTINCT ?s
WHERE
{
  ?s text:query (dbpprop:type 'Daily newspaper') .
  ?s dbpprop:type ?v1 .
  FILTER ( REGEX(STR(?v1), "Daily newspaper", "i") )
}


Do the two work together on the Lucene index (as one Lucene query) or not?

Thanks again

--
Andrea.



-- 
Andrea Dessi

Re: Text Query vs Filter Regex

Posted by Andy Seaborne <an...@gmail.com>.

On 05/10/13 08:45, Andrea Dessi wrote:
> Hi,
>
> I've got jena-text working to do Lucene indexing, and I'd like to know the
> technical differences between text:query and filter regex.
>
> Thanks for any help,
>
> Andrea
>

Hi Andrea,

I'm not sure what aspect you mean but

text:query is a property function - it looks like part of the basic
graph pattern and it can generate answers, binding variables.  A filter
cannot do that - all it can do is take a stream of possibilities and
accept or reject them.  Optimizers may do magic things in certain cases
but, in general, it is a case of generating all possibilities, which may
be a huge amount of work, and reducing them to a few results.  That's
why FILTER regex can be very expensive.

The Lucene (or Solr) index used by text:query only generates the possible
matches because, being an index, the Lucene engine looks up the query
string (or some part of it) to find a moderate number of possibilities.
There is no "get everything and reduce" effect.

	Andy
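
As an illustration of the "get everything and reduce" case, here is a sketch of a regex-only lookup. The property dbpprop:type and the search string are borrowed from the query discussed in this thread; the prefix IRI is an assumption (the usual DBpedia property namespace):

```sparql
PREFIX dbpprop: <http://dbpedia.org/property/>

SELECT DISTINCT ?s
WHERE
{
  # Generates one candidate solution per dbpprop:type triple in the data...
  ?s dbpprop:type ?v1 .
  # ...and the filter can only accept or reject each candidate in turn.
  FILTER ( REGEX(STR(?v1), "Daily newspaper", "i") )
}
```

Here every dbpprop:type triple is a candidate that the FILTER must test. With text:query in the pattern instead, the index proposes the candidate subjects up front, so far fewer solutions reach any later filtering.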