Posted to solr-user@lucene.apache.org by Mikhail Khludnev <mk...@apache.org> on 2019/10/01 20:28:22 UTC

Re: filter in JSON Query DSL

Raised https://issues.apache.org/jira/browse/SOLR-13808. Thanks, Jochen!
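
For readers skimming the thread, here is a minimal sketch of the three request shapes discussed below, built as Python dicts. The helper names are made up for illustration. Sending the filter(...) string through the "query" key of the JSON API is an assumption; the thread itself only uses filter(...) as a StandardQueryParser query string. Per the discussion below, the bool-level "filter" is a Lucene filter clause and did not show the caching seen with filter(...).

```python
import json

# Hypothetical helper names; only the body shapes come from the thread.

def top_level_filter(main_query, filter_query):
    # Top-level "filter" next to "query" (the first curl example,
    # reported as working as expected).
    return {"query": {"bool": {"must": main_query}}, "filter": [filter_query]}

def bool_sibling_filter(main_query, filter_query):
    # "filter" as a sibling of "must" inside the bool query.
    return {"query": {"bool": {"must": [main_query], "filter": [filter_query]}}}

def standard_syntax_filter(filter_query):
    # StandardQueryParser form using filter(...).
    # Assumption: passed here as a plain query string.
    return {"query": "+filter(" + filter_query + ")"}

if __name__ == "__main__":
    for body in (
        top_level_filter("*:*", "meta_subject_txt:globe"),
        bool_sibling_filter("*:*", "meta_subject_txt:globe"),
        standard_syntax_filter("meta_subject_txt:globe"),
    ):
        print(json.dumps(body))
```

Each dict can be serialized with json.dumps and sent as the request body, as in the curl examples at the bottom of the thread.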

On Mon, Sep 30, 2019 at 4:26 PM Mikhail Khludnev <mk...@apache.org> wrote:

> Jochen, right! Sorry I didn't get your point earlier. {!bool filter=}
> means a Lucene filter, not Solr's one. I suppose a {!bool cache=true} flag
> could easily be added, but so far there is no concise syntax for it. Don't
> hesitate to raise a Jira issue for it.
>
> On Mon, Sep 30, 2019 at 3:18 PM Jochen Barth <ba...@ub.uni-heidelberg.de>
> wrote:
>
>> Here is the corrected equivalent query, which gives the same results as the
>> JsonQueryDSL one (and is still much faster):
>>
>> +filter(+((_query_:"{!graph from=parent_ids to=id }(meta_title_txt:muller
>> meta_name_txt:muller meta_subject_txt:muller meta_shelflocator_txt:muller)"
>> _query_:"{!graph from=id to=parent_ids  traversalFilter=\"class_s:meta
>> -type_s:multivolume_work -type_s:periodical -type_s:issue
>> -type_s:journal\"}(meta_title_txt:muller meta_name_txt:muller
>> text_ocr_ft:muller text_heidicon_ft:muller text_watermark_ft:muller
>> text_catalogue_ft:muller text_index_ft:muller text_tei_ft:muller
>> text_abstract_ft:muller text_pdf_ft:muller)") ) +class_s:meta )
>> -_query_:"{!join to=id from=parent_ids}(filter(+((_query_:\"{!graph
>> from=parent_ids to=id }(meta_title_txt:muller meta_name_txt:muller
>> meta_subject_txt:muller meta_shelflocator_txt:muller)\" _query_:\"{!graph
>> from=id to=parent_ids  traversalFilter=\\\"class_s:meta
>> -type_s:multivolume_work -type_s:periodical -type_s:issue
>> -type_s:journal\\\"}(meta_title_txt:muller meta_name_txt:muller
>> text_ocr_ft:muller text_heidicon_ft:muller text_watermark_ft:muller
>> text_catalogue_ft:muller text_index_ft:muller text_tei_ft:muller
>> text_abstract_ft:muller text_pdf_ft:muller)\") ) +class_s:meta ))"
>>
>> I first run the "core" of the above query (the string before
>> »-_query_:"{!join«) for faceting;
>> then the next query is the full one above [ like »+(a) -{!join...}(a)« ]
>>
>> Now the second query is running in much less time because the result of
>> term "a" is cached.
>>
>> Caching does not seem to work with {boolean=>{must=>"*:*", filter=>...}}.
>>
>> Kind regards,
>> Jochen
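
The »+(a) -{!join...}(a)« pattern above needs the core query "a" embedded a second time inside a quoted _query_:"..." string, which is where the \" and \\\" sequences come from. A minimal sketch of that escaping follows, assuming backslashes are escaped first and double quotes second. The helper names are hypothetical and the field list is shortened for readability.

```python
def escape_for_embedding(text):
    # Escape backslashes first, then double quotes, so that an
    # already-escaped \" becomes \\\" one nesting level deeper.
    return text.replace("\\", "\\\\").replace('"', '\\"')

def nested_query(local_params, body):
    # Build _query_:"{!<local_params>}(<body>)" with the content escaped.
    raw = "{!" + local_params + "}(" + body + ")"
    return '_query_:"' + escape_for_embedding(raw) + '"'

# Shortened stand-in for the graph query in the message above.
graph = nested_query(
    'graph from=id to=parent_ids traversalFilter="class_s:meta"',
    "meta_title_txt:muller",
)

# The cached "core" (term "a"), then the full +(a) -{!join...}(a) query.
core = "filter(+(" + graph + ") +class_s:meta)"
full = "+" + core + " -" + nested_query("join to=id from=parent_ids", core)

print(full)
```

Because both occurrences of the core are generated from the same string, they stay identical apart from the escaping, and the thread's logs suggest that this is what lets the second lookup hit the cache.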
>>
>>
>>
>>
>>
>>
>> On 30.09.19 at 11:02, Jochen Barth wrote:
>>
>> Oops... the JSON query returns 48652 docs, the StandardQueryParser one 827...
>>
>> Must check this.
>>
>> Sorry,
>>
>> Jochen
>>
>> On 30.09.19 at 10:39, Jochen Barth wrote:
>>
>> The *:* appears twice in the JsonQueryDSL query because »filter(...)« occurs
>> twice in the StandardQueryParser query.
>>
>>
>>
>> I added some System.out.println calls to FastLRUCache, LRUCache and LFUCache;
>> here is the logging with JsonQueryDSL (Solr 8.1.1):
>>
>> Fast-get +*:* #(+(([[meta_title_txt:muller meta_name_txt:muller
>> meta_subject_txt:muller
>> meta_shelflocator_txt:muller],parent_ids=id][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]
>> [[meta_title_txt:muller meta_name_txt:muller text_ocr_ft:muller
>> text_heidicon_ft:muller text_watermark_ft:muller text_catalogue_ft:muller
>> text_index_ft:muller text_tei_ft:muller text_abstract_ft:muller
>> text_pdf_ft:muller],id=parent_ids] [TraversalFilter: class_s:meta
>> -type_s:multivolume_work -type_s:periodical -type_s:issue
>> -type_s:journal][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]))
>> +class_s:meta) valLen=null
>>
>> Fast-get DocValuesFieldExistsQuery [field=id] valLen=38
>>
>> Fast-get DocValuesFieldExistsQuery [field=parent_ids] valLen=38
>>
>> Fast-put +*:* #(+(([[meta_title_txt:muller meta_name_txt:muller
>> meta_subject_txt:muller
>> meta_shelflocator_txt:muller],parent_ids=id][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]
>> [[meta_title_txt:muller meta_name_txt:muller text_ocr_ft:muller
>> text_heidicon_ft:muller text_watermark_ft:muller text_catalogue_ft:muller
>> text_index_ft:muller text_tei_ft:muller text_abstract_ft:muller
>> text_pdf_ft:muller],id=parent_ids] [TraversalFilter: class_s:meta
>> -type_s:multivolume_work -type_s:periodical -type_s:issue
>> -type_s:journal][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]))
>> +class_s:meta)
>>
>> ...
>>
>> Fast(LRUCache)-get is called only once, but it should have been called
>> twice: first to find out that this filter is not yet cached, and a second
>> time for the identical part of the subquery.
>>
>>
>> Now analyzing cache access with StandardQueryParser:
>> Fast-get +(+[[meta_title_txt:muller meta_name_txt:muller
>> meta_subject_txt:muller
>> meta_shelflocator_txt:muller],parent_ids=id][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]
>> +[[meta_title_txt:muller meta_name_txt:muller
>> text_ocr_ft:muller text_heidicon_ft:muller
>> text_watermark_ft:muller text_catalogue_ft:muller text_index_ft:muller
>> text_tei_ft:muller text_abstract_ft:muller
>> text_pdf_ft:muller],id=parent_ids] [TraversalFilter: class_s:meta
>>  -type_s:multivolume_work -type_s:periodical -type_s:issue
>> -type_s:journal][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false])
>> +class_s:meta valLen=null
>> Fast-get DocValuesFieldExistsQuery [field=id] valLen=null
>> Fast-put DocValuesFieldExistsQuery [field=id]
>> Fast-get DocValuesFieldExistsQuery [field=parent_ids] valLen=null
>> Fast-put DocValuesFieldExistsQuery [field=parent_ids]
>> Fast-put +(+[[meta_title_txt:muller meta_name_txt:muller
>> meta_subject_txt:muller
>> meta_shelflocator_txt:muller],parent_ids=id][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]
>> +[[meta_title_txt:muller meta_name_txt:muller
>> text_ocr_ft:muller text_heidicon_ft:muller
>> text_watermark_ft:muller text_catalogue_ft:muller text_index_ft:muller
>> text_tei_ft:muller text_abstract_ft:muller
>> text_pdf_ft:muller],id=parent_ids] [TraversalFilter: class_s:meta
>>  -type_s:multivolume_work -type_s:periodical -type_s:issue
>> -type_s:journal][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false])
>> +class_s:meta
>> Fast-get +filter(+(+(+[[meta_title_txt:muller meta_name_txt:muller
>> meta_subject_txt:muller
>> meta_shelflocator_txt:muller],parent_ids=id][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]
>> +[[meta_title_txt:muller meta_name_txt:muller
>> text_ocr_ft:muller text_heidicon_ft:muller
>> text_watermark_ft:muller text_catalogue_ft:muller text_index_ft:muller
>> text_tei_ft:muller text_abstract_ft:muller
>> text_pdf_ft:muller],id=parent_ids] [TraversalFilter: class_s:meta
>> -type_s:multivolume_work -type_s:periodical -type_s:issue
>> -type_s:journal][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]))
>> +class_s:meta) valLen=null
>> Fast-get +(+[[meta_title_txt:muller meta_name_txt:muller
>> meta_subject_txt:muller
>> meta_shelflocator_txt:muller],parent_ids=id][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]
>> +[[meta_title_txt:muller meta_name_txt:muller
>> text_ocr_ft:muller text_heidicon_ft:muller
>> text_watermark_ft:muller text_catalogue_ft:muller text_index_ft:muller
>> text_tei_ft:muller text_abstract_ft:muller
>> text_pdf_ft:muller],id=parent_ids] [TraversalFilter: class_s:meta
>>  -type_s:multivolume_work -type_s:periodical -type_s:issue
>> -type_s:journal][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false])
>> +class_s:meta valLen=40
>> Fast-put +filter(+(+(+[[meta_title_txt:muller meta_name_txt:muller
>> meta_subject_txt:muller
>> meta_shelflocator_txt:muller],parent_ids=id][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]
>> +[[meta_title_txt:muller meta_name_txt:muller
>> text_ocr_ft:muller text_heidicon_ft:muller
>> text_watermark_ft:muller text_catalogue_ft:muller text_index_ft:muller
>> text_tei_ft:muller text_abstract_ft:muller
>> text_pdf_ft:muller],id=parent_ids] [TraversalFilter: class_s:meta
>> -type_s:multivolume_work -type_s:periodical -type_s:issue
>> -type_s:journal][maxDepth=-1][returnRoot=true][onlyLeafNodes=false][useAutn=false]))
>> +class_s:meta)
>>
>> Fast(LRUCache)-get +(+([[... is called twice, as expected.
>>
>> Kind regards,
>> Jochen
>>
>>
>>
>> On 30.09.19 at 10:01, Jochen Barth wrote:
>>
>> Dear Mikhail,
>>
>> maybe I am wrong,
>>
>> but this query (standardQueryParser):
>>
>> +filter(+((+((+(_query_:"{!graph from=parent_ids to=id
>> }(meta_title_txt:muller meta_name_txt:muller meta_subject_txt:muller
>> meta_shelflocator_txt:muller)") +(_query_:"{!graph from=id to=parent_ids
>> traversalFilter=\"class_s:meta -type_s:multivolume_work -type_s:periodical
>> -type_s:issue -type_s:journal\"}(meta_title_txt:muller meta_name_txt:muller
>> text_ocr_ft:muller text_heidicon_ft:muller text_watermark_ft:muller
>> text_catalogue_ft:muller text_index_ft:muller text_tei_ft:muller
>> text_abstract_ft:muller text_pdf_ft:muller)"))))) +(class_s:meta))
>> -(+(_query_:"{!join from=parent_ids
>> to=id}(+filter(+((+((+(_query_:\"{!graph from=parent_ids to=id
>> }(meta_title_txt:muller meta_name_txt:muller meta_subject_txt:muller
>> meta_shelflocator_txt:muller)\") +(_query_:\"{!graph from=id to=parent_ids
>> traversalFilter=\\\"class_s:meta -type_s:multivolume_work
>> -type_s:periodical -type_s:issue -type_s:journal\\\"}(meta_title_txt:muller
>> meta_name_txt:muller text_ocr_ft:muller text_heidicon_ft:muller
>> text_watermark_ft:muller text_catalogue_ft:muller text_index_ft:muller
>> text_tei_ft:muller text_abstract_ft:muller text_pdf_ft:muller)\")))))
>> +(class_s:meta)))"))
>>
>> is twice as fast as this equivalent one (JsonQueryDSL, "canonical" for
>> stable key order):
>>
>> {"query":{"bool":{"filter":{"bool":{"must":[{"bool":{"should":[{"bool":{"should":[{"graph":{"from":"parent_ids","query":"meta_title_txt:muller
>> meta_name_txt:muller meta_subject_txt:muller
>> meta_shelflocator_txt:muller","to":"id"}},{"graph":{"from":"id","query":"meta_title_txt:muller
>> meta_name_txt:muller text_ocr_ft:muller text_heidicon_ft:muller
>> text_watermark_ft:muller text_catalogue_ft:muller text_index_ft:muller
>> text_tei_ft:muller text_abstract_ft:muller
>> text_pdf_ft:muller","to":"parent_ids","traversalFilter":"class_s:meta
>> -type_s:multivolume_work -type_s:periodical -type_s:issue
>> -type_s:journal"}}]}}]}},"class_s:meta"]}},"must":"*:*","must_not":[{"join":{"from":"parent_ids","query":{"bool":{"filter":{"bool":{"must":[{"bool":{"should":[{"bool":{"should":[{"graph":{"from":"parent_ids","query":"meta_title_txt:muller
>> meta_name_txt:muller meta_subject_txt:muller
>> meta_shelflocator_txt:muller","to":"id"}},{"graph":{"from":"id","query":"meta_title_txt:muller
>> meta_name_txt:muller text_ocr_ft:muller text_heidicon_ft:muller
>> text_watermark_ft:muller text_catalogue_ft:muller text_index_ft:muller
>> text_tei_ft:muller text_abstract_ft:muller
>> text_pdf_ft:muller","to":"parent_ids","traversalFilter":"class_s:meta
>> -type_s:multivolume_work -type_s:periodical -type_s:issue
>> -type_s:journal"}}]}}]}},"class_s:meta"]}},"must":"*:*"}},"to":"id"}}]}}}
>>
>> Kind regards,
>> Jochen
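
The "canonical" form with stable key order used above can be produced with any JSON serializer that sorts keys. A minimal Python sketch follows, with canonical_json as a made-up helper name. Whether Solr's cache lookup depends on the key order of the JSON text at all is not answered in this thread, so treat the sketch only as a way to get byte-identical request bodies.

```python
import json

def canonical_json(obj):
    # Sort keys and drop optional whitespace so that equal structures
    # always serialize to the same string.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)

# Two dicts with the same content but different key order.
a = {"graph": {"to": "id", "from": "parent_ids", "query": "meta_title_txt:muller"}}
b = {"graph": {"query": "meta_title_txt:muller", "from": "parent_ids", "to": "id"}}

print(canonical_json(a))
# {"graph":{"from":"parent_ids","query":"meta_title_txt:muller","to":"id"}}
print(canonical_json(a) == canonical_json(b))
# True
```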
>>
>>
>>
>> On 29.09.19 at 21:28, Mikhail Khludnev wrote:
>>
>> On Sun, Sep 29, 2019 at 8:37 PM Barth, Jochen <Ba...@ub.uni-heidelberg.de>
>> wrote:
>>
>>
>> Thanks for your hint. The documentation does not say if the result of
>> filter is cached here (like fq=...) (I could test this).
>>
>>
>> 'filter' implies caching.
>>
>>
>>
>> Is *:* more expensive (at query time) than filter()? (*:* is not required
>> in StandardQueryParser.)
>>
>>
>> Either I don't get the question, or it isn't worth worrying about.
>>
>>
>>
>> Kind regards,
>> Jochen
>>
>> ________________________________________
>> From: Mikhail Khludnev <mk...@apache.org>
>> Sent: Saturday, 28 September 2019 22:58
>> To: solr-user
>> Subject: Re: filter in JSON Query DSL
>>
>> Given
>> https://lucene.apache.org/solr/guide/8_0/other-parsers.html#boolean-query-parser
>> it should be something like
>> '{"query": { "bool": { "must": ["*:*"] , "filter": [
>> "meta_subject_txt:globe" ] } } }'
>> I'm not sure why you would put filter under must; they should be siblings.
>>
>> On Fri, Sep 27, 2019 at 4:34 PM Jochen Barth <ba...@ub.uni-heidelberg.de>
>> wrote:
>>
>>
>> Dear reader,
>>
>> this query works as expected:
>>
>> curl -XGET http://localhost:8982/solr/Suchindex/query -d '
>> {"query": { "bool": { "must": "*:*" } },
>> "filter": [ "meta_subject_txt:globe" ] }'
>>
>> this one does not (neither with nor without the curly braces around "filter"):
>>
>> curl -XGET http://localhost:8982/solr/Suchindex/query -d '
>> {"query": { "bool": { "must": [ "*:*", { "filter": [
>> "meta_subject_txt:globe" ] } ] } } }'
>>
>> Is "filter" within deeper queries possible?
>>
>> I've got some complex queries with a "kernel" somewhat below the top
>> level...
>>
>> Is "canonical" JSON important for matching a query cache entry?
>>
>> Would it help to serialize these queries to standard syntax and then use
>> filter(...)?
>>
>> Kind regards,
>>
>> Jochen
>>
>>
>>
>> --
>> Jochen Barth * Universitätsbibliothek Heidelberg, IT * Telefon 06221
>> 54-2580
>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Sincerely yours
Mikhail Khludnev