Posted to users@solr.apache.org by Noah Torp-Smith <no...@dbc.dk.INVALID> on 2022/05/02 07:44:25 UTC
Possible issue with nesting and the pf parameter
Hello,
This is the first time I have reached out in this forum, so I apologize in advance if this is a known issue or if I have not spent enough time reading carefully through previous posts.
I am working with a Solr index containing library data. We have a nested structure where a "work" can have child documents that represent different "pids" (or manifestations) of that work. The canonical example is the work "Harry Potter and the Philosopher's Stone", which can have different manifestations/pids representing, for example, an audiobook version, an e-book version, and of course the physical book. Some information, like the title and the author, is stored at the work level, and other information, like the materialType (book/audiobook/ebook) or the year, is stored at the manifestation/pid level in child docs. I hope that makes sense. It is of course simplified, but it should convey what we are trying to do.
I can provide the full schema of our Solr if necessary, but most of it is probably not relevant here, so I thought I'd describe a simplified version of the issue I am struggling with. There's a Danish author called Hans Scherfig, and I want to search for physical books by him. I issue this query to our Solr. As you can see, I have enabled debugging at the `query` level.
```json
{
"query": "(scherfig)+{!parent which='doc_type:work' v='pid.material_type:(\"bog\")'}",
"filter": [
"doc_type:work"
],
"fields": "work.workid work.title, [child childFilter='pid.material_type:(\"bog\")']",
"offset": 0,
"limit": 1,
"params": {
"defType": "edismax",
"qf": [
"work.creator",
"work.title",
"pid.material_type"
],
"pf": "work.creator",
"sort": "score desc",
"debug": "query"
}
}
```
We send this to the /query endpoint of Solr, like this (the core is called simple-search):
```
curl -H "Content-Type: application/json" "http://search-solr/solr/simple-search/query" -d @scherfig-filter-test.json
```
I am using the `parent which` construction, documented here, for example: https://solr.apache.org/guide/8_2/other-parsers.html (we are on Solr 8.10.1). Looking at the debug output, I see this:
```
(work.creator:\"scherfig parent which doc_type:work v pid.material_type\")
```
which worries me slightly. It looks like "parent which" is part of what Solr is looking for in the work.creator field?
The interesting bit is that if I remove the line with `"pf":"work.creator"`, that part of the debug output is no longer there. Is there an issue with `pf` here? Or am I formatting my query incorrectly?
Thanks in advance for any insight you can provide.
Best regards,
/Noah
--
Noah Torp-Smith (nots@dbc.dk)
Re: Possible issue with nesting and the pf parameter
Posted by Michael Gibney <mi...@michaelgibney.net>.
I know I've noticed this as well -- that the `pf` parsing is naive with
respect to more complex query syntax. I'm curious what others might have to
say about this; if nobody else weighs in perhaps it might be a question for
the dev@solr list.
Regardless of the above, I'd advise against the kind of implicit "mixed
query parsing" that you have currently in your `q` param. Assuming that the
`!parent` qparser does not affect scoring, I wonder if you'd be better off
placing it on its own in an `fq` -- i.e.: `fq={!parent [...]}`. It's also
worth noting that SOLR-11501 [1] should change the parsing of this kind of
nested query syntax as of version 7.2 (subject to `luceneMatchVersion`) --
so you'd do well to switch approaches regardless.
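As a minimal sketch of the `fq` variant, assuming the same field names and JSON Request API layout as in the original request, the body could be built like this in Python. The helper name `build_request` is made up for illustration, and whether the results match the original query depends on the assumption above that the `!parent` qparser does not need to contribute to scoring.

```python
import json


def build_request(term, material_type):
    """Build a JSON Request API body with the block join moved into a filter.

    Hypothetical helper: field names are copied from the original request.
    No escaping of special query characters is attempted here.
    """
    child_filter = f'pid.material_type:("{material_type}")'
    return {
        # edismax now only sees the plain search term, so pf has nothing
        # else from which to build its phrase query.
        "query": term,
        "filter": [
            "doc_type:work",
            # The block join is a separate filter instead of part of "query".
            "{!parent which='doc_type:work' v='" + child_filter + "'}",
        ],
        "fields": "work.workid work.title, [child childFilter='"
                  + child_filter + "']",
        "offset": 0,
        "limit": 1,
        "params": {
            "defType": "edismax",
            "qf": ["work.creator", "work.title", "pid.material_type"],
            "pf": "work.creator",
            "sort": "score desc",
            "debug": "query",
        },
    }


if __name__ == "__main__":
    # Print the body so it can be saved to a file and sent with curl -d @file.
    print(json.dumps(build_request("scherfig", "bog"), indent=2))
```

With this layout, the `parsedquery` in the debug output is the place to check that the `pf` phrase on `work.creator` contains only the search term.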
If you still want to bundle these as a single query, I'd recommend
explicitly combining in a boolean query, e.g.:
defType=lucene&q={!bool should='{!edismax v=$qq}' filter=$myParentFilter}&qq=(scherfig)&myParentFilter={!parent which='doc_type:work' v='pid.material_type:(\"bog\")'}
Any of these more explicit alternate approaches should also cause the `pf`
param to properly construct boosting phrase queries (according to the
purpose of the `pf` param).
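To make the parameter dereferencing in the explicit boolean form easier to follow, here is a hedged Python sketch that assembles those request parameters as a dictionary. Solr registers the Boolean query parser under the name `bool`, so that is the name used here. The helper `build_bool_params` and its `occur` argument are hypothetical; the field names and the `qf`, `pf` and `fq` values are carried over from the original request.

```python
from urllib.parse import urlencode


def build_bool_params(term, material_type, occur="should"):
    """Assemble request parameters for the explicit boolean-query form.

    Hypothetical helper; field names are copied from the original request.
    `occur` is the clause type used for the edismax sub-query.
    """
    return {
        "defType": "lucene",
        # Top-level query: edismax handles $qq, the block join is a filter clause.
        "q": "{!bool " + occur + "='{!edismax v=$qq}' filter=$myParentFilter}",
        # Only this plain term reaches edismax, so pf builds its phrase from it.
        "qq": term,
        "myParentFilter": "{!parent which='doc_type:work' "
                          f"v='pid.material_type:(\"{material_type}\")'}}",
        "qf": "work.creator work.title pid.material_type",
        "pf": "work.creator",
        "fq": "doc_type:work",
        "debug": "query",
    }


if __name__ == "__main__":
    # URL-encode the parameters into a query string for a GET request.
    print(urlencode(build_bool_params("scherfig", "bog")))
```

One caveat, as far as I understand Lucene's boolean semantics: when a boolean query has a `filter` clause, its `should` clauses become optional, so works that pass the filter but do not match the term may also be returned. If that happens, calling the helper with `occur="must"` makes the term mandatory.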
[1] https://issues.apache.org/jira/browse/SOLR-11501