Posted to users@solr.apache.org by Noah Torp-Smith <no...@dbc.dk.INVALID> on 2022/05/02 07:44:25 UTC

Possible issue with nesting and the pf parameter

Hello,

This is the first time I have reached out on this forum, so I apologize in advance if this is a known issue or if I have not read previous posts carefully enough.

I am working with a Solr instance containing library data. We have a nested structure where a "work" can have child documents that represent different "pids" (manifestations) of that work. The canonical example is the work "Harry Potter and the Philosopher's Stone", which can have different manifestations/pids representing, for example, an audiobook version, an e-book version and, of course, the physical book. Some information, like the title and the author, is stored at the work level. Other information, like the material type (book/audiobook/e-book) or the year, is stored at the manifestation/pid level in child docs. I hope that makes sense. It is of course simplified, but it should convey what we are trying to do.
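
For readers less familiar with Solr block join, here is a minimal sketch of what one such work block might look like in Solr's JSON update format. It is not the poster's actual data. The `id` values, the child `doc_type` value `pid` and the `ebog` material type are hypothetical. Only `doc_type:work`, `work.title`, `work.creator` and `pid.material_type:bog` come from the post. The sketch uses the anonymous `_childDocuments_` key. The property that matters for `{!parent which='doc_type:work'}` is that the parent filter matches every work and no child.

```python
import json

def make_work_block(work_id, title, creator, manifestations):
    """Build one parent "work" document with one child doc per manifestation (pid).

    Hypothetical shape: the id scheme and the child doc_type value "pid" are assumptions.
    """
    children = [
        {
            "id": f"{work_id}-{i}",       # hypothetical child id scheme
            "doc_type": "pid",            # must NOT match the parent filter doc_type:work
            "pid.material_type": material,
        }
        for i, material in enumerate(manifestations)
    ]
    return {
        "id": work_id,
        "doc_type": "work",               # matched by {!parent which='doc_type:work'}
        "work.title": title,
        "work.creator": creator,
        "_childDocuments_": children,     # anonymous child documents
    }

block = make_work_block(
    "work-1",
    "Harry Potter and the Philosopher's Stone",
    "J. K. Rowling",
    ["bog", "ebog"],                      # "ebog" is a hypothetical material type value
)
# A JSON array of documents like this can be POSTed to /solr/<core>/update
print(json.dumps([block], indent=2, ensure_ascii=False))
```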

I can provide the full schema of our Solr if necessary, but much of it is probably irrelevant here, so I'll instead try to describe a simplified version of the issue I am struggling with. There's a Danish author called Hans Scherfig, and I want to search for physical books by him. I issue the following query to our Solr. As you can see, I have enabled debugging at the `query` level.

```json
{
    "query": "(scherfig)+{!parent which='doc_type:work' v='pid.material_type:(\"bog\")'}",
    "filter": [
        "doc_type:work"
    ],
    "fields": "work.workid work.title, [child childFilter='pid.material_type:(\"bog\")']",
    "offset": 0,
    "limit": 1,
    "params": {
        "defType": "edismax",
        "qf": [
            "work.creator",
            "work.title",
            "pid.material_type"
        ],
        "pf": "work.creator",
        "sort": "score desc",
        "debug": "query"
    }
}
```

We send this to the `/query` endpoint of Solr like this (the core is called `simple-search`):

```
curl -H "Content-Type: application/json" "http://search-solr/solr/simple-search/query" -d @scherfig-filter-test.json
```

I am using the `{!parent which=...}` construction (the block join parent query parser), documented for example here: https://solr.apache.org/guide/8_2/other-parsers.html (we are on Solr 8.10.1). Looking at the debug output, I see this:

```
(work.creator:\"scherfig parent which doc_type:work v pid.material_type\")
```

which worries me slightly. It looks as if "parent which" has become part of the phrase Solr is looking for in the `work.creator` field.

The "interesting" bit is that if I remove the `"pf": "work.creator"` line, that part of the debug output disappears. Is there an issue with `pf` here, or am I formatting my query wrongly?

Thanks in advance for any insight you can provide.

Best regards,

/Noah



--

Noah Torp-Smith (nots@dbc.dk)

Re: Possible issue with nesting and the pf parameter

Posted by Michael Gibney <mi...@michaelgibney.net>.
I know I've noticed this as well: the `pf` parsing is naive with
respect to more complex query syntax. I'm curious what others might have to
say about this; if nobody else weighs in, it might be a question for
the dev@solr list.

Regardless of the above, I'd advise against the kind of implicit "mixed
query parsing" that you currently have in your `q` param. Assuming that the
`!parent` qparser does not affect scoring, I suspect you'd be better off
placing it on its own in an `fq`, i.e. `fq={!parent [...]}`. It's also
worth noting that SOLR-11501 [1] should change the parsing of this kind of
nested query syntax as of version 7.2 (subject to `luceneMatchVersion`),
so you'd do well to switch approaches regardless.
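
A minimal sketch of the `fq` variant follows, written as a body for the JSON Request API used in the original post. It is untested against a live Solr. The user's text stays alone in `query` and is parsed by edismax, while the block join moves into `filter`, the JSON counterpart of `fq`. `build_fq_request` and `PARENT_FILTER` are hypothetical names. The field names and parser settings are copied from the original request.

```python
import json

# Hypothetical constant: the block-join clause, no longer mixed into the user query.
PARENT_FILTER = "{!parent which='doc_type:work' v='pid.material_type:(\"bog\")'}"

def build_fq_request(user_query):
    """Sketch of a /query JSON body with the parent query moved into a filter."""
    return {
        "query": user_query,                # plain user text only, parsed by edismax
        "filter": [
            "doc_type:work",
            PARENT_FILTER,                  # block join applied as a filter (fq)
        ],
        "fields": "work.workid work.title, [child childFilter='pid.material_type:(\"bog\")']",
        "offset": 0,
        "limit": 1,
        "params": {
            "defType": "edismax",
            "qf": ["work.creator", "work.title", "pid.material_type"],
            "pf": "work.creator",
            "sort": "score desc",
            "debug": "query",
        },
    }

body = build_fq_request("scherfig")
print(json.dumps(body, indent=2))
```

The printed JSON can be saved to a file and sent with the same `curl` command as in the original post. Following the advice above, the expectation is that the `pf` part of the debug output no longer picks up the `parent which ...` tokens.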

If you still want to bundle these as a single query, I'd recommend
explicitly combining them in a boolean query, e.g.:

    defType=lucene
    q={!bool must='{!edismax v=$qq}' filter=$myParentFilter}
    qq=(scherfig)
    myParentFilter={!parent which='doc_type:work' v='pid.material_type:("bog")'}

Either of these more explicit approaches should also let the `pf`
param construct proper boosting phrase queries (which is the purpose of
the `pf` param).
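
The single-query combination can also be sketched as a JSON Request API body for the same `/query` endpoint. Here `qq` and `myParentFilter` are passed in `params` so that the `$qq` and `$myParentFilter` references resolve. This is a minimal, untested sketch, and `build_bool_request` is a hypothetical helper. It uses Solr's `{!bool}` parser with a `must` clause. In a Lucene boolean query, a lone `should` clause next to a `filter` clause is optional, which would let works that do not match "scherfig" through.

```python
import json

def build_bool_request(user_query):
    """Sketch: one q combining an edismax clause with a block-join filter clause."""
    return {
        # must: the edismax clause is required and provides the score;
        # filter: restricts to works with a "bog" child without affecting the score.
        "query": "{!bool must='{!edismax v=$qq}' filter=$myParentFilter}",
        "filter": ["doc_type:work"],
        "fields": "work.workid work.title, [child childFilter='pid.material_type:(\"bog\")']",
        "limit": 1,
        "params": {
            "defType": "lucene",   # q starts with local params, so keep the lucene parser
            "qq": user_query,      # dereferenced by {!edismax v=$qq}
            "myParentFilter": "{!parent which='doc_type:work' v='pid.material_type:(\"bog\")'}",
            # the nested edismax parser falls back to these request params for qf/pf
            "qf": ["work.creator", "work.title", "pid.material_type"],
            "pf": "work.creator",
            "sort": "score desc",
            "debug": "query",
        },
    }

body = build_bool_request("scherfig")
print(json.dumps(body, indent=2))
```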

[1] https://issues.apache.org/jira/browse/SOLR-11501
