Posted to users@solr.apache.org by Tomasz Elendt <to...@gmail.com> on 2023/03/22 16:37:22 UTC

Terms Query Parser: escaping separator character in terms values

Hey, I tried to find out how to escape the separator character in term values used in the Terms Query Parser, but I couldn't find it.

I checked the documentation, but it's not there:
https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#terms-query-parser

Next, I tried escaping it with "\", and even quoting the value that contains it, but neither worked.

{!terms field=x}a,"b\,c"

gives me:

TermInSetQuery(x:("b a c")) !!

I looked for the answer in the test [1], but it wasn't there.
Next, I looked into the implementation [2], and it looks like the values are simply split, with no support for any form of escaping or quoting.
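
To illustrate the behaviour described above, here is a minimal Python sketch of what a plain split on the separator does to the example input. It is only an analogy, not the actual Solr code in [2]; the function name naive_split is made up for this illustration.

```python
from typing import List


def naive_split(raw: str, separator: str = ",") -> List[str]:
    """Split on every occurrence of the separator.

    Hypothetical stand-in for a parser that has no notion of
    backslash escapes or quoted values.
    """
    return raw.split(separator)


if __name__ == "__main__":
    # The raw query body from the example: a,"b\,c"
    pieces = naive_split('a,"b\\,c"')
    for piece in pieces:
        print(repr(piece))
```

With a splitter like this, the backslash and the quotes are just ordinary characters, so the value that was meant to be "b,c" ends up cut in two at the comma.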

I think that the lack of support for escaping makes this query parser pretty unusable for handling arbitrary input. Am I wrong?

I searched to see whether someone had already reported it, but I couldn't find anything in Solr's bug tracker. Should I open an issue for it?


[1] https://github.com/apache/solr/blob/11253f05cfb31f9fb945c831d8889b3db1e607f1/solr/core/src/test/org/apache/solr/search/TestTermsQParserPlugin.java
[2] https://github.com/apache/solr/blob/11253f05cfb31f9fb945c831d8889b3db1e607f1/solr/core/src/java/org/apache/solr/search/TermsQParserPlugin.java#L157-L158

Re: Terms Query Parser: escaping separator character in terms values

Posted by Tomasz Elendt <to...@gmail.com>.
If by simplicity you mean simplicity of implementation, then I agree. But it's not simple for the users.

Choosing a separator that is not present in the actual values is a much harder task for the end user than escaping or quoting. Escaping or quoting requires just a simple mapping over the values, while picking a "non-conflicting" separator requires an extra pass over all the values (and possibly over the separator candidates, depending on the implementation).
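
A rough Python sketch of the two approaches, to make the comparison concrete. Everything here is an assumption for illustration: the candidate list, the function names, and in particular the backslash escaping syntax, which the Terms Query Parser does not understand today. The generated query uses the f and separator local params as I understand them from the reference guide.

```python
from typing import Iterable, Optional, Sequence

# Hypothetical candidates; any strings unlikely to occur in the data would do.
CANDIDATE_SEPARATORS = (",", "|", ";", "~")


def pick_separator(values: Sequence[str],
                   candidates: Iterable[str] = CANDIDATE_SEPARATORS) -> Optional[str]:
    """Return the first candidate that occurs in none of the values.

    This is the extra pass over all values (times the number of candidates).
    """
    for sep in candidates:
        if not any(sep in value for value in values):
            return sep
    return None


def build_terms_query(field: str, values: Sequence[str]) -> str:
    """Build a terms query with a separator that is absent from every value."""
    sep = pick_separator(values)
    if sep is None:
        raise ValueError("every candidate separator occurs in some value")
    return "{!terms f=%s separator='%s'}%s" % (field, sep, sep.join(values))


def escape_value(value: str, separator: str = ",") -> str:
    """Per-value mapping for a HYPOTHETICAL escaping syntax.

    Solr's terms parser does not interpret this; it only shows what
    the 'simple mapping' alternative would look like.
    """
    return value.replace("\\", "\\\\").replace(separator, "\\" + separator)


if __name__ == "__main__":
    print(build_terms_query("x", ["a", "b,c"]))
    print(",".join(escape_value(v) for v in ["a", "b,c"]))
```

Note that the separator approach can still fail when every candidate occurs somewhere in the data, and the separator changes from query to query, while the escaping variant is a fixed per-value transformation.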

Finally, I believe the queries should not just be interpretable by Solr but also readable by humans. In my opinion, it is better to use the same separator in all terms queries, such as a comma or a space, and then escape or quote the problematic values, because we humans like consistency.

I'm not sure it makes sense to discuss this topic any further on the users' mailing list. I will create a ticket for this issue.

Cheers,
Tomasz

> On 22. Mar 2023, at 19:53, Mikhail Khludnev <mk...@apache.org> wrote:
> 
> I think it was made so for the sake of simplicity. That's why it has a
> separator param. The query generator should just choose a separator that
> is absent from all the terms.


Re: Terms Query Parser: escaping separator character in terms values

Posted by Mikhail Khludnev <mk...@apache.org>.
I think it was made so for the sake of simplicity. That's why it has a
separator param. The query generator should just choose a separator that
is absent from all the terms.


-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!