You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Luca Cavanna (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2012/02/29 22:07:57 UTC
[jira] [Issue Comment Edited] (SOLR-2719) REGRESSION ReturnFields incorrect parse fields with hyphen - breaks existing "fl=my-field-name" type usecases

    [ https://issues.apache.org/jira/browse/SOLR-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219545#comment-13219545 ] 

Luca Cavanna edited comment on SOLR-2719 at 2/29/12 9:06 PM:
-------------------------------------------------------------

My patch isn't a fix but just a starting point: it adds a ReturnFieldsTest class which tests some of the new fl features. Some tests are of course failing. The biggest problem is the hyphen within the field name, which I guess is widely used. This could be corrected as suggested by Nik, but we have problems with other characters, even if less used within field names.

Solr doesn't validate field names, but now a lot of potential field names can't actually be used within the fl parameter, or even worse they break the query. Some of my test methods are intentionally weird, like the ~idtest or id$test, but those field names are both allowed by Solr. I'm afraid we might have the same problem with sorting since the QueryParsing#parseSort uses the same StrParser#getId method.

The main rule to identify the end of a field name in StrParser#getId seems to be the following:
{code}
if (!Character.isJavaIdentifierPart(ch) && ch != '.')
	break;
{code}
I guess it should be extended, not just with the hyphen, but in my opinion the point here is not just correct the hyphen regression. I think we should introduce consistency between fl and decide which characters Solr should accept within a field name. I mean, if Solr accepts everything, we'll always have this fl problem. What are your thoughts guys?
                
      was (Author: lucacavanna):
    My patch isn't a fix but just a starting point: it adds a ReturnFieldsTest class which tests some of the new fl features. Some tests are of course failing. The biggest problem is the hyphen within the field name, which I guess is widely used. This could be corrected as suggested by Nik, but we have problems with other characters, even if less used within field names.

Solr doesn't validate field names, but now a lot of potential field names can't actually be used within the fl parameter, or even worse they break the query. Some of my tests method are intentionally weird, like the ~idtest or id$test, but those field names are both allowed by Solr. I'm afraid we might have the same problem with sorting since the QueryParsing#parseSort uses the same StrParser#getId method.

The main rule to identify the end of a field name seems to be the following:
{code}
if (!Character.isJavaIdentifierPart(ch) && ch != '.')
	break;
{code}

In my opinion, the point here is not just correct the hyphen regression. I think we should introduce consistency between fl and decide which characters Solr accepts within a field name.

What are your thoughts guys?
                  
> REGRESSION ReturnFields incorrect parse fields with hyphen - breaks existing "fl=my-field-name" type usecases
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-2719
>                 URL: https://issues.apache.org/jira/browse/SOLR-2719
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>    Affects Versions: 4.0
>            Reporter: Nik V. Babichev
>            Priority: Blocker
>              Labels: field, fl, query, queryparser
>             Fix For: 4.0
>
>         Attachments: SOLR-2719.patch
>
>
> fl=my-hyphen-field in query params parsed as "my" instead of "my-hyphen-field".
> OAS.search.ReturnFields use method getId() from OAS.search.QueryParsing
> in which check chars "if (!Character.isJavaIdentifierPart(ch) && ch != '.')"
> Hyphen is not JavaIdentifierPart and this check break when first "-" is found.
> This problem solve by passing '-' to check:
> if (!Character.isJavaIdentifierPart(ch) && ch != '.' && ch != '-') break;
> But I don't know how it can affect on whole project.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org