Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2018/12/18 17:02:00 UTC

[jira] [Commented] (SOLR-12768) Determine how _nest_path_ should be analyzed to support various use-cases

    [ https://issues.apache.org/jira/browse/SOLR-12768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724260#comment-16724260 ] 

David Smiley commented on SOLR-12768:
-------------------------------------

Simple proposal:
* Use a new FieldType subclass to simplify upgrades and enable ease of use
* Use one index token instead of path tokenizing at this stage.  This is lighter-weight for users who might not even need or want to query on it.  Queries would instead use wildcards on the token to express relationships.  In the future, someone could write an easy-to-use query parser and/or query language that builds the appropriate wildcard patterns.
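
To illustrate how wildcards over a single path token could express relationships, here is a rough sketch in plain Java rather than Solr itself. The class {{NestPathWildcards}}, its methods, and the sample paths are all hypothetical; the wildcard semantics ({{*}} for any run of characters, {{?}} for exactly one) are assumed to mirror the usual Lucene-style wildcards.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical illustration only: this is not Solr code.
final class NestPathWildcards {

    // Translate a simple wildcard ('*' = any run of characters,
    // '?' = exactly one character) into an equivalent regex.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // Return the single-token paths that the wildcard matches in full.
    static List<String> matching(String wildcard, List<String> paths) {
        Pattern p = toRegex(wildcard);
        return paths.stream()
                .filter(s -> p.matcher(s).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample paths, each indexed as one token.
        List<String> paths = Arrays.asList(
                "toppings/", "toppings/ingredients/", "sauces/ingredients/");
        // Any "ingredients" level nested under some parent.
        System.out.println(matching("*/ingredients/", paths));
        // "toppings" itself and everything nested beneath it.
        System.out.println(matching("toppings/*", paths));
    }
}
```

A suffix pattern selects a given child level under any parent, while a prefix pattern selects a subtree; a future query parser could presumably generate such patterns so that users never write them by hand.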

The index analyzer would simply be the equivalent of:
{code:xml}
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <!-- Remove the # and array index digits from the path: toppings#1/ingredients#/ becomes toppings/ingredients/ -->
      <filter class="solr.PatternReplaceFilterFactory" pattern="#\d*" replace="all"/>
{code}
Notice that the last pattern is simplified and fixes a bug in the current test, whose pattern matches all digits instead of only those after a pound sign.  I wrote a unit test for that fix.
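
As a quick sanity check of the pattern outside Solr, the following sketch applies the same regex with plain {{java.util.regex}}, on the assumption that a replace-all over a single keyword token behaves like {{Matcher.replaceAll}} with an empty replacement. The class name {{NestPathNormalizer}}, the sample path {{line1#2/}}, and the "too broad" pattern are hypothetical; the latter is only a stand-in for the kind of digit-matching bug described above, not the actual pattern in the test schema.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only: this is not Solr code.
final class NestPathNormalizer {

    // Pattern from the proposal: a pound sign followed by zero or more digits.
    private static final Pattern ARRAY_INDEX = Pattern.compile("#\\d*");

    // Hypothetical over-broad pattern that also strips digits
    // that are not preceded by a pound sign.
    private static final Pattern TOO_BROAD = Pattern.compile("[#\\d]+");

    // Strip array markers such as "#1" or a bare "#" from the path.
    static String normalize(String path) {
        return ARRAY_INDEX.matcher(path).replaceAll("");
    }

    // Same replacement using the over-broad pattern, for comparison.
    static String normalizeTooBroad(String path) {
        return TOO_BROAD.matcher(path).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(normalize("toppings#1/ingredients#/")); // toppings/ingredients/
        System.out.println(normalize("line1#2/"));                 // line1/
        System.out.println(normalizeTooBroad("line1#2/"));         // line/
    }
}
```

The last two lines show the difference: a field name that happens to contain a digit survives the proposed pattern but is damaged by one that matches every digit.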

CC [~moshebla]


> Determine how _nest_path_ should be analyzed to support various use-cases
> -------------------------------------------------------------------------
>
>                 Key: SOLR-12768
>                 URL: https://issues.apache.org/jira/browse/SOLR-12768
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: David Smiley
>            Priority: Blocker
>             Fix For: master (8.0)
>
>
> We know we need {{\_nest\_path\_}} in the schema for the new nested documents support, and we loosely know what goes in it.  From a DocValues perspective, we've got it down; though we might tweak it.  From an indexing (text analysis) perspective, we're not quite sure yet, though we've got a test schema, {{schema-nest.xml}} with a decent shot at it.  Ultimately, how we index it will depend on the query/filter use-cases we need to support.  So we'll review some of them here.
> TBD: Not sure if the outcome of this task is just a "decide" or whether we also potentially add a few tests for some of these cases, and/or if we also add a FieldType to make declaring it as easy as a one-liner.  A FieldType would have other benefits too once we're ready to make querying on the path easier.


