Posted to solr-dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2010/02/05 20:28:27 UTC

[jira] Created: (SOLR-1760) convert synonymsfilter to new tokenstream API

convert synonymsfilter to new tokenstream API
---------------------------------------------

                 Key: SOLR-1760
                 URL: https://issues.apache.org/jira/browse/SOLR-1760
             Project: Solr
          Issue Type: Task
          Components: Schema and Analysis
            Reporter: Robert Muir


This is the other non-trivial TokenStream to convert to the new API. I looked at this again today, and I think I have a design that will be nice and efficient.

If you have ideas or are already looking at it, please comment! I haven't started coding, and we shouldn't duplicate any effort.

Here is my current design:

* Add a variable 'maximumContext' to SynonymMap. This is simply the maximum singleMatch.size(); it's the maximum number of tokens of lookahead that is ever needed.
* save/restoreState/cloning can be minimized by using a stack (a fixed array of size maximumContext) of references to the SynonymMap submaps. This way we can backtrack efficiently for multiword matches without save/restoreState and with fewer comparisons.
* Two queues (which can be fixed arrays of size maximumContext) are still needed for holding state objects. The first holds those that have been evaluated (always empty in the case of !preserveOriginal), and the second holds those that haven't yet been evaluated but are queued due to lookahead.
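
To make the first two points concrete, here is a rough sketch of the idea, not the actual patch. It assumes a simplified stand-in for SynonymMap: the class SynonymSketch, its Node type, and the add/apply methods are hypothetical names invented for illustration, and tokens are plain strings rather than TokenStream attributes. It leaves out the two state queues and preserveOriginal handling entirely, and only shows how maximumContext bounds a fixed-array stack of submap references used to backtrack to the longest complete match.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for SynonymMap (not the Solr class).
class SynonymSketch {
    /** One level of the map: child submaps keyed by token text. */
    static final class Node {
        final Map<String, Node> submap = new HashMap<>();
        String replacement; // non-null only if a rule ends at this node
    }

    final Node root = new Node();
    int maximumContext = 0; // longest rule, in tokens = max lookahead ever needed

    void add(List<String> match, String replacement) {
        Node node = root;
        for (String token : match) {
            node = node.submap.computeIfAbsent(token, k -> new Node());
        }
        node.replacement = replacement;
        maximumContext = Math.max(maximumContext, match.size());
    }

    List<String> apply(List<String> tokens) {
        List<String> out = new ArrayList<>();
        // Fixed-size stack of submap references; depth can never exceed maximumContext.
        Node[] stack = new Node[maximumContext];
        int pos = 0;
        while (pos < tokens.size()) {
            int depth = 0;
            Node node = root;
            // Descend as long as the lookahead tokens keep matching a submap.
            while (pos + depth < tokens.size()) {
                Node next = node.submap.get(tokens.get(pos + depth));
                if (next == null) {
                    break;
                }
                stack[depth++] = next;
                node = next;
            }
            // Backtrack by popping until a node that completes a rule is on top.
            while (depth > 0 && stack[depth - 1].replacement == null) {
                depth--;
            }
            if (depth > 0) {
                out.add(stack[depth - 1].replacement); // longest complete match wins
                pos += depth;
            } else {
                out.add(tokens.get(pos)); // no rule starts here; pass the token through
                pos++;
            }
        }
        return out;
    }
}
```

In this sketch, backtracking is just decrementing an index into the array, so no token state is copied when a longer candidate match fails. In a real filter the lookahead tokens would still have to be buffered somewhere, which is what the second queue in the design is for.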

I plan on coding this up soon; if you have a better idea or have started work, please comment.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.