Posted to solr-dev@lucene.apache.org by "Tricia Williams (JIRA)" <ji...@apache.org> on 2008/04/03 18:06:24 UTC

[jira] Created: (SOLR-532) WordDelimiterFilter ignores payloads

WordDelimiterFilter ignores payloads
------------------------------------

                 Key: SOLR-532
                 URL: https://issues.apache.org/jira/browse/SOLR-532
             Project: Solr
          Issue Type: Bug
            Reporter: Tricia Williams
            Priority: Minor


When a WordDelimiterFilter ingests a token stream and creates a new token (newTok), it appears to copy most of the old token's attributes except the payload.  I believe this is a bug.  My solution is for the WordDelimiterFilter to use the Token clone() method to create a carbon copy and then modify the appropriate attributes (offsets and term text).
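To make the difference concrete, here is a minimal Java sketch. It uses a hypothetical `SimpleToken` class as a stand-in for Lucene's `Token` (it is not the real Lucene API) and contrasts building a new token field by field, which drops the payload, with cloning first and then overwriting only the term text and offsets.

```java
import java.util.Arrays;

public class PayloadCopyDemo {

    /** Hypothetical stand-in for a Lucene Token; not org.apache.lucene.analysis.Token. */
    static final class SimpleToken implements Cloneable {
        String term;
        int startOffset;
        int endOffset;
        String type;
        byte[] payload; // may be null

        SimpleToken(String term, int startOffset, int endOffset, String type, byte[] payload) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.type = type;
            this.payload = payload;
        }

        @Override
        public SimpleToken clone() {
            try {
                SimpleToken copy = (SimpleToken) super.clone();
                // Copy the payload bytes so the clone and the original do not share an array.
                copy.payload = (payload == null) ? null : payload.clone();
                return copy;
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: the class implements Cloneable
            }
        }
    }

    /** Pre-fix pattern: build a fresh token from selected fields; the payload is silently lost. */
    static SimpleToken newTokenFromScratch(SimpleToken orig, int start, int end, int startOff, int endOff) {
        return new SimpleToken(orig.term.substring(start, end), startOff, endOff, orig.type, null);
    }

    /** Proposed pattern: clone every attribute, then overwrite only term text and offsets. */
    static SimpleToken cloneThenModify(SimpleToken orig, int start, int end, int startOff, int endOff) {
        SimpleToken copy = orig.clone();
        copy.term = orig.term.substring(start, end);
        copy.startOffset = startOff;
        copy.endOffset = endOff;
        return copy;
    }

    public static void main(String[] args) {
        SimpleToken orig = new SimpleToken("wi-fi", 0, 5, "word", new byte[] {42});
        SimpleToken before = newTokenFromScratch(orig, 0, 2, 0, 2);
        SimpleToken after = cloneThenModify(orig, 0, 2, 0, 2);
        System.out.println(before.term + " payload=" + Arrays.toString(before.payload));
        System.out.println(after.term + " payload=" + Arrays.toString(after.payload));
    }
}
```

Running `main` prints `wi payload=null` for the field-by-field copy and `wi payload=[42]` for the clone-then-modify copy. Cloning also carries over any attribute added to the token class later, which is the main argument for it over copying fields by hand.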

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (SOLR-532) WordDelimiterFilter ignores payloads

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll reassigned SOLR-532:
------------------------------------

    Assignee: Grant Ingersoll



[jira] Resolved: (SOLR-532) WordDelimiterFilter ignores payloads

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll resolved SOLR-532.
----------------------------------

       Resolution: Fixed
    Fix Version/s: 1.4



[jira] Commented: (SOLR-532) WordDelimiterFilter ignores payloads

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641404#action_12641404 ] 

Grant Ingersoll commented on SOLR-532:
--------------------------------------

I consolidated this down to take advantage of Lucene's new clone method:
Index: src/java/org/apache/solr/analysis/WordDelimiterFilter.java
===================================================================
--- src/java/org/apache/solr/analysis/WordDelimiterFilter.java  (revision 706648)
+++ src/java/org/apache/solr/analysis/WordDelimiterFilter.java  (working copy)
@@ -236,11 +236,7 @@
       startOff += start;     
     }
 
-    Token newTok = new Token(startOff,
-            endOff,
-            orig.type());
-    newTok.setTermBuffer(orig.termBuffer(), start, (end - start));
-    return newTok;
+    return (Token)orig.clone(orig.termBuffer(), start, (end - start), startOff, endOff);
   }

I will likely commit today or tomorrow.  Let me know if this works for you, Tricia.  The tests pass for me.
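The five-argument clone in the patch copies every attribute and replaces only the term text and offsets. The sketch below mimics that shape on a hypothetical `SimpleToken` stand-in (not Lucene's class) and applies it inside a simplified splitter that breaks only on non-alphanumeric characters; the real WordDelimiterFilter has many more splitting rules. For simplicity it assumes each character of the term corresponds to exactly one offset position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubwordCloneDemo {

    /** Hypothetical stand-in for a token; not Lucene's Token class. */
    static final class SimpleToken implements Cloneable {
        char[] termBuffer;
        int startOffset;
        int endOffset;
        String type;
        byte[] payload; // may be null

        SimpleToken(String term, int startOffset, int endOffset, String type, byte[] payload) {
            this.termBuffer = term.toCharArray();
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.type = type;
            this.payload = payload;
        }

        String term() {
            return new String(termBuffer);
        }

        /** Same argument shape as the clone call in the patch: copy everything, replace term and offsets. */
        SimpleToken clone(char[] buf, int off, int len, int newStart, int newEnd) {
            try {
                SimpleToken t = (SimpleToken) super.clone();
                t.termBuffer = Arrays.copyOfRange(buf, off, off + len);
                t.startOffset = newStart;
                t.endOffset = newEnd;
                t.payload = (payload == null) ? null : payload.clone();
                return t;
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: the class implements Cloneable
            }
        }
    }

    /** Simplified splitter: emits each run of letters/digits as a subword token. */
    static List<SimpleToken> splitOnDelimiters(SimpleToken orig) {
        List<SimpleToken> out = new ArrayList<>();
        char[] buf = orig.termBuffer;
        int i = 0;
        while (i < buf.length) {
            while (i < buf.length && !Character.isLetterOrDigit(buf[i])) i++; // skip delimiters
            int start = i;
            while (i < buf.length && Character.isLetterOrDigit(buf[i])) i++;  // consume a subword
            if (i > start) {
                // Shift the subword's position by the original token's start offset.
                out.add(orig.clone(buf, start, i - start, orig.startOffset + start, orig.startOffset + i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SimpleToken orig = new SimpleToken("wi-fi", 10, 15, "word", new byte[] {7});
        for (SimpleToken t : splitOnDelimiters(orig)) {
            System.out.println(t.term() + " [" + t.startOffset + "," + t.endOffset + ") payload=" + t.payload[0]);
        }
    }
}
```

Running `main` prints `wi [10,12) payload=7` and `fi [13,15) payload=7`: both subwords keep the original payload because nothing is copied field by field.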



[jira] Issue Comment Edited: (SOLR-532) WordDelimiterFilter ignores payloads

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641404#action_12641404 ] 

gsingers edited comment on SOLR-532 at 10/21/08 8:32 AM:
----------------------------------------------------------------

I consolidated this down to take advantage of Lucene's new clone method:
{code}
Index: src/java/org/apache/solr/analysis/WordDelimiterFilter.java
===================================================================
--- src/java/org/apache/solr/analysis/WordDelimiterFilter.java  (revision 706648)
+++ src/java/org/apache/solr/analysis/WordDelimiterFilter.java  (working copy)
@@ -236,11 +236,7 @@
       startOff += start;     
     }
 
-    Token newTok = new Token(startOff,
-            endOff,
-            orig.type());
-    newTok.setTermBuffer(orig.termBuffer(), start, (end - start));
-    return newTok;
+    return (Token)orig.clone(orig.termBuffer(), start, (end - start), startOff, endOff);
   }
{code}
I will likely commit today or tomorrow.  Let me know if this works for you, Tricia.  The tests pass for me.



[jira] Commented: (SOLR-532) WordDelimiterFilter ignores payloads

Posted by "Tricia Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641694#action_12641694 ] 

Tricia Williams commented on SOLR-532:
--------------------------------------

Thanks Grant.  That's much cleaner using the new clone method.  It works for me after catching up with the new slf4j logging.  Thanks too for committing it!



[jira] Work started: (SOLR-532) WordDelimiterFilter ignores payloads

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on SOLR-532 started by Grant Ingersoll.



[jira] Updated: (SOLR-532) WordDelimiterFilter ignores payloads

Posted by "Tricia Williams (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tricia Williams updated SOLR-532:
---------------------------------

    Attachment: SOLR-532-WordDelimiterFilter.patch

Quick fix.  Does this need a unit test to go with it?
