Posted to dev@lucene.apache.org by "Alex Vigdor (JIRA)" <ji...@apache.org> on 2009/08/19 02:43:15 UTC

[jira] Created: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

FastVectorHighlighter truncates words at beginning and end of fragments
-----------------------------------------------------------------------

                 Key: LUCENE-1824
                 URL: https://issues.apache.org/jira/browse/LUCENE-1824
             Project: Lucene - Java
          Issue Type: Improvement
          Components: contrib/*
         Environment: any
            Reporter: Alex Vigdor
            Priority: Minor
             Fix For: 2.9
         Attachments: LUCENE-1824.patch

FastVectorHighlighter does not take word boundaries into consideration when building fragments, so that in most cases the first and last word of a fragment are truncated.  This makes the highlights less legible than they should be.  I will attach a patch to BaseFragmentBuilder that resolves this by expanding the start and end boundaries of the fragment to the first whitespace character on either side of the fragment, or the beginning or end of the source text, whichever comes first.  This significantly improves legibility, at the cost of returning a slightly larger number of characters than specified for the fragment size.
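
As a rough illustration of the boundary expansion described above (this is a sketch, not the attached patch), the idea can be shown on a plain String without any Lucene classes. The class name FragmentBoundaries and its methods are hypothetical, and this version stops just inside the surrounding whitespace rather than including it, which may differ from what the patch does.

```java
/**
 * Hypothetical helper, not part of Lucene: widens a [start, end) character
 * range so that it no longer cuts through a whitespace-delimited word.
 */
final class FragmentBoundaries {
    private FragmentBoundaries() {}

    /** Moves start left until it sits at the beginning of the text or right after a whitespace character. */
    static int expandStart(String text, int start) {
        // Clamp first so that out-of-range offsets cannot cause an exception.
        int s = Math.max(0, Math.min(start, text.length()));
        while (s > 0 && !Character.isWhitespace(text.charAt(s - 1))) {
            s--;
        }
        return s;
    }

    /** Moves end right until it sits at the end of the text or on a whitespace character. */
    static int expandEnd(String text, int end) {
        int e = Math.max(0, Math.min(end, text.length()));
        while (e < text.length() && !Character.isWhitespace(text.charAt(e))) {
            e++;
        }
        return e;
    }

    /** Returns the fragment for [start, end) after expanding both ends to word boundaries. */
    static String fragment(String text, int start, int end) {
        return text.substring(expandStart(text, start), expandEnd(text, end));
    }
}
```

With the text "data processing speed, the rest" and the range 10 to 25, a plain substring gives "ssing speed, th", while FragmentBoundaries.fragment gives "processing speed, the". The result is longer than the requested range, which is the trade-off mentioned above.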

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744815#action_12744815 ] 

Michael Busch commented on LUCENE-1824:
---------------------------------------

Could you add a small junit that tests this (i.e. fails without the patch), Alex?



[jira] Commented: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

Posted by "Alex Vigdor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745048#action_12745048 ] 

Alex Vigdor commented on LUCENE-1824:
-------------------------------------

The failing test was due to an extra whitespace character at the beginning of the output, which I think is insignificant.

However, I appreciate that the whitespace approach will not work for CJK, so I have moved my modifications to a new WhitespaceFragmentBuilder class and associated test class.  The updated patch now contains just these two new classes and no modifications to other code.

I don't want to hold up the release of 2.9, but anyone attempting to use the SimpleFragmentsBuilder with Latin-script languages, or others that use whitespace to delimit words, will be dismayed by the rampant truncation!

> FastVectorHighlighter truncates words at beginning and end of fragments
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1824
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1824
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>         Environment: any
>            Reporter: Alex Vigdor
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-1824.patch


[jira] Updated: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

Posted by "Alex Vigdor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Vigdor updated LUCENE-1824:
--------------------------------

    Attachment:     (was: LUCENE-1824-test.patch)



[jira] Updated: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

Posted by "Alex Vigdor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Vigdor updated LUCENE-1824:
--------------------------------

    Attachment:     (was: LUCENE-1824-test.patch)



[jira] Issue Comment Edited: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

Posted by "Alex Vigdor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744830#action_12744830 ] 

Alex Vigdor edited comment on LUCENE-1824 at 8/18/09 7:19 PM:
--------------------------------------------------------------

Actually a couple of the existing tests specifically check for the faulty behavior - the attached patch for SimpleFragmentsBuilderTest tests for the non-truncating behavior implemented in the patch.  For example, where the prior test looked for "ssing <b>speed</b>", it now looks for " processing <b>speed</b>".


      was (Author: alexvigdor):
    Actually a couple of the existing tests specifically check for the faulty behavior - the following modification of SimpleFragmentsBuilderTest tests for the non-truncating behavior implemented in the patch.  A couple other tests in this file fail now (with the strings of "a b b a" etc.), but they don't seem serious to me (i.e. I would think the tests could be changed to test for the results they get from the patch).

Index: contrib/fast-vector-highlighter/src/test/org/apache/lucene/search/vectorhighlight/SimpleFragmentsBuilderTest.java
===================================================================
--- contrib/fast-vector-highlighter/src/test/org/apache/lucene/search/vectorhighlight/SimpleFragmentsBuilderTest.java	(revision 805400)
+++ contrib/fast-vector-highlighter/src/test/org/apache/lucene/search/vectorhighlight/SimpleFragmentsBuilderTest.java	(working copy)
@@ -90,7 +90,7 @@
     SimpleFragListBuilder sflb = new SimpleFragListBuilder();
     FieldFragList ffl = sflb.createFieldFragList( fpl, 100 );
     SimpleFragmentsBuilder sfb = new SimpleFragmentsBuilder();
-    assertEquals( " most <b>search engines</b> use only one of these methods. Even the <b>search engines</b> that says they can use t",
+    assertEquals( " most <b>search engines</b> use only one of these methods. Even the <b>search engines</b> that says they can use the ",
         sfb.createFragment( reader, 0, F, ffl ) );
   }
 
@@ -103,7 +103,7 @@
     SimpleFragListBuilder sflb = new SimpleFragListBuilder();
     FieldFragList ffl = sflb.createFieldFragList( fpl, 100 );
     SimpleFragmentsBuilder sfb = new SimpleFragmentsBuilder();
-    assertEquals( "ssing <b>speed</b>, the", sfb.createFragment( reader, 0, F, ffl ) );
+    assertEquals( " processing <b>speed</b>, the", sfb.createFragment( reader, 0, F, ffl ) );
   }
   
   public void testUnstoredField() throws Exception {

  


[jira] Updated: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

Posted by "Alex Vigdor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Vigdor updated LUCENE-1824:
--------------------------------

    Attachment: LUCENE-1824-test.patch



[jira] Commented: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744872#action_12744872 ] 

Koji Sekiguchi commented on LUCENE-1824:
----------------------------------------

Alex,
I don't have much time to look into this patch but I understand the requirement.
I named it *Simple* FragmentsBuilder because it simply makes fragments without concern for boundaries. I designed FragmentsBuilder to be pluggable, so that other FragmentsBuilders can be written/contributed, e.g. WhitespaceFragmentsBuilder, SentenceAwareFragmentsBuilder, etc. I think adding new FragmentsBuilders (plus test cases) is better than modifying existing FragmentsBuilders. When you write new FragmentsBuilders, don't forget that some languages (CJK) don't use periods or whitespace as boundaries between words/sentences.
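
To make the pluggable idea concrete, here is a minimal self-contained sketch. It deliberately does not use the Lucene API: BoundaryPolicy, SimplePolicy, WhitespacePolicy and Fragmenter are hypothetical names invented for illustration, standing in for the role that different FragmentsBuilder implementations would play.

```java
/** Hypothetical strategy interface: decides the final offsets of a fragment. */
interface BoundaryPolicy {
    /** Returns a two-element array {start, end} for the range [start, end); offsets are assumed valid. */
    int[] adjust(String text, int start, int end);
}

/** Keeps the requested offsets as they are, with no concern for word boundaries. */
final class SimplePolicy implements BoundaryPolicy {
    public int[] adjust(String text, int start, int end) {
        return new int[] { start, end };
    }
}

/** Widens the range outward to the nearest whitespace or to the limits of the text. */
final class WhitespacePolicy implements BoundaryPolicy {
    public int[] adjust(String text, int start, int end) {
        int s = start;
        while (s > 0 && !Character.isWhitespace(text.charAt(s - 1))) {
            s--;
        }
        int e = end;
        while (e < text.length() && !Character.isWhitespace(text.charAt(e))) {
            e++;
        }
        return new int[] { s, e };
    }
}

/** Builds fragments using whichever policy it was constructed with. */
final class Fragmenter {
    private final BoundaryPolicy policy;

    Fragmenter(BoundaryPolicy policy) {
        this.policy = policy;
    }

    String fragment(String text, int start, int end) {
        int[] bounds = policy.adjust(text, start, end);
        return text.substring(bounds[0], bounds[1]);
    }
}
```

Swapping the policy changes the output without touching Fragmenter. The sketch also shows the CJK concern: for text that contains no whitespace at all, WhitespacePolicy widens every fragment to the entire text, so such languages would need a different policy.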




[jira] Updated: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

Posted by "Alex Vigdor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Vigdor updated LUCENE-1824:
--------------------------------

    Attachment:     (was: LUCENE-1824.patch)



[jira] Updated: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

Posted by "Alex Vigdor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Vigdor updated LUCENE-1824:
--------------------------------

    Attachment: LUCENE-1824.patch
                LUCENE-1824-test.patch



[jira] Commented: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

Posted by "Alex Vigdor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744830#action_12744830 ] 

Alex Vigdor commented on LUCENE-1824:
-------------------------------------

Actually a couple of the existing tests specifically check for the faulty behavior - the following modification of SimpleFragmentsBuilderTest tests for the non-truncating behavior implemented in the patch.  A couple other tests in this file fail now (with the strings of "a b b a" etc.), but they don't seem serious to me (i.e. I would think the tests could be changed to test for the results they get from the patch).

Index: contrib/fast-vector-highlighter/src/test/org/apache/lucene/search/vectorhighlight/SimpleFragmentsBuilderTest.java
===================================================================
--- contrib/fast-vector-highlighter/src/test/org/apache/lucene/search/vectorhighlight/SimpleFragmentsBuilderTest.java	(revision 805400)
+++ contrib/fast-vector-highlighter/src/test/org/apache/lucene/search/vectorhighlight/SimpleFragmentsBuilderTest.java	(working copy)
@@ -90,7 +90,7 @@
     SimpleFragListBuilder sflb = new SimpleFragListBuilder();
     FieldFragList ffl = sflb.createFieldFragList( fpl, 100 );
     SimpleFragmentsBuilder sfb = new SimpleFragmentsBuilder();
-    assertEquals( " most <b>search engines</b> use only one of these methods. Even the <b>search engines</b> that says they can use t",
+    assertEquals( " most <b>search engines</b> use only one of these methods. Even the <b>search engines</b> that says they can use the ",
         sfb.createFragment( reader, 0, F, ffl ) );
   }
 
@@ -103,7 +103,7 @@
     SimpleFragListBuilder sflb = new SimpleFragListBuilder();
     FieldFragList ffl = sflb.createFieldFragList( fpl, 100 );
     SimpleFragmentsBuilder sfb = new SimpleFragmentsBuilder();
-    assertEquals( "ssing <b>speed</b>, the", sfb.createFragment( reader, 0, F, ffl ) );
+    assertEquals( " processing <b>speed</b>, the", sfb.createFragment( reader, 0, F, ffl ) );
   }
   
   public void testUnstoredField() throws Exception {




[jira] Updated: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

Posted by "Alex Vigdor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Vigdor updated LUCENE-1824:
--------------------------------

    Attachment:     (was: LUCENE-1824.patch)



[jira] Updated: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

Posted by "Alex Vigdor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Vigdor updated LUCENE-1824:
--------------------------------

    Attachment: LUCENE-1824.patch



[jira] Issue Comment Edited: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

Posted by "Alex Vigdor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744830#action_12744830 ] 

Alex Vigdor edited comment on LUCENE-1824 at 8/18/09 7:25 PM:
--------------------------------------------------------------

Actually a couple of the existing tests specifically check for the faulty behavior - the attached patch for SimpleFragmentsBuilderTest tests for the non-truncating behavior implemented in the patch.  For example, where the prior test looked for "ssing <b>speed</b>", it now looks for " processing <b>speed</b>".  While fixing the tests I noticed an off-by-1 error in the original patch, which I have updated.


      was (Author: alexvigdor):
    Actually a couple of the existing tests specifically check for the faulty behavior - the attached patch for SimpleFragmentsBuilderTest tests for the non-truncating behavior implemented in the patch.  For example, where the prior test looked for "ssing <b>speed</b>", it now looks for " processing <b>speed</b>".

  
> FastVectorHighlighter truncates words at beginning and end of fragments
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1824
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1824
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>         Environment: any
>            Reporter: Alex Vigdor
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1824-test.patch, LUCENE-1824.patch
>
>
> FastVectorHighlighter does not take word boundaries into consideration when building fragments, so that in most cases the first and last word of a fragment are truncated.  This makes the highlights less legible than they should be.  I will attach a patch to BaseFragmentBuilder that resolves this by expanding the start and end boundaries of the fragment to the first whitespace character on either side of the fragment, or the beginning or end of the source text, whichever comes first.  This significantly improves legibility, at the cost of returning a slightly larger number of characters than specified for the fragment size.



[jira] Updated: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

Posted by "Alex Vigdor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Vigdor updated LUCENE-1824:
--------------------------------

    Attachment: LUCENE-1824.patch

> FastVectorHighlighter truncates words at beginning and end of fragments
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1824
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1824
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>         Environment: any
>            Reporter: Alex Vigdor
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-1824.patch



[jira] Commented: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744866#action_12744866 ] 

Michael Busch commented on LUCENE-1824:
---------------------------------------

ScoreOrderFragmentsBuilderTest.test3Frags() fails after applying your patches.

> FastVectorHighlighter truncates words at beginning and end of fragments
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1824
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1824
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>         Environment: any
>            Reporter: Alex Vigdor
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1824-test.patch, LUCENE-1824.patch



[jira] Updated: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch updated LUCENE-1824:
----------------------------------

    Fix Version/s:     (was: 2.9)
                   3.1

I think we should exclude this from 2.9, as we're getting very close to the code freeze.

With the current approach tests are failing, and I agree with Koji that new functionality like this can and should be added as a new FragmentBuilder.

> FastVectorHighlighter truncates words at beginning and end of fragments
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1824
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1824
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>         Environment: any
>            Reporter: Alex Vigdor
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-1824-test.patch, LUCENE-1824.patch
