Posted to dev@lucene.apache.org by "MRIT64 (JIRA)" <ji...@apache.org> on 2009/10/07 22:31:31 UTC

[jira] Created: (LUCENE-1958) ShingleFilter creates shingles across two consecutives documents : bug or normal behaviour ?

ShingleFilter creates shingles across two consecutives documents : bug or normal behaviour ?
--------------------------------------------------------------------------------------------

                 Key: LUCENE-1958
                 URL: https://issues.apache.org/jira/browse/LUCENE-1958
             Project: Lucene - Java
          Issue Type: Bug
          Components: contrib/analyzers
    Affects Versions: 2.4.1
         Environment: Windows XP / jdk1.6.0_15
            Reporter: MRIT64
            Priority: Minor


Hi

I add two consecutive documents that are indexed through a chain of filters, the last of which is ShingleFilter.
ShingleFilter creates a shingle spanning the two documents, which makes no sense in my context.
Is that a bug, or is it ShingleFilter's normal behaviour? If it's normal behaviour, is there an option to change it?

Thanks

MR
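
To make the symptom concrete, here is a deliberately simplified, hypothetical Java model. The class `NaiveBigramFilter` and its methods are illustrative only and are not Lucene's `ShingleFilter` API. The model keeps the previously seen token as instance state and never clears it, so when the same instance is fed a second document, that document's first shingle is built from the last token of the previous one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical, simplified 2-word shingle filter. NOT Lucene's ShingleFilter:
 * it only models the per-document state (the previous token) that matters here.
 */
final class NaiveBigramFilter {
    private Iterator<String> input;
    private String previous; // never cleared, so it survives into the next document

    /** Attaches a new document's tokens; note that 'previous' is left untouched. */
    void setInput(Iterator<String> input) {
        this.input = input;
    }

    /** Emits each token, plus a bigram joining it with the token seen just before it. */
    List<String> drain() {
        List<String> out = new ArrayList<>();
        while (input.hasNext()) {
            String token = input.next();
            out.add(token);
            if (previous != null) {
                out.add(previous + " " + token);
            }
            previous = token;
        }
        return out;
    }

    public static void main(String[] args) {
        NaiveBigramFilter filter = new NaiveBigramFilter();
        filter.setInput(Arrays.asList("apache", "lucene").iterator());
        System.out.println(filter.drain()); // [apache, lucene, apache lucene]

        // Same instance, second document: "lucene shingle" spans both documents.
        filter.setInput(Arrays.asList("shingle", "filter").iterator());
        System.out.println(filter.drain()); // [shingle, lucene shingle, filter, shingle filter]
    }
}
```

In this toy model, the stray "lucene shingle" shingle appears only because one stateful instance is reused across documents; a fresh instance per document would not produce it.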

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: [jira] Created: (LUCENE-1958) ShingleFilter creates shingles across two consecutives documents : bug or normal behaviour ?

Posted by Ted Dunning <te...@gmail.com>.
Bug.


-- 
Ted Dunning, CTO
DeepDyve

[jira] Commented: (LUCENE-1958) ShingleFilter creates shingles across two consecutives documents : bug or normal behaviour ?

Posted by "MRIT64 (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763638#action_12763638 ] 

MRIT64 commented on LUCENE-1958:
--------------------------------

It doesn't happen with Lucene 2.9 (just downloaded).


[jira] Commented: (LUCENE-1958) ShingleFilter creates shingles across two consecutives documents : bug or normal behaviour ?

Posted by "MRIT64 (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764333#action_12764333 ] 

MRIT64 commented on LUCENE-1958:
--------------------------------

Yes, OK to mark this issue as resolved.


[jira] Commented: (LUCENE-1958) ShingleFilter creates shingles across two consecutives documents : bug or normal behaviour ?

Posted by "MRIT64 (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764134#action_12764134 ] 

MRIT64 commented on LUCENE-1958:
--------------------------------

- Yes, I use a custom analyser which uses reusableToken.

- I don't know whether reusableToken is supported in this version, but the method next(Token reusableToken) is listed in the ShingleFilter 2.4.1 Javadoc (see http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/analysis/shingle/ShingleFilter.html). That's why I used it. I don't know how it works internally, and nothing is mentioned in the documentation.

Anyway, it doesn't matter now because the problem doesn't occur with Lucene 2.9.

Regards


[jira] Commented: (LUCENE-1958) ShingleFilter creates shingles across two consecutives documents : bug or normal behaviour ?

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763297#action_12763297 ] 

Robert Muir commented on LUCENE-1958:
-------------------------------------

This says Affects Version: 2.4.1. Do you see this behavior with 2.9?

I only ask because ShingleFilter did not implement reset() until 2.9, so if you are using reusableTokenStream in your analyzer, there may be a problem (or maybe not).



[jira] Closed: (LUCENE-1958) ShingleFilter creates shingles across two consecutives documents : bug or normal behaviour ?

Posted by "Karl Wettin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wettin closed LUCENE-1958.
-------------------------------

    Resolution: Won't Fix

Not a problem in 2.9.


[jira] Commented: (LUCENE-1958) ShingleFilter creates shingles across two consecutives documents : bug or normal behaviour ?

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763293#action_12763293 ] 

Mark Miller commented on LUCENE-1958:
-------------------------------------

You should ask on the mailing lists whether something is a bug or normal behaviour before creating a bug report.


[jira] Commented: (LUCENE-1958) ShingleFilter creates shingles across two consecutives documents : bug or normal behaviour ?

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763641#action_12763641 ] 

Robert Muir commented on LUCENE-1958:
-------------------------------------

bq. It doesnt happen with Lucene 2.9 (just downloaded). 

Can you tell me whether you have written a custom analyzer? If so, does this analyzer implement reusableTokenStream?

If that is the case, it's really not a bug: reset() is an optional operation, and with Lucene 2.4.1 you can't safely reuse instances of ShingleFilter, because as of that version it does not support reuse.
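
A hypothetical sketch of why an optional reset() does not help on its own. Assume, as in this toy model only, that the base class's reset() defaults to doing nothing. Calling reset() on a filter that never overrode it then leaves the stale token in place; only a filter that overrides reset() to clear its per-document state can be reused safely. None of these class names are Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical base class: reset() is optional, so the default does nothing. */
abstract class ToyBigramStream {
    protected Iterator<String> input;
    protected String previous; // per-document state

    void setInput(Iterator<String> input) {
        this.input = input;
    }

    /** Optional operation: subclasses may or may not override it. */
    void reset() {
    }

    /** Emits each token, plus a bigram joining it with the token seen before it. */
    List<String> drain() {
        List<String> out = new ArrayList<>();
        while (input.hasNext()) {
            String token = input.next();
            out.add(token);
            if (previous != null) {
                out.add(previous + " " + token);
            }
            previous = token;
        }
        return out;
    }
}

/** Does not override reset(), so calling it is silently a no-op. */
final class NonResettableBigram extends ToyBigramStream {
}

/** Overrides reset() to forget the previous document's last token. */
final class ResettableBigram extends ToyBigramStream {
    @Override
    void reset() {
        previous = null;
    }
}

final class OptionalResetDemo {
    /** Feeds two documents through one instance, calling reset() before the second. */
    static List<String> secondDocument(ToyBigramStream stream) {
        stream.setInput(Arrays.asList("apache", "lucene").iterator());
        stream.drain();
        stream.setInput(Arrays.asList("shingle", "filter").iterator());
        stream.reset();
        return stream.drain();
    }

    public static void main(String[] args) {
        System.out.println(secondDocument(new NonResettableBigram())); // [shingle, lucene shingle, filter, shingle filter]
        System.out.println(secondDocument(new ResettableBigram()));    // [shingle, filter, shingle filter]
    }
}
```

The caller does the same thing in both cases; whether reuse is safe depends only on whether the filter itself implements reset().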



[jira] Commented: (LUCENE-1958) ShingleFilter creates shingles across two consecutives documents : bug or normal behaviour ?

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764141#action_12764141 ] 

Robert Muir commented on LUCENE-1958:
-------------------------------------

MRIT64, actually I am not asking about next(reusableToken), but rather whether your Analyzer implements

{code}
public TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException
{code}

If you were trying to reuse ShingleFilters in 2.4.1 with this technique, I think that would be unsafe. It is safe in 2.9.
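
The distinction here is between reusing Token objects (next(Token reusableToken)) and reusing a whole filter chain across documents through reusableTokenStream. Below is a rough, hypothetical model of the latter: the analyzer caches one filter per thread (here with a ThreadLocal) and points it at each new document. Whether that is safe depends entirely on the cached filter's state being cleared. `ToyShingleAnalyzer` and its methods are illustrative names, not Lucene's Analyzer API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical toy analyzer, NOT Lucene's Analyzer API. It contrasts building a
 * fresh filter per document with reusing one cached filter per thread.
 */
final class ToyShingleAnalyzer {

    /** Minimal stateful bigram filter; 'previous' is per-document state. */
    static final class BigramFilter {
        private Iterator<String> input;
        private String previous;

        void setInput(Iterator<String> input) {
            this.input = input;
        }

        void reset() {
            previous = null;
        }

        List<String> drain() {
            List<String> out = new ArrayList<>();
            while (input.hasNext()) {
                String token = input.next();
                out.add(token);
                if (previous != null) {
                    out.add(previous + " " + token);
                }
                previous = token;
            }
            return out;
        }
    }

    private final boolean resetOnReuse;
    // One cached filter per thread, loosely in the spirit of reusableTokenStream.
    private final ThreadLocal<BigramFilter> cached = ThreadLocal.withInitial(BigramFilter::new);

    ToyShingleAnalyzer(boolean resetOnReuse) {
        this.resetOnReuse = resetOnReuse;
    }

    /** Builds a new filter every time, so no state can carry over. */
    List<String> analyzeWithNewStream(String text) {
        BigramFilter filter = new BigramFilter();
        filter.setInput(Arrays.asList(text.split("\\s+")).iterator());
        return filter.drain();
    }

    /** Reuses this thread's cached filter; safe only if its state is reset. */
    List<String> analyzeWithCachedStream(String text) {
        BigramFilter filter = cached.get();
        filter.setInput(Arrays.asList(text.split("\\s+")).iterator());
        if (resetOnReuse) {
            filter.reset();
        }
        return filter.drain();
    }

    public static void main(String[] args) {
        ToyShingleAnalyzer noReset = new ToyShingleAnalyzer(false);
        noReset.analyzeWithCachedStream("apache lucene");
        System.out.println(noReset.analyzeWithCachedStream("shingle filter")); // [shingle, lucene shingle, filter, shingle filter]

        ToyShingleAnalyzer withReset = new ToyShingleAnalyzer(true);
        withReset.analyzeWithCachedStream("apache lucene");
        System.out.println(withReset.analyzeWithCachedStream("shingle filter")); // [shingle, filter, shingle filter]
    }
}
```

In this sketch, building a new chain per document is the always-safe fallback. Caching only pays off when every filter in the chain clears its own state on reset.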

bq. Anyway, it doesnt' matter know because the problem doesnt occur with Lucene 2.9.

OK to mark this issue as resolved?

