Posted to dev@jspwiki.apache.org by "Janne Jalkanen (JIRA)" <ji...@apache.org> on 2009/02/11 10:25:02 UTC

[jira] Commented: (JSPWIKI-501) Search Cannot Find Terms With Underscores

    [ https://issues.apache.org/jira/browse/JSPWIKI-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672564#action_12672564 ] 

Janne Jalkanen commented on JSPWIKI-501:
----------------------------------------

This is certainly the right way :-).

A quick googling suggests that it's because the standard Tokenizer treats "_" as a space, so "Page_WithUnderscore" is tokenized into two words: "Page" and "WithUnderscore".

This works fairly well with the bold markup, but for page titles it's obviously not a good thing.

Any ideas?
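
To illustrate the splitting described above, here is a minimal plain-Java sketch. It does not use Lucene; the class UnderscoreSplitDemo and its methods are hypothetical stand-ins that only mimic an analyzer treating every non-alphanumeric character (including "_") as a separator, and a prefix search over the resulting terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: a simplified stand-in for an analyzer that splits on
// anything that is not a letter or digit. This is NOT Lucene's actual code.
class UnderscoreSplitDemo {

    // Split text into lower-cased tokens, treating every non-alphanumeric
    // character (including '_') as a separator.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // A prefix search such as "Page_*" can only match if some indexed
    // token starts with the lower-cased prefix.
    static boolean prefixMatches(List<String> tokens, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        for (String t : tokens) {
            if (t.startsWith(p)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("Page_WithUnderscore");
        System.out.println(tokens);                          // [page, withunderscore]
        System.out.println(prefixMatches(tokens, "Page"));   // true
        System.out.println(prefixMatches(tokens, "Page_"));  // false
    }
}
```

Under these assumptions, no indexed term contains the underscore, so a prefix that includes it has nothing to match, which would be consistent with the behaviour reported in the issue.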

> Search Cannot Find Terms With Underscores
> -----------------------------------------
>
>                 Key: JSPWIKI-501
>                 URL: https://issues.apache.org/jira/browse/JSPWIKI-501
>             Project: JSPWiki
>          Issue Type: Bug
>          Components: Core & storage
>    Affects Versions: 2.8.1
>         Environment: Red Hat Linux
>            Reporter: Stefan Bohn
>            Priority: Minor
>
> http://www.jspwiki.org/wiki/BugSearchCannotFindTermsWithUnderscores
> I want to re-open this bug.
> We have many pages with an underscore in the page name. These pages cannot be found with Lucene when the underscore is entered in the search box.
> Example:
> Pagename: Page_WithUnderscore
> Searchstring: Page* - search is ok
> Searchstring: Page_* - page is not listed in the search result
> Is this a bug, or must I change the search string because of Lucene?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Commented: (JSPWIKI-501) Search Cannot Find Terms With Underscores

Posted by Dupriez Christophe <ch...@yahoo.fr>.
I had the same issue, and I usually patch the software I use in the following way:

import java.io.Reader;

import org.apache.lucene.analysis.CharTokenizer;

public final class MyTokenizer extends CharTokenizer
{
    /**
     * Construct a new MyTokenizer reading from the given Reader.
     */
    public MyTokenizer(Reader in)
    {
        super(in);
    }

    /**
     * Lower-cases every character so that matching is case-insensitive.
     */
    protected char normalize(char c)
    {
        return Character.toLowerCase(c);
    }

    /**
     * Collects only letters, digits and "_"
     */
    protected boolean isTokenChar(char c)
    {
        return (c == '_') || Character.isLetterOrDigit(c); // '_' is accepted to link two words together
    }
}
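
As a rough check of what the rule above changes, here is a Lucene-free sketch that applies the same isTokenChar and normalize logic in a plain loop. The class UnderscoreTokenDemo is hypothetical and exists only for illustration; it is not part of Lucene or JSPWiki.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: mimics the isTokenChar/normalize rules of the
// tokenizer above without depending on Lucene.
class UnderscoreTokenDemo {

    // Same rule as above: letters, digits and '_' belong to a token.
    static boolean isTokenChar(char c) {
        return c == '_' || Character.isLetterOrDigit(c);
    }

    // Same normalization as above: lower-case each character.
    static char normalize(char c) {
        return Character.toLowerCase(c);
    }

    // Collect maximal runs of token characters, normalized.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (isTokenChar(c)) {
                current.append(normalize(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Page_WithUnderscore"));  // [page_withunderscore]
        System.out.println(tokenize("foo_bar baz-qux"));      // [foo_bar, baz, qux]
    }
}
```

With this rule the page name stays a single term, so a prefix containing the underscore has something to match. In a real setup the tokenizer would presumably have to be used by the analyzer at both indexing and query time, and existing pages would need to be re-indexed; that part is an assumption and is not shown here.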

Cheers,

Christophe

--- On Wed 11.2.09, Janne Jalkanen (JIRA) <ji...@apache.org> wrote:

> From: Janne Jalkanen (JIRA) <ji...@apache.org>
> Subject: [jira] Commented: (JSPWIKI-501) Search Cannot Find Terms With Underscores
> To: jspwiki-dev@incubator.apache.org
> Date: Wednesday 11 February 2009, 10:25
> [ https://issues.apache.org/jira/browse/JSPWIKI-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672564#action_12672564 ]
> 
> Janne Jalkanen commented on JSPWIKI-501:
> ----------------------------------------
> 
> This is certainly the right way :-).
> 
> A quick googling suggests that it's because the
> standard Tokenizer treats "_" as a space, so
> "Page_WithUnderscore" is tokenized into two words:
> "Page" and "WithUnderscore".
> 
> This works fairly well with the bold markup, but for page
> titles it's obviously not a good thing.
> 
> Any ideas?
> 
> > Search Cannot Find Terms With Underscores
> > -----------------------------------------
> >
> >                 Key: JSPWIKI-501
> >                 URL: https://issues.apache.org/jira/browse/JSPWIKI-501
> >             Project: JSPWiki
> >          Issue Type: Bug
> >          Components: Core & storage
> >    Affects Versions: 2.8.1
> >         Environment: Red Hat Linux
> >            Reporter: Stefan Bohn
> >            Priority: Minor
> >
> >
> > http://www.jspwiki.org/wiki/BugSearchCannotFindTermsWithUnderscores
> > I want to re-open this bug.
> > We have many pages with an underscore in the page name.
> > These pages cannot be found with Lucene when the
> > underscore is entered in the search box.
> > Example:
> > Pagename: Page_WithUnderscore
> > Searchstring: Page* - search is ok
> > Searchstring: Page_* - page is not listed in the search
> > result
> > Is this a bug, or must I change the search string
> > because of Lucene?
> 
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue
> online.

