Posted to solr-dev@lucene.apache.org by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org> on 2008/07/27 06:48:31 UTC

[jira] Created: (SOLR-663) Allow multiple files for stopwords, protwords and synonyms

Allow multiple files for stopwords, protwords and synonyms
----------------------------------------------------------

                 Key: SOLR-663
                 URL: https://issues.apache.org/jira/browse/SOLR-663
             Project: Solr
          Issue Type: Improvement
          Components: search
    Affects Versions: 1.3
            Reporter: Shalin Shekhar Mangar
            Assignee: Shalin Shekhar Mangar
            Priority: Minor
             Fix For: 1.3


Allow multiple files, separated by commas (a literal comma in a file name is escaped with a backslash), for StopFilterFactory, EnglishPorterFilterFactory, KeepWordFilterFactory and SynonymFilterFactory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-663) Allow multiple files for stopwords, protwords and synonyms

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693871#action_12693871 ] 

Grant Ingersoll commented on SOLR-663:
--------------------------------------

Shalin,

I'm not sure I am following the logic in this patch, specifically around:
{code}

        java.io.File keepWordsFile = new File(wordFiles);
        if (keepWordsFile.exists()) {
          List<String> wlist = loader.getLines(wordFiles);
          words = StopFilter.makeStopSet(
              (String[])wlist.toArray(new String[0]), ignoreCase);
        } else  {
          List<String> files = StrUtils.splitFileNames(wordFiles);
          for (String file : files) {
            List<String> wlist = loader.getLines(file.trim());
            words.addAll(StopFilter.makeStopSet((String[])wlist.toArray(new String[0]), ignoreCase));
          }
{code}

When is the if clause above executed?  The most likely case seems to be that people are just using a file in solr/conf, referenced by a relative path. In that case the exists() call will never return true, since the current working directory is likely to be two levels up.

Also, why the need for separate cases anyway?  Isn't the single file case just a degenerate case of the multiple files version?

See also SOLR-1095, where I am working on some fixes to this, but I want to make sure I capture your understanding first.
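
To illustrate the question about the single-file case being a degenerate case of the multi-file one, here is a minimal sketch of a single code path that handles both. It is not the patch and not the SOLR-1095 fix: LineLoader is a hypothetical stand-in for the loader used in the excerpt, loadWords is an invented name, and a plain comma split (no backslash escaping) with simple lower-casing replaces the real helpers for brevity.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class UnifiedWordLoading {
    /** Hypothetical stand-in for the loader in the excerpt above. */
    interface LineLoader {
        List<String> getLines(String resource);
    }

    /** One loop for one file or many: a single name is just a list of length one. */
    static Set<String> loadWords(String wordFiles, LineLoader loader, boolean ignoreCase) {
        Set<String> words = new HashSet<>();
        // Assumption: plain comma split; escaped commas are not handled here.
        for (String file : wordFiles.split(",")) {
            String name = file.trim();
            if (name.isEmpty()) {
                continue;
            }
            // Resolution of the name is left entirely to the loader,
            // so no File.exists() check against the working directory is needed.
            for (String line : loader.getLines(name)) {
                words.add(ignoreCase ? line.toLowerCase(Locale.ROOT) : line);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // In-memory "files" so the sketch runs without a Solr installation.
        Map<String, List<String>> fake = new HashMap<>();
        fake.put("stopwords.txt", Arrays.asList("the", "An"));
        fake.put("extra.txt", Arrays.asList("of"));

        System.out.println(loadWords("stopwords.txt", fake::get, true));
        System.out.println(loadWords("stopwords.txt, extra.txt", fake::get, true));
    }
}
```

With this shape there is no separate branch for the single-file case, so the question of when exists() returns true does not arise.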






[jira] Work started: (SOLR-663) Allow multiple files for stopwords, protwords and synonyms

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on SOLR-663 started by Shalin Shekhar Mangar.



[jira] Updated: (SOLR-663) Allow multiple files for stopwords, protwords and synonyms

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-663:
---------------------------------------

    Attachment: SOLR-663.patch

Added StrUtils#splitFileNames for the split logic. I chose to add another method because the escaping backslash has to be removed from the file names, which the existing splitSmart methods do not do. Added a test for this method in TestUtils.java.

Added logic to load multiple files in:
* EnglishPorterFilterFactory
* SynonymFilterFactory
* KeepWordFilterFactory
* StopFilterFactory
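
The split behaviour described above (split on commas, treat a backslash-escaped comma as part of the file name, and drop the escaping backslash) can be sketched roughly as follows. This is an illustrative reimplementation under stated assumptions, not the code in the attached patch: the class name is invented, and the choices to keep a backslash that does not precede a comma and to leave trimming to the caller are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class SplitFileNamesSketch {
    /**
     * Splits a comma-separated list of file names.
     * "\," is unescaped to a literal comma inside a name.
     * Assumption: any other backslash is kept unchanged.
     */
    static List<String> splitFileNames(String fileNames) {
        List<String> result = new ArrayList<>();
        if (fileNames == null || fileNames.isEmpty()) {
            return result;
        }
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < fileNames.length(); i++) {
            char c = fileNames.charAt(i);
            boolean escapedComma = c == '\\'
                    && i + 1 < fileNames.length()
                    && fileNames.charAt(i + 1) == ',';
            if (escapedComma) {
                current.append(',');   // keep the comma, drop the backslash
                i++;                   // skip the comma we just consumed
            } else if (c == ',') {
                result.add(current.toString());
                current.setLength(0);  // start the next file name
            } else {
                current.append(c);
            }
        }
        result.add(current.toString());
        return result;
    }

    public static void main(String[] args) {
        // The Java literal "extra\\,words.txt" is the text extra\,words.txt
        for (String name : splitFileNames("stopwords.txt,extra\\,words.txt")) {
            System.out.println(name);
        }
    }
}
```

Callers that want to tolerate spaces after the separating commas would trim each returned name, as the factory excerpt quoted elsewhere in this thread does with file.trim().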



[jira] Resolved: (SOLR-663) Allow multiple files for stopwords, protwords and synonyms

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar resolved SOLR-663.
----------------------------------------

    Resolution: Fixed

Committed revision 680935.



[jira] Commented: (SOLR-663) Allow multiple files for stopwords, protwords and synonyms

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693882#action_12693882 ] 

Grant Ingersoll commented on SOLR-663:
--------------------------------------

I have a fix for it along with the changes for SOLR-1095, so let's just note it here and on that issue.



[jira] Commented: (SOLR-663) Allow multiple files for stopwords, protwords and synonyms

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693879#action_12693879 ] 

Shalin Shekhar Mangar commented on SOLR-663:
--------------------------------------------

bq. Seems like the most likely case is that people are just using a file in solr/conf and it is relative, so the exists() method call will never be true, since the current working directory is likely to be two levels up?

You are right, Grant. It is a bug: the if branch will never be executed unless an absolute path is specified. It worked only because the splitSmart took care of the single-file relative-path case. Do you want me to re-open this issue?
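
The underlying behaviour is that java.io.File resolves a relative path against the JVM's working directory (the user.dir system property), not against a Solr conf directory. A small sketch that shows this; the class and method names are hypothetical, and the file names are only examples:

```java
import java.io.File;

class RelativePathCheck {
    /** Where java.io.File actually looks for a given (possibly relative) name. */
    static String resolve(String name) {
        return new File(name).getAbsolutePath();
    }

    /** True only if the name exists as seen from the working directory (or is a valid absolute path). */
    static boolean visibleFromWorkingDir(String name) {
        return new File(name).exists();
    }

    public static void main(String[] args) {
        System.out.println("working directory: " + System.getProperty("user.dir"));
        // A relative name is resolved against the working directory,
        // so a file that lives only in a conf directory elsewhere is not found.
        System.out.println("stopwords.txt resolves to: " + resolve("stopwords.txt"));
        System.out.println("exists: " + visibleFromWorkingDir("stopwords.txt"));
    }
}
```

This is why an exists() check on a bare file name is only meaningful for absolute paths or for files that happen to sit in the working directory.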
