Posted to dev@nutch.apache.org by "Stefan Neufeind (JIRA)" <ji...@apache.org> on 2006/10/01 16:02:20 UTC

[jira] Created: (NUTCH-377) Add possibility to search for multiple values

Add possibility to search for multiple values
---------------------------------------------

                 Key: NUTCH-377
                 URL: http://issues.apache.org/jira/browse/NUTCH-377
             Project: Nutch
          Issue Type: Improvement
          Components: searcher
            Reporter: Stefan Neufeind


Searches with boolean operators (AND or OR) are not (yet) possible. All search-items are always searched with AND.

But it would be nice to have the possibility to allow multiple values for a certain field. Maybe that could be done using a separator?

As an example you might want to search for:

someword    site:www.example.org|www.apache.org

Which (to my understanding) would allow searching for one or more words, restricted to those two sites. It would avoid having to implement AND and OR fully (maybe even including brackets) but would cover a few often-used cases, imho.

Easy/hard to do? To my understanding Lucene itself allows AND/OR searches. So it might basically be a problem of string parsing and query building towards Lucene?
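
To make the proposal concrete, here is a minimal sketch of the string-parsing half, assuming the "|" separator suggested above. It is not Nutch code: the class name MultiValueQuery and the method toLuceneSyntax are hypothetical, and the output is written in Lucene's classic query-parser notation ("+" marks a required clause, and a parenthesized group of unmarked clauses matches if at least one of them matches). It does no escaping of Lucene special characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Nutch: rewrites "field:a|b" clauses
// into a required group of alternatives in Lucene classic query syntax.
public class MultiValueQuery {

    // Assumed separator between alternative values of one field.
    private static final String SEPARATOR_REGEX = "\\|";

    public static String toLuceneSyntax(String query) {
        List<String> clauses = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input yields no clauses
            }
            int colon = token.indexOf(':');
            if (colon <= 0 || colon == token.length() - 1) {
                // Plain word (or malformed field clause): keep it required.
                clauses.add("+" + token);
                continue;
            }
            String field = token.substring(0, colon);
            List<String> alternatives = new ArrayList<>();
            for (String value : token.substring(colon + 1).split(SEPARATOR_REGEX)) {
                if (!value.isEmpty()) {
                    alternatives.add(field + ":" + value);
                }
            }
            if (alternatives.isEmpty()) {
                clauses.add("+" + token);
            } else if (alternatives.size() == 1) {
                // Single value: an ordinary required field clause.
                clauses.add("+" + alternatives.get(0));
            } else {
                // Several values: the group is required, each member optional.
                clauses.add("+(" + String.join(" ", alternatives) + ")");
            }
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        // prints: +someword +(site:www.example.org site:www.apache.org)
        System.out.println(toLuceneSyntax("someword    site:www.example.org|www.apache.org"));
    }
}
```

All whitespace-separated terms stay ANDed, as they are today; only the values inside one field are ORed, which is the limited case asked for here.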

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (NUTCH-377) Add possibility to search for multiple values

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-377?page=comments#action_12439016 ] 
            
Otis Gospodnetic commented on NUTCH-377:
----------------------------------------

You'd need to modify ./src/java/org/apache/nutch/analysis/NutchAnalysis.jj and regenerate the .java files it produces.




        

[jira] Commented: (NUTCH-377) Add possibility to search for multiple values

Posted by "Stefan Neufeind (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-377?page=comments#action_12439018 ] 
            
Stefan Neufeind commented on NUTCH-377:
---------------------------------------

Hmm, I'm not too sure I understand how to do that. There is one part which adds prohibited or required phrases but ...

To my understanding, isn't the above example parsed "as is" into one string for the whole "site:...|..."? If so, could the split maybe be done when evaluating the site command? I had a look at query-site, but there doesn't seem to be much code there ...

What is a good syntax that the nutch-community could agree on? And could you maybe wrap up an initial patch for that?

