Posted to dev@nutch.apache.org by Jack Tang <hi...@gmail.com> on 2005/06/15 12:27:03 UTC

Nutch Query

Hi All

I have customized some query filters over the past two weeks, and I
have one question. As I mentioned in my previous email, the target
website is made up of two parts: text-only and graphic. My goal is to
tag the index with "textonly" and "graphic". Below are two
approaches to reach that goal. Both query filters implement
FieldQueryFilter.

1. Tagging the content (parse.getText()) with the names "textonly" and
"graphic", so the query string should look like:
   textonly:queryString 
or 
   graphic:queryString

2. Adding another field named "version", whose possible values are
"textonly" and "graphic", so the query string looks like:
   version:textonly queryString 
or
   version:graphic queryString


As I see it, if the queryString is the same, both approaches should
return the same search results. Right? But in my test, the latter query
filter returns all textonly/graphic pages and ignores the queryString.
The first one seems OK.

So, can someone explain this?

BTW:
In Query.class
Query: version:graphic file
Parsed: version:graphic file
Translated: +version:graphic +(url:file^4.0 anchor:file^2.0 content:file)

Regards
/Jack
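
For reference, the Translated line above has two required clauses, so
one would expect a page to match only if it is tagged "graphic" and also
contains the term in url, anchor, or content. Below is a minimal Java
sketch of that expectation. It is a toy in-memory model, not Nutch or
Lucene code; the class name, method name, and sample documents are
hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Toy model of the translated query from the message above:
//   +version:graphic +(url:file^4.0 anchor:file^2.0 content:file)
// Not Nutch or Lucene code. Field names follow the thread; everything
// else (class, method, sample documents) is hypothetical. Boosts are ignored.
class RequiredClauseSketch {

    // A document is modelled as a map from field name to field text.
    static boolean matches(Map<String, String> doc, String version, String term) {
        // First required clause: the "version" field must equal the tag.
        boolean versionOk = version.equals(doc.get("version"));

        // Second required clause: the term must occur in at least one default field.
        boolean termOk = false;
        for (String field : List.of("url", "anchor", "content")) {
            String text = doc.getOrDefault(field, "");
            if (Arrays.asList(text.toLowerCase().split("\\W+")).contains(term)) {
                termOk = true;
                break;
            }
        }

        // Both clauses carry '+', so both must hold.
        return versionOk && termOk;
    }

    public static void main(String[] args) {
        Map<String, String> hit = Map.of("version", "graphic", "content", "download the file here");
        Map<String, String> miss = Map.of("version", "graphic", "content", "nothing relevant");
        System.out.println(matches(hit, "graphic", "file"));  // prints true
        System.out.println(matches(miss, "graphic", "file")); // prints false
    }
}
```

Under this reading, a filter that returns every "graphic" page regardless
of the term behaves as if only the first clause were applied.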

Re: Search bug with short words

Posted by "yoursoft@freemail.hu" <yo...@freemail.hu>.
Dear Matthias,

Where is the stopword list? It is not the same as the
common-terms.utf8 file. My common-terms file contains both 'be' and 'ki'.

Regards,
    Ferenc

Matthias Jaekle wrote:

> Hi Ferenc,
> Have you had a look at your stopword list?
> Matthias
>
> yoursoft@freemail.hu wrote:
>
>> Dear Developers!
>>
>> There is a bug:
>> If you search for the word 'it', there are 0 results. Try it, e.g., on
>> objectssearch.com.
>>
>> When I search for Hungarian words in my engine, some work and some
>> do not. For example, some of my documents contain 'be-ki'.
>> If I search for 'ki', I get results. If I search for 'be', I get 0
>> results.
>>
>> Best Regards,
>>    Ferenc
>>
>


Re: Search bug with short words

Posted by Matthias Jaekle <ja...@eventax.de>.
Hi Ferenc,
Have you had a look at your stopword list?
Matthias

yoursoft@freemail.hu wrote:

> Dear Developers!
> 
> There is a bug:
> If you search for the word 'it', there are 0 results. Try it, e.g., on
> objectssearch.com.
> 
> When I search for Hungarian words in my engine, some work and some
> do not. For example, some of my documents contain 'be-ki'.
> If I search for 'ki', I get results. If I search for 'be', I get 0
> results.
> 
> Best Regards,
>    Ferenc
> 

-- 
http://www.eventax.com - eventax GmbH
http://www.umkreisfinder.de - The search engine for local information and events

Re: [Nutch-dev] Re: Search bug with short words

Posted by "yoursoft@freemail.hu" <yo...@freemail.hu>.
Dear List!

I found that there is a hardcoded stop word list in NutchAnalysis.java.
I think this is not language independent. Would it be possible to move it
into the conf files and load it only when the bean is created?

Regards,
    Ferenc
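
To make the suggestion above concrete, here is a rough Java sketch of
reading a stop word list from a configurable source once, at construction
time, instead of hardcoding it. This is not the actual Nutch API: the
class name, the one-word-per-line format, and the '#' comment convention
are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: stop words are read once from a configurable source
// rather than being compiled into the analyzer. Not the actual Nutch API.
class StopWordConfig {
    private final Set<String> stopWords = new HashSet<>();

    // Assumed format: one stop word per line; blank lines and lines
    // starting with '#' are skipped. The reader is closed when done.
    StopWordConfig(Reader source) throws IOException {
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim().toLowerCase(Locale.ROOT);
                if (!word.isEmpty() && !word.startsWith("#")) {
                    stopWords.add(word);
                }
            }
        }
    }

    boolean isStopWord(String term) {
        return stopWords.contains(term.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) throws IOException {
        // An English list would typically contain "be" and "it";
        // a list for another language could simply leave them out.
        StopWordConfig english = new StopWordConfig(new StringReader("# english\nbe\nit\n"));
        StopWordConfig other = new StopWordConfig(new StringReader("# no English words here\n"));
        System.out.println(english.isStopWord("be")); // prints true
        System.out.println(other.isStopWord("be"));   // prints false
    }
}
```

In a real deployment the Reader would wrap a file from the conf
directory, so each installation could supply a list for its own language.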


Stefan Groschupf wrote:

> Those are common English stop words, and Nutch may be removing them.
> Check whether you can find these words in your index using Luke.
>
> Stefan
> On 17.06.2005 at 09:46, yoursoft@freemail.hu wrote:
>
>> Dear Developers!
>>
>> There is a bug:
>> If you search for the word 'it', there are 0 results. Try it, e.g., on
>> objectssearch.com.
>>
>> When I search for Hungarian words in my engine, some work and some
>> do not. For example, some of my documents contain 'be-ki'.
>> If I search for 'ki', I get results. If I search for 'be', I get
>> 0 results.
>>
>> Best Regards,
>>    Ferenc
>>


Re: Search bug with short words

Posted by Stefan Groschupf <sg...@media-style.com>.
Those are common English stop words, and Nutch may be removing them.
Check whether you can find these words in your index using Luke.

Stefan
On 17.06.2005 at 09:46, yoursoft@freemail.hu wrote:

> Dear Developers!
>
> There is a bug:
> If you search for the word 'it', there are 0 results. Try it, e.g., on
> objectssearch.com.
>
> When I search for Hungarian words in my engine, some work and some
> do not. For example, some of my documents contain 'be-ki'.
> If I search for 'ki', I get results. If I search for 'be', I get
> 0 results.
>
> Best Regards,
>    Ferenc
>
>
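
As a rough illustration of the explanation above, the Java sketch below
shows how dropping stop words from a query can leave nothing to search
for. It is not Nutch code: the stop word set is an illustrative subset
containing only the two words named in this thread, the tokenizer is a
simplification, and the thread does not confirm at which stage Nutch
removes such words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of stop-word removal from query terms.
class StopWordFilterSketch {
    // Illustrative subset only; the actual list in NutchAnalysis.java
    // is not reproduced here.
    static final Set<String> STOP_WORDS = Set.of("be", "it");

    // Splits the query on anything that is not a letter or digit and
    // keeps only the terms that are not stop words.
    static List<String> searchableTerms(String query) {
        List<String> kept = new ArrayList<>();
        for (String term : query.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(searchableTerms("be"));    // prints []
        System.out.println(searchableTerms("ki"));    // prints [ki]
        System.out.println(searchableTerms("be-ki")); // prints [ki]
    }
}
```

With an English list applied to Hungarian text, 'be' is filtered out
while 'ki' survives, which would be consistent with the results reported
in the quoted message.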


Search bug with short words

Posted by "yoursoft@freemail.hu" <yo...@freemail.hu>.
Dear Developers!

There is a bug:
If you search for the word 'it', there are 0 results. Try it, e.g., on
objectssearch.com.

When I search for Hungarian words in my engine, some work and some
do not. For example, some of my documents contain 'be-ki'.
If I search for 'ki', I get results. If I search for 'be', I get 0
results.

Best Regards,
    Ferenc