Posted to dev@nutch.apache.org by Jack Tang <hi...@gmail.com> on 2005/06/15 12:27:03 UTC
Nutch Query
Hi All
I have customized some query filters over the past two weeks, and I
have one question. As I mentioned in my previous email, the target
website is made up of two parts: text-only and graphic. My goal is to
tag the index with "textonly" and "graphic". Here are two
approaches to reach that goal. Both query filters are built on
FieldQueryFilter.
1. Tagging the content (parse.getText()) with the names "textonly" and
"graphic", so the query string looks like:
textonly:queryString
or
graphic:queryString
2. Adding another field named "version" whose possible values are
"textonly" and "graphic", so the query string looks like:
version:textonly queryString
or
version:graphic queryString
In my view, if queryString is the same, the two approaches should
return the same results, right? But in my test, the latter query filter
shows all textonly/graphic pages and ignores the queryString. The first
one seems OK.
So, can someone explain this?
BTW, this is the output from Query.class:
Query: version:graphic file
Parsed: version:graphic file
Translated: +version:graphic +(url:file^4.0 anchor:file^2.0 content:file)
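To make the expected behaviour of the translated query concrete, here is
a minimal plain-Java sketch of what the two '+' (required) clauses mean.
It does not use Nutch or Lucene: the class RequiredClauseSketch, its
methods and the sample documents are hypothetical, and the boosts are
ignored. If both clauses are enforced, a page must carry the version tag
and also contain the term in url, anchor or content.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical model of the translated query; not a Nutch or Lucene class.
class RequiredClauseSketch {

    /** True if the given field of the document contains the term as a whole token. */
    static boolean fieldHasTerm(Map<String, String> doc, String field, String term) {
        String value = doc.getOrDefault(field, "");
        String wanted = term.toLowerCase(Locale.ROOT);
        for (String token : value.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    /** Models +version:V +(url:T anchor:T content:T), ignoring boosts. */
    static boolean matches(Map<String, String> doc, String version, String term) {
        boolean versionClause = fieldHasTerm(doc, "version", version);
        boolean termClause = fieldHasTerm(doc, "url", term)
                || fieldHasTerm(doc, "anchor", term)
                || fieldHasTerm(doc, "content", term);
        // Both clauses carry '+', so both must hold.
        return versionClause && termClause;
    }

    /** Returns the ids of all documents that satisfy both required clauses. */
    static List<String> search(List<Map<String, String>> docs, String version, String term) {
        List<String> hits = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            if (matches(doc, version, term)) {
                hits.add(doc.get("id"));
            }
        }
        return hits;
    }

    /** Made-up sample pages, for illustration only. */
    static List<Map<String, String>> sampleDocs() {
        return List.of(
                Map.of("id", "d1", "version", "graphic", "content", "download the file here"),
                Map.of("id", "d2", "version", "graphic", "content", "photo gallery"),
                Map.of("id", "d3", "version", "textonly", "content", "file listing"));
    }

    public static void main(String[] args) {
        // Only d1 is both tagged "graphic" and contains "file".
        System.out.println(search(sampleDocs(), "graphic", "file"));
    }
}
```

In this model, d2 is a graphic page but is not returned for "file". A
filter that returns every graphic page behaves as if only the first
clause were applied.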
Regards
/Jack
Re: Search bug with short words
Posted by "yoursoft@freemail.hu" <yo...@freemail.hu>.
Dear Matthias,
Where is the stopword list? Is it not the same as the
common-terms.utf8 file? My common-terms file contains both 'be' and 'ki'.
Regards,
Ferenc
Matthias Jaekle wrote:
> Hi Ferenc,
> have you had a look at your stopword list?
> Matthias
>
> yoursoft@freemail.hu wrote:
>
>> Dear Developers!
>>
>> There is a bug:
>> E.g. if you search for the word 'it', you get 0 results. Try it e.g. on
>> objectssearch.com.
>>
>> If I search for some Hungarian words in my engine, some work and some
>> don't. E.g. my documents contain 'be-ki'.
>> If I search for 'ki', I get results. If I search for 'be', I get 0
>> results.
>>
>> Best Regards,
>> Ferenc
>>
>
Re: Search bug with short words
Posted by Matthias Jaekle <ja...@eventax.de>.
Hi Ferenc,
have you had a look at your stopword list?
Matthias
yoursoft@freemail.hu wrote:
> Dear Developers!
>
> There is a bug:
> E.g. if you search for the word 'it', you get 0 results. Try it e.g. on
> objectssearch.com.
>
> If I search for some Hungarian words in my engine, some work and some
> don't. E.g. my documents contain 'be-ki'.
> If I search for 'ki', I get results. If I search for 'be', I get 0
> results.
>
> Best Regards,
> Ferenc
>
--
http://www.eventax.com - eventax GmbH
http://www.umkreisfinder.de - The search engine for local listings and events
Re: [Nutch-dev] Re: Search bug with short words
Posted by "yoursoft@freemail.hu" <yo...@freemail.hu>.
Dear List!
I found that there is a hardcoded stop word list in NutchAnalysis.java.
I think this is not language independent. Would it be possible to move
it out into the conf files, and load it only when the bean is created?
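As a rough illustration of this suggestion, here is a minimal plain-Java
sketch that reads stop words from a configuration source (one word per
line) instead of a hardcoded list. It is not Nutch code: the class
ConfigurableStopWords, the file format and the '#' comment convention
are all assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; the one-word-per-line format is an assumption.
class ConfigurableStopWords {
    private final Set<String> stopWords = new HashSet<>();

    /** Reads the stop words once, e.g. when the search bean is created. */
    ConfigurableStopWords(Reader source) {
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim().toLowerCase(Locale.ROOT);
                // Skip blank lines and '#' comment lines.
                if (!word.isEmpty() && !word.startsWith("#")) {
                    stopWords.add(word);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the query terms that are not stop words, in their original order. */
    List<String> filter(List<String> terms) {
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            if (!stopWords.contains(term.toLowerCase(Locale.ROOT))) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // An English-style list drops 'be' but keeps 'ki'.
        ConfigurableStopWords english =
                new ConfigurableStopWords(new StringReader("# sample list\nit\nbe\n"));
        System.out.println(english.filter(List.of("be", "ki")));

        // An empty list (e.g. for a Hungarian index) keeps both terms.
        ConfigurableStopWords none = new ConfigurableStopWords(new StringReader(""));
        System.out.println(none.filter(List.of("be", "ki")));
    }
}
```

In a real deployment the Reader would wrap a file from the conf
directory, so each language could ship its own list.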
Regards,
Ferenc
Stefan Groschupf wrote:
> Those are common English stop words, and maybe Nutch removes them.
> Check whether you can find these words in your index using Luke.
>
> Stefan
> On 17.06.2005 at 09:46, yoursoft@freemail.hu wrote:
>
>> Dear Developers!
>>
>> There is a bug:
>> E.g. if you search for the word 'it', you get 0 results. Try it e.g. on
>> objectssearch.com.
>>
>> If I search for some Hungarian words in my engine, some work and some
>> don't. E.g. my documents contain 'be-ki'.
>> If I search for 'ki', I get results. If I search for 'be', I get
>> 0 results.
>>
>> Best Regards,
>> Ferenc
Re: Search bug with short words
Posted by Stefan Groschupf <sg...@media-style.com>.
Those are common English stop words, and maybe Nutch removes them.
Check whether you can find these words in your index using Luke.
Stefan
On 17.06.2005 at 09:46, yoursoft@freemail.hu wrote:
> Dear Developers!
>
> There is a bug:
> E.g. if you search for the word 'it', you get 0 results. Try it e.g. on
> objectssearch.com.
>
> If I search for some Hungarian words in my engine, some work and some
> don't. E.g. my documents contain 'be-ki'.
> If I search for 'ki', I get results. If I search for 'be', I get
> 0 results.
>
> Best Regards,
> Ferenc
>
>
Search bug with short words
Posted by "yoursoft@freemail.hu" <yo...@freemail.hu>.
Dear Developers!
There is a bug:
E.g. if you search for the word 'it', you get 0 results. Try it e.g. on
objectssearch.com.
If I search for some Hungarian words in my engine, some work and some
don't. E.g. my documents contain 'be-ki'.
If I search for 'ki', I get results. If I search for 'be', I get 0
results.
Best Regards,
Ferenc