Posted to user@nutch.apache.org by karthik085 <ka...@gmail.com> on 2007/11/02 04:19:16 UTC

RE: Restricting query to a domain

I recently ran into the same problem. Here is what I found (maybe old news
for you by now, but reporting it just in case). There is also another
solution at the end of my reply.

a) Whether the search works depends on which analyzer is used for query
parsing. I found that
org.apache.lucene.analysis.KeywordAnalyzer
org.apache.lucene.analysis.standard.StandardAnalyzer

both return results for a query similar to yours:
+content:"abc" +(site:"www.aaa.com" site:"www.bbb.com")

Maybe nutch is using a different analyzer. With something similar to
org.apache.lucene.analysis.SimpleAnalyzer
I don't get any results.
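
A rough sketch of why the analyzer choice could matter here. This is plain
Python, not Lucene code. It assumes the site field is indexed as a single
untokenized term per host (I have not verified this against the nutch
source), and the two tokenizer functions are simplified stand-ins for
KeywordAnalyzer and SimpleAnalyzer, not the real implementations:

```python
import re

def keyword_tokens(text):
    # KeywordAnalyzer-style: the whole input becomes one token, unchanged.
    return [text]

def simple_tokens(text):
    # SimpleAnalyzer-style: split at non-letters and lowercase each piece.
    # Simplification: ASCII letters only.
    return [t.lower() for t in re.split(r"[^A-Za-z]+", text) if t]

# Assumption: the index holds each host as one untokenized term.
indexed_site_terms = {"www.aaa.com", "www.bbb.com"}

def matches(tokens):
    # Simplified matching: a hit needs a query token equal to an indexed term.
    return any(t in indexed_site_terms for t in tokens)

print(keyword_tokens("www.aaa.com"), matches(keyword_tokens("www.aaa.com")))
# ['www.aaa.com'] True
print(simple_tokens("www.aaa.com"), matches(simple_tokens("www.aaa.com")))
# ['www', 'aaa', 'com'] False
```

If the query analyzer breaks the host name into pieces, none of the pieces
equals the stored term, which would be consistent with the empty results
above.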

I believe the lib-lucene-analyzers plugin might contain different analyzers
that you could use. I'm not sure, as I haven't tried it; some expert opinion
would help here.
I found this post useful for changing the analyzers:
http://www.nabble.com/forum/ViewPost.jtp?post=1036075&framed=y


b) [This method works in nutch] An alternative (maybe a little inefficient):
in nutch you could use
<query> -site:"ccc.com" -site:"ddd.com"
This eliminates the results from these sites from the original search
results.
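
If the list of sites to exclude is long, the query string can be built
programmatically. A minimal sketch in Python; exclude_sites is a made-up
helper name, not part of nutch, and it only assembles the string shown
above:

```python
def exclude_sites(query, sites):
    # Append one -site:"host" clause per excluded host to the base query.
    clauses = ['-site:"{}"'.format(site) for site in sites]
    return " ".join([query] + clauses)

print(exclude_sites("abc", ["ccc.com", "ddd.com"]))
# abc -site:"ccc.com" -site:"ddd.com"
```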

Hope it helps someone.


Bogdan Kecman wrote:
> 
>  
>> > Note that this is a filter, so a query like
>> > 
>> >  findme andme site:"www.aaa.com"
>> > 
>> > will limit the result set to www.aaa.com only, but the query
>> > 
>> >  site:"www.aaa.com"
>> > 
>> > is an empty query and will not return anything.
>> 
>> Why won't that return anything?
> 
> Well, as I understand it, and I must admit I'm no nutch expert (I've been
> playing with it for a few weeks), site:"something" is just a query filter,
> meaning it filters the main query, so to speak. So if you do not have a
> main query, there is nothing to be filtered out. As I said, this might be
> completely untrue, but this is how I understood it.
> 
>> 
>> And is grouping with "brackets" somehow possible? I know the 
>> thing mentioned below does not work - but would be nice if it 
>> could, wouldn't it?
>> 
>> 	abc && (site:"www.aaa.com" || site:"www.bbb.com")
> 
> Well, Lucene will allow you to do this. Actually,
> +content:"abc" +(site:"www.aaa.com" site:"www.bbb.com")
> will do the trick, as I remember. For some reason this does not work in
> nutch search; there must be a way to make it work (along with the nice
> syntax like: something~ som*ing some?hing).
> Now, some of the experts could remain silent or shed some light on it :)
> I spent a lot of time going through the archives and wiki to get to the
> point where I can write useful plugins and use the system, although I still
> miss some basics (like the unanswered issue of the difference between field
> and raw-field).
> 
> Bogdan
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Restricting-query-to-a-domain-tf1807265.html#a13541691