Posted to user@nutch.apache.org by Sathyam Y <sa...@yahoo.com> on 2008/05/07 22:58:39 UTC

Re: Solr Integration/Stemming?

All,
   
  I am trying to integrate PorterStemming with Nutch and was able to successfully follow the changes suggested at http://wiki.apache.org/nutch/Stemming?highlight=%28stemming%29
   
  The search results work well with stemmed words, but I am having difficulty getting correct summaries. I am using BasicSummarizer, and it looks like the summarizer is trying to match non-stemmed query words against stemmed tokens from the content. Any ideas on how to resolve this issue? Has anyone had experience working with summaries along with stemming?
   
  Thanks !
  Sathyam
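
To illustrate the suspected mismatch, here is a minimal, self-contained sketch. It does not use Nutch's BasicSummarizer or a real Porter stemmer: toyStem and highlightPositions are hypothetical names, and the stemmer is a toy suffix stripper used only for illustration. It assumes the content tokens are stemmed before comparison, and shows that a raw query word then finds nothing to highlight unless it is stemmed the same way first.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SummaryStemSketch {

    // Toy suffix stripper standing in for a real Porter stemmer (assumption:
    // only "ing", "ed" and "s" endings are handled, enough for this example).
    static String toyStem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() > 4 && w.endsWith("ing")) return w.substring(0, w.length() - 3);
        if (w.length() > 3 && w.endsWith("ed")) return w.substring(0, w.length() - 2);
        if (w.length() > 2 && w.endsWith("s")) return w.substring(0, w.length() - 1);
        return w;
    }

    // Returns the positions of content words whose stem equals a query term.
    // With stemQuery == false the raw query words are compared against
    // stemmed content tokens, which is the mismatch described above.
    static List<Integer> highlightPositions(String[] contentWords,
                                            String[] queryWords,
                                            boolean stemQuery) {
        Set<String> terms = new HashSet<>();
        for (String q : queryWords) {
            terms.add(stemQuery ? toyStem(q) : q.toLowerCase(Locale.ROOT));
        }
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < contentWords.length; i++) {
            if (terms.contains(toyStem(contentWords[i]))) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] content = {"The", "letter", "was", "stamped", "twice"};
        String[] query = {"stamped"};
        // Raw query word vs. stemmed tokens: nothing to highlight.
        System.out.println(highlightPositions(content, query, false));
        // Query word stemmed the same way as the content: position 3 matches.
        System.out.println(highlightPositions(content, query, true));
    }
}
```

If this is indeed the cause, the likely direction is to run the query terms through the same stemmer before they reach the summarizer; where that hook belongs in Nutch is not something this sketch settles.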

Nick Tkach <nt...@peapod.com> wrote:
  Ah, thank you very much! Yes, that seems to have done the trick. I'd 
made the change when I was patching my copy of nutch-trunk, but hadn't 
realized the changes to nutch-default.xml there didn't get transferred 
when I did an 'ant tar' to build my "distro".

Specifically, I'd forgotten to make the change (as they have on the 
wiki) to my nutch-default.xml, in the value for plugin.includes 
replacing "query-(basic|site|url)" with "query-(stemmer|site|url)".
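
For anyone hitting the same thing: as far as I understand it, plugin.includes is treated as a regular expression that each plugin id must match, so with the old value a plugin named query-stemmer is never loaded. A small sketch of that check follows. The class and method names are made up for illustration, and only the query-(...) fragment of the value is shown, not a full plugin.includes setting.

```java
import java.util.regex.Pattern;

public class PluginIncludesSketch {

    // True when the whole plugin id matches the plugin.includes expression.
    // Assumption: the id is matched in full against the pattern.
    static boolean included(String pluginIncludes, String pluginId) {
        return Pattern.compile(pluginIncludes).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        String before = "query-(basic|site|url)";
        String after = "query-(stemmer|site|url)";

        // The old value leaves the stemming query filter out.
        System.out.println(included(before, "query-stemmer"));
        // The new value lets it in and drops query-basic instead.
        System.out.println(included(after, "query-stemmer"));
        System.out.println(included(after, "query-basic"));
    }
}
```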

Howie Wang wrote:
> It sounds like the query parser is not stemming for you. Make sure
> that the new stemming query filter is activated in the
> Nutch directory under your app server. Check the nutch-*.xml files
> under WEB-INF/classes to make sure that your new query filter is
> included.
> 
> Howie
> 
> 
>> Date: Mon, 11 Feb 2008 12:19:59 -0600
>> From: ntkach@gmail.com
>> To: nutch-user@lucene.apache.org
>> Subject: Solr Integration/Stemming?
>>
>> First of all, a question on stemming. We've tried applying the patches from
>> the main wiki ( http://wiki.apache.org/nutch/Stemming ) and that seems to
>> work fine for the most part. We are seeing one kind of strange result
>> though. If we index a series of pages (web crawl of 2 of our sites) and
>> search for "stamp" in them, we get results for pages containing "stamped"
>> and "stamps" as you'd expect. However if you search for "stamped" or
>> "stamps" directly, then you get no results. Does that sound like we have a
>> configuration issue using the stemming patches, or do we need to extend the
>> patches?
>>
>> Second, would we be better off just working on getting Solr & Nutch working
>> together and taking advantage of Solr's built-in stemming?
>>
>> Third, has anyone had any luck with getting Solr working with Nutch? We
>> tried applying the patches from NUTCH-442 but get failures from Hadoop
>> when we try to run a job.
> 



       
