Posted to user@nutch.apache.org by victor_emailbox <vi...@yahoo.com> on 2006/08/20 07:19:48 UTC

How to Search in Category?

Hi,
  I want to know how to search for a query within a category. For example, "java" can mean
coffee or a programming language. Is there a way to put something in the query that
distinguishes them? Something like searching "java +category:food"?
Many thanks.
-- 
View this message in context: http://www.nabble.com/How-to-Search-in-Category--tf2134560.html#a5890853
Sent from the Nutch - User forum at Nabble.com.


Re: [Nutch-general] How to Search in Category?

Posted by og...@yahoo.com.
Victor, you might want to look at UIMA (IBM's free content analysis monster); it can be used to disambiguate your search terms. Here are some good pointers: http://www.simpy.com/links/tag/uima (fresh; I just looked at UIMA earlier today and saved these links)

Otis





RE: How to Search in Category?

Posted by Iain <ia...@idcl.co.uk>.
This is really not an easy question to answer. Or, in a way, it is:
"You can't do it".

Automatically identifying which of a word's many meanings is the intended one
(word sense disambiguation) is still a research problem, and it has not been
solved in general.

If it were, you could change the word before it got added to the index
('java' -> 'java_food'). However, there is no clear way to define these
categories in general (see something like WordNet for how complex this can
get), and when searching you would have to know that 'food' was an option
for 'java'.
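A minimal sketch of the index-time rewrite ('java' -> 'java_food'), assuming the hard part, deciding which sense is meant, is faked with a tiny hand-written table of context words. `SenseTagger`, the sense table, and the `_food`/`_computing`/`_unknown` suffixes are all hypothetical and illustrative; none of this is part of Nutch or Lucene, and the crude overlap heuristic is nowhere near a real disambiguator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: rewrite an ambiguous word into a sense-tagged term
// before it is added to the index. The SENSES table is a toy stand-in for
// real word sense disambiguation, which is the unsolved part.
public class SenseTagger {

    // Ambiguous word -> (sense suffix -> context words suggesting that sense).
    private static final Map<String, Map<String, Set<String>>> SENSES = Map.of(
        "java", Map.of(
            "food", Set.of("coffee", "espresso", "roast", "cup", "bean"),
            "computing", Set.of("jvm", "class", "compiler", "code", "programming")));

    /** Tokenizes the text and rewrites ambiguous words, e.g. java -> java_food. */
    public static List<String> tag(String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("\\W+");
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (token.isEmpty()) continue; // split can yield a leading empty token
            Map<String, Set<String>> senses = SENSES.get(token);
            out.add(senses == null ? token : token + "_" + pickSense(senses, tokens));
        }
        return out;
    }

    // Picks the sense whose context words overlap the text the most.
    // No overlap, or a tie between senses, yields "unknown".
    private static String pickSense(Map<String, Set<String>> senses, String[] tokens) {
        String best = "unknown";
        int bestScore = 0;
        boolean tie = false;
        for (Map.Entry<String, Set<String>> e : senses.entrySet()) {
            int score = 0;
            for (String t : tokens) {
                if (e.getValue().contains(t)) score++;
            }
            if (score > bestScore) {
                best = e.getKey();
                bestScore = score;
                tie = false;
            } else if (score == bestScore && score > 0) {
                tie = true;
            }
        }
        return tie ? "unknown" : best;
    }

    public static void main(String[] args) {
        // prints [a, cup, of, java_food, with, a, dark, roast]
        System.out.println(tag("A cup of Java with a dark roast"));
        // prints [java_computing, code, runs, on, the, jvm]
        System.out.println(tag("Java code runs on the JVM"));
    }
}
```

The query side would need the same treatment: a search would have to be for `java_food`, which is exactly the catch above, since the searcher must already know that 'food' is one of the senses of 'java'.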

If you have a controlled collection of texts that only covers certain themes,
you could automatically classify each text and add the classification to a
separate Lucene index. Not easy, but doable. You then get into trouble when a
text talks about the tendency of programmers to drink coffee.
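A toy sketch of the classify-then-filter approach, assuming a keyword-based classifier (hypothetical; a real system would use a trained classifier or a framework such as UIMA) and plain in-memory maps standing in for the separate category index. The query "java +category:food" is interpreted here as requiring both conditions: the document contains 'java' AND is tagged 'food'. `CategoryFilter` and its keyword lists are illustrative names, not a Nutch or Lucene API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: classify each text at index time, store its categories
// next to its terms, and filter search results on the stored category.
public class CategoryFilter {

    // Toy classifier: a text gets a category if it contains any of its keywords.
    private static final Map<String, Set<String>> CATEGORY_KEYWORDS = Map.of(
        "food", Set.of("coffee", "espresso", "roast", "cup"),
        "computing", Set.of("jvm", "compiler", "code", "programmers"));

    private final Map<String, Set<String>> termsByDoc = new HashMap<>();
    private final Map<String, Set<String>> categoriesByDoc = new HashMap<>();

    /** Tokenizes and classifies a document; one text may receive several categories. */
    public void add(String docId, String text) {
        Set<String> terms = new TreeSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        Set<String> categories = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : CATEGORY_KEYWORDS.entrySet()) {
            if (!Collections.disjoint(terms, e.getValue())) categories.add(e.getKey());
        }
        termsByDoc.put(docId, terms);
        categoriesByDoc.put(docId, categories);
    }

    /** Documents (sorted by id) that contain the term AND carry the category. */
    public List<String> search(String term, String category) {
        List<String> hits = new ArrayList<>();
        for (String docId : new TreeSet<>(termsByDoc.keySet())) {
            if (termsByDoc.get(docId).contains(term)
                    && categoriesByDoc.get(docId).contains(category)) {
                hits.add(docId);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        CategoryFilter index = new CategoryFilter();
        index.add("doc1", "Java is a dark roast coffee from Indonesia");
        index.add("doc2", "Java code compiles to bytecode for the JVM");
        index.add("doc3", "Java programmers drink a lot of coffee");
        System.out.println(index.search("java", "food"));      // prints [doc1, doc3]
        System.out.println(index.search("java", "computing")); // prints [doc2, doc3]
    }
}
```

doc3 shows the trouble case: a text about programmers drinking coffee is tagged with both categories, so it turns up under either filter.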

Iain