Posted to user@nutch.apache.org by "a.toraby" <al...@gmail.com> on 2012/09/03 17:41:15 UTC

How can I use the -topic parameter of DmozParser?

Hi
I want to filter the URLs in the DMOZ collection. I need only the sites that
have the string "Top/World/Persian" in their topic field, that is, only the
Persian sites. But when I run the following command:
nutch org.apache.nutch.tools.DmozParser content.rdf.u8 -topic Top/World/Persian
it returns nothing. I also tried this one:
nutch org.apache.nutch.tools.DmozParser content.rdf.u8 -topic Top -topic World -topic Persian
but it returns a lot of unwanted URLs.
How can I retrieve only the Persian sites that are listed under the topic
World/Persian?
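
To make clear what result I am after, here is a rough standalone sketch in
Python (not Nutch code) of the filtering I expect. It assumes that each entry
in the dump is an ExternalPage element with an "about" attribute and a child
"topic" element; the function name urls_under_topic and the sample data are
made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    # Drop the "{namespace}" part that ElementTree puts in front of tag names.
    return tag.rsplit("}", 1)[-1]


def urls_under_topic(source, prefix):
    """Return the URLs of all pages whose topic is `prefix` or lies below it.

    Assumption: pages look like
    <ExternalPage about="URL"><topic>Top/...</topic></ExternalPage>.
    """
    urls = []
    # iterparse streams the file, so a large dump is not loaded at once.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "ExternalPage":
            continue
        topic = ""
        for child in elem:
            if local_name(child.tag) == "topic":
                topic = (child.text or "").strip()
        # Match the topic itself or any sub-topic, but not a sibling such as
        # "Top/World/Persian_Gulf" that merely shares the leading characters.
        if topic == prefix or topic.startswith(prefix + "/"):
            urls.append(elem.get("about"))
        elem.clear()  # free the children of the page we just handled
    return urls


# Made-up sample in the assumed layout (not real DMOZ data).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<RDF xmlns:r="http://www.w3.org/TR/RDF/"
     xmlns:d="http://purl.org/dc/elements/1.0/"
     xmlns="http://dmoz.org/rdf/">
<ExternalPage about="http://example.ir/">
  <d:Title>Example</d:Title>
  <topic>Top/World/Persian/News</topic>
</ExternalPage>
<ExternalPage about="http://example.com/">
  <d:Title>Other</d:Title>
  <topic>Top/Arts</topic>
</ExternalPage>
<ExternalPage about="http://example.org/">
  <d:Title>Sibling</d:Title>
  <topic>Top/World/Persian_Gulf</topic>
</ExternalPage>
</RDF>"""

if __name__ == "__main__":
    print(urls_under_topic(io.BytesIO(SAMPLE), "Top/World/Persian"))
```

With the real dump I would pass the path to content.rdf.u8 instead of the
in-memory sample, provided the file parses as well-formed XML.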
Thanks for any help.



--
View this message in context: http://lucene.472066.n3.nabble.com/How-can-I-use-topic-parameter-of-dmozparser-tp4005078.html
Sent from the Nutch - User mailing list archive at Nabble.com.