Posted to user@nutch.apache.org by cyanean <cy...@gmail.com> on 2007/06/12 08:53:37 UTC

How to index javascript contents

Dear all,

My client uses HTTrack with GDS (Google Desktop Search). While pages are
fetched much more quickly using Nutch (kudos to the Nutch engine developers), it
doesn't seem to index the entire page the way HTTrack/GDS does. As a result, he
reports that a search for 'hbx' (a web analytics tool developed by
Visual Sciences) returns 26 hits in GDS and none in Nutch. I found that
the only places where 'hbx' appears in those documents are in the
JavaScript that comes with the page.

Is there any way to get Nutch to index the JavaScript as a document too? Or
is there some special configuration that I should be using?

Thanks!!
-- 
View this message in context: http://www.nabble.com/How-to-index-javascript-contents-tf3905819.html#a11073844
Sent from the Nutch - User mailing list archive at Nabble.com.