Posted to user@nutch.apache.org by Jaydip Lakhatariya <ja...@aspiresoftware.in> on 2014/01/18 09:44:52 UTC

Not Crawling images with web crawler

Hello,

I am currently working on web crawling and indexing. I am using Nutch
2.2.1, Elasticsearch for indexing, and Cassandra as the datastore.

I am successfully crawling and indexing web pages, but images and some
other file formats are not crawled or indexed.

I need to index the images separately in Elasticsearch.

The parser only parses the text content, title, etc. of web pages.

I have changed suffix-urlfilter.txt and regex-urlfilter.txt to allow
images, but the image content still is not parsed.
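
For readers unfamiliar with these filter files, here is a minimal sketch
of how a regex-urlfilter.txt style rule list decides whether a URL is
kept. It assumes the documented behaviour that each rule starts with +
(accept) or - (reject), that the first matching rule wins, and that a
URL matching no rule is ignored. The two rule sets and the example.com
URLs are hypothetical, shortened stand-ins, not the actual files used
here, and the code is an illustration in Python, not Nutch's own
implementation.

```python
import re

def parse_rules(text):
    """Turn filter-file text into a list of (accept, compiled_regex) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules

def accepts(rules, url):
    """First matching rule decides; a URL matching no rule is rejected."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False

# Hypothetical rule set that rejects image suffixes (shortened list).
BEFORE = r"""
# skip image and other suffixes we cannot parse
-\.(gif|jpg|jpeg|png|bmp|css|js|zip)$
+.
"""

# Hypothetical edited rule set with the image suffixes taken out.
AFTER = r"""
-\.(css|js|zip)$
+.
"""

if __name__ == "__main__":
    url = "http://example.com/logo.png"
    print(accepts(parse_rules(BEFORE), url))
    print(accepts(parse_rules(AFTER), url))
```

Passing the URL filters only makes an image URL eligible for fetching.
It says nothing about whether any parser can handle the fetched content.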

My requirement is to store crawled images in a separate field of the parse table.

Appreciate any help.

Thank you.


Regards,
Jaydip Lakhatariya

Aspire Software Solutions 10/A Dalal, New Vikasgruh Road, Paldi, Ahmedabad, India.