Posted to solr-user@lucene.apache.org by Tod <li...@gmail.com> on 2011/10/21 19:48:51 UTC

can solr follow and index hyperlinks embedded in rich text documents (pdf, doc, etc)?

I have a feeling the answer is "no", since you wouldn't want to start 
indexing a large volume of office documents containing hyperlinks that 
could lead all over the internet.  But since a customer just asked me 
whether it could be done, I thought I would make sure.


Thanks - Tod

Re: can solr follow and index hyperlinks embedded in rich text documents (pdf, doc, etc)?

Posted by Tomás Fernández Löbbe <to...@gmail.com>.
Hi Tod, Solr doesn't actually crawl. If you need to feed Solr with that kind
of information, you'll have to use a crawling tool or implement the crawling
yourself.

Regards,

Tomás
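
For readers who take the "implement that yourself" route, here is a minimal
sketch of the link-handling part only. It assumes the text has already been
extracted from the PDF or Office file by some other tool, and that an
allow-list of hosts is used to keep the crawl from wandering all over the
internet. The function names, the allow-list, and the "linked_from_s" field
are hypothetical and not part of Solr; fetching the linked pages and sending
the resulting documents to Solr are left out.

```python
import re
from urllib.parse import urlparse

# Simple pattern for http(s) URLs in plain text. It is an approximation,
# not a full URL grammar.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_links(text):
    """Return the unique http(s) URLs found in text, in order of appearance."""
    links = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:!?")  # drop trailing sentence punctuation
        if url not in links:
            links.append(url)
    return links


def filter_links(links, allowed_hosts):
    """Keep only links whose host is on the (hypothetical) allow-list."""
    return [url for url in links if urlparse(url).hostname in allowed_hosts]


def to_solr_docs(parent_id, links):
    """Build one dict per link, shaped like a document for a Solr update.

    "linked_from_s" is an assumed field name; the real schema may differ.
    """
    return [{"id": url, "linked_from_s": parent_id} for url in links]


if __name__ == "__main__":
    sample = ("See http://intranet.example.com/a.pdf, and "
              "https://other.example.org/x (external).")
    allowed = {"intranet.example.com"}
    kept = filter_links(extract_links(sample), allowed)
    for doc in to_solr_docs("report-1", kept):
        print(doc)
```

Restricting by host is only one possible guard; a depth limit or a cap on
the number of links followed per document would serve the same purpose.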
