Posted to solr-user@lucene.apache.org by Tod <li...@gmail.com> on 2011/10/21 19:48:51 UTC
Can Solr follow and index hyperlinks embedded in rich text documents
(PDF, DOC, etc.)?
I have a feeling the answer is "no", since you wouldn't want to start
indexing a large volume of office documents whose hyperlinks could
lead all over the internet. But since there might be a use case
(as in "a customer just asked me whether it could be done"), I thought
I would make sure.
Thanks - Tod
Re: Can Solr follow and index hyperlinks embedded in rich text
documents (PDF, DOC, etc.)?
Posted by Tomás Fernández Löbbe <to...@gmail.com>.
Hi Tod, Solr doesn't actually crawl. If you need to feed Solr with that
kind of information, you'll have to use a crawling tool or implement
that yourself.
Regards,
Tomás
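If you do go the "implement that yourself" route, the core of it is a small, bounded crawl over the links found in the extracted document text. Below is a minimal sketch, assuming the text of the PDF/DOC has already been extracted (for example by Apache Tika, which Solr's extracting handler uses), and that `fetch` is a hypothetical callable you supply that downloads a URL and returns its text. The host whitelist and the depth/page limits are what keep it from wandering "all over the internet". The field names `id` and `content` are assumptions about your schema.

```python
import re
from collections import deque
from typing import Callable, Dict, List, Optional, Set
from urllib.parse import urlparse

# Rough pattern for http(s) URLs in extracted plain text. Real documents may
# need a proper parser (e.g. reading PDF link annotations) instead.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_links(text: str) -> List[str]:
    """Return the http(s) URLs found in text, in order, without duplicates."""
    links: List[str] = []
    for match in URL_RE.findall(text):
        url = match.rstrip(".,;:")  # drop trailing sentence punctuation
        if url not in links:
            links.append(url)
    return links


def crawl(
    seed_text: str,
    fetch: Callable[[str], Optional[str]],
    allowed_hosts: Set[str],
    max_depth: int = 1,
    max_pages: int = 100,
) -> List[Dict[str, str]]:
    """Breadth-first crawl starting from the links found in seed_text.

    fetch(url) is a hypothetical, caller-supplied function returning the
    page's extracted text, or None on failure. Links in the seed document
    are depth 1; max_depth=1 therefore follows only those links.
    """
    queue = deque((url, 1) for url in extract_links(seed_text))
    visited: Set[str] = set()
    docs: List[Dict[str, str]] = []
    while queue and len(docs) < max_pages:
        url, depth = queue.popleft()
        # Skip pages already seen and anything outside the whitelist.
        if url in visited or urlparse(url).hostname not in allowed_hosts:
            continue
        visited.add(url)
        text = fetch(url)
        if text is None:
            continue
        # "id" and "content" are assumed field names in the Solr schema.
        docs.append({"id": url, "content": text})
        if depth < max_depth:
            for link in extract_links(text):
                queue.append((link, depth + 1))
    return docs
```

The returned list of dicts could then be serialized with `json.dumps` and posted to Solr's update handler; the exact update URL depends on your Solr version and core, so check your own setup rather than relying on this sketch.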