Posted to solr-user@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2011/10/05 01:16:15 UTC
Re: indexing FTP document with solrj
: I want to index some documents with the SolrJ API, but the URLs of these
: documents are FTP.
: How do I set the username and password for the FTP account in SolrJ?
:
: In the SolrJ API there is a CommonsHttpSolrServer class, but I cannot find any
: method for FTP configuration.
It sounds like you are getting confused between using SolrJ to talk to
*Solr* and using SolrJ to index arbitrary URLs.
SolrJ doesn't do any crawling -- if you have data that you want to index
then your client code needs to decide what that data is (and where it
comes from) and feed that data to SolrJ as "documents" to index. The only
URLs that SolrJ knows about are:
* the URL for talking to Solr
* "strings" that SolrJ passes to Solr as document fields that may just so
happen to be URLs (SolrJ doesn't know/care)
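
A minimal sketch of that division of labour, assuming a plain-text file on the
FTP server: the client code fetches the bytes itself (here through the JDK's
built-in ftp:// URL handler, with the credentials in the URI) and only then
hands fields to an indexer. The class name, the Indexer interface, the host,
the credentials and the field names "id", "url" and "content" are all
hypothetical; the SolrJ hand-off appears only in a comment so that the sketch
compiles without the SolrJ jar.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class FtpIndexingSketch {

    /** Hypothetical hand-off point; a real version would wrap a SolrJ server. */
    interface Indexer {
        void index(Map<String, Object> fields) throws Exception;
    }

    // With SolrJ on the classpath, an Indexer could look roughly like this:
    //   SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    //   Indexer solr = fields -> {
    //       SolrInputDocument doc = new SolrInputDocument();
    //       for (Map.Entry<String, Object> e : fields.entrySet()) {
    //           doc.addField(e.getKey(), e.getValue());
    //       }
    //       server.add(doc);
    //       server.commit();
    //   };

    /** Builds an ftp:// URI carrying the credentials in its user-info part. */
    static URI ftpUri(String user, String password, String host, String path)
            throws URISyntaxException {
        return new URI("ftp", user + ":" + password, host, -1, path, null, null);
    }

    /** Reads the whole stream as UTF-8 text (assumes a plain-text document). */
    static String readText(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    /** Turns fetched text into fields; the field names are schema assumptions. */
    static Map<String, Object> toFields(URI source, String text) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("id", source.getPath());
        // Store the location without credentials; to Solr it is just a string.
        fields.put("url", "ftp://" + source.getHost() + source.getPath());
        fields.put("content", text);
        return fields;
    }

    /** The client, not SolrJ, opens the FTP connection and fetches the bytes. */
    static void fetchAndIndex(URI source, Indexer indexer) throws Exception {
        try (InputStream in = source.toURL().openStream()) {
            indexer.index(toFields(source, readText(in)));
        }
    }

    public static void main(String[] args) throws Exception {
        URI source = ftpUri("user", "secret", "ftp.example.org", "/docs/a.txt");
        // Offline demo: an in-memory stream stands in for the FTP download.
        InputStream fake =
                new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(toFields(source, readText(fake)));
    }
}
```

Against a real server, fetchAndIndex(source, solr) would do the download and
the hand-off in one call. For binary formats such as PDF or Word, the text
would first need to be extracted before it is put into a field.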
-Hoss
Re: indexing FTP document with solrj
Posted by Marc SCHNEIDER <ma...@gmail.com>.
Hello,
To extract the document's content you can use Apache Tika before sending it to
Solr (via SolrJ).
Regards,
Marc.