Posted to solr-user@lucene.apache.org by hadi <md...@gmail.com> on 2011/09/23 22:53:21 UTC

indexing FTP documents with solrj

I want to index some documents with the SolrJ API, but the URLs of these
documents are FTP URLs.
How do I set the username and password for the FTP account in SolrJ?

The SolrJ API has the CommonsHttpSolrServer class, but I cannot find any
method for FTP configuration.

Thanks


Re: indexing FTP documents with solrj

Posted by Marc SCHNEIDER <ma...@gmail.com>.
Hello,

To extract the content of the documents you can use Apache Tika before
sending it to Solr (via SolrJ).

Regards,
Marc.

On Wed, Oct 5, 2011 at 1:16 AM, Chris Hostetter <ho...@fucit.org> wrote:

>
> : I want to index some documents with the SolrJ API, but the URLs of these
> : documents are FTP URLs.
> : How do I set the username and password for the FTP account in SolrJ?
> :
> : The SolrJ API has the CommonsHttpSolrServer class, but I cannot find any
> : method for FTP configuration.
>
> It sounds like you are confusing two things: using SolrJ to talk to
> *Solr*, and using SolrJ to index arbitrary URLs.
>
> SolrJ doesn't do any crawling -- if you have data that you want to index,
> then your client code needs to decide what that data is (and where it
> comes from) and feed that data to SolrJ as "documents" to index.  The only
> URLs that SolrJ knows about are:
>  * the URL for talking to Solr
>  * "strings" that SolrJ passes to Solr as document fields that may just
>    happen to be URLs (SolrJ doesn't know or care)
>
> -Hoss
>

Re: indexing FTP documents with solrj

Posted by Chris Hostetter <ho...@fucit.org>.
: I want to index some documents with the SolrJ API, but the URLs of these
: documents are FTP URLs.
: How do I set the username and password for the FTP account in SolrJ?
:
: The SolrJ API has the CommonsHttpSolrServer class, but I cannot find any
: method for FTP configuration.

It sounds like you are confusing two things: using SolrJ to talk to
*Solr*, and using SolrJ to index arbitrary URLs.

SolrJ doesn't do any crawling -- if you have data that you want to index,
then your client code needs to decide what that data is (and where it
comes from) and feed that data to SolrJ as "documents" to index.  The only
URLs that SolrJ knows about are:
 * the URL for talking to Solr
 * "strings" that SolrJ passes to Solr as document fields that may just
   happen to be URLs (SolrJ doesn't know or care)

-Hoss
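
As a rough illustration of the point above (the client fetches the data
itself, SolrJ only sends it to Solr), here is a minimal Java sketch. It is
not code from this thread. The class name FtpToSolrSketch, the helper names,
the field names "id", "url" and "text", and the Solr URL in the comments are
all hypothetical. It assumes the JDK's built-in ftp: URL handler, which takes
the credentials from the user:password part of the URL, and it assumes the
file is plain UTF-8 text (for PDF, Word and similar formats an extraction
step such as Tika would sit between fetching and indexing). The SolrJ calls
are left as comments so that the sketch compiles without the SolrJ jar.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class FtpToSolrSketch {

    // Percent-encode a user name or password so that characters such as
    // '@', ':' or spaces do not break the structure of the URL.
    static String encode(String s) {
        try {
            // URLEncoder writes spaces as '+'; a URL needs "%20" here.
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Build ftp://user:password@host/path. The path must start with '/'.
    static String buildFtpUrl(String user, String password, String host, String path) {
        return "ftp://" + encode(user) + ":" + encode(password) + "@" + host + path;
    }

    // Download the file. This is the step SolrJ does NOT do for you.
    static byte[] fetch(String ftpUrl) throws IOException {
        try (InputStream in = URI.create(ftpUrl).toURL().openStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    // Turn the fetched bytes into field name/value pairs for one document.
    // The field names are assumptions and must match your Solr schema.
    static Map<String, Object> toFields(String id, String sourceUrl, byte[] content) {
        Map<String, Object> fields = new LinkedHashMap<String, Object>();
        fields.put("id", id);
        fields.put("url", sourceUrl); // to Solr this is just a string
        fields.put("text", new String(content, StandardCharsets.UTF_8));
        return fields;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 4) {
            System.err.println("usage: FtpToSolrSketch <user> <password> <host> <path>");
            return;
        }
        byte[] content = fetch(buildFtpUrl(args[0], args[1], args[2], args[3]));
        // Store the URL without credentials in the index.
        String publicUrl = "ftp://" + args[2] + args[3];
        Map<String, Object> fields = toFields(publicUrl, publicUrl, content);
        System.out.println(fields.keySet() + ", " + content.length + " bytes fetched");

        // With the SolrJ jar on the classpath, the indexing step would be
        // roughly the following (the Solr URL is a placeholder):
        //   SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        //   SolrInputDocument doc = new SolrInputDocument();
        //   for (Map.Entry<String, Object> e : fields.entrySet()) {
        //       doc.addField(e.getKey(), e.getValue());
        //   }
        //   server.add(doc);
        //   server.commit();
    }
}
```

The FTP credentials are used only while fetching. The only URL that would be
handed to CommonsHttpSolrServer is the one for Solr itself, and the document's
FTP URL reaches Solr as an ordinary field value.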