
Posted to solr-user@lucene.apache.org by vybe3142 <vy...@gmail.com> on 2012/03/20 20:59:53 UTC

Thanks All

Here is the core of the SolrJ client that ended up doing what I
wanted:

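        import org.apache.solr.client.solrj.SolrServer;
        import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
        import org.apache.solr.client.solrj.request.UpdateRequest;
        import org.apache.solr.common.params.ModifiableSolrParams;

        // stream.file is read by the Solr server itself, so this path must exist
        // on the server (remote streaming must be enabled in solrconfig.xml)
        String fileName2 = "C:\\work\\SolrClient\\data\\worldwartwo.txt";
        // queue size 20, 8 background threads
        SolrServer server = new StreamingUpdateSolrServer("http://localhost:8080/solr/", 20, 8);
        UpdateRequest req = new UpdateRequest("/update/extract");
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.add("stream.file", fileName2);   // file for the handler to extract
        params.set("literal.id", fileName2);    // use the path as the document id
        params.set("captureAttr", "false");

        req.setParams(params);
        server.request(req);
        server.commit();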
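To see roughly what the request above amounts to on the wire, here is a minimal JDK-only sketch that builds a comparable /update/extract URL from the same parameters. It is an illustration under assumptions: the class and method names are hypothetical, `commit=true` is an optional extra so a single request also commits (the SolrJ code above calls `server.commit()` separately instead), and SolrJ itself may send these parameters in a POST body rather than in the query string.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds a query URL for the ExtractingRequestHandler
// carrying the same parameters as the SolrJ snippet.
public class ExtractUrlSketch {

    static String buildExtractUrl(String solrBase, String serverSidePath, boolean commit) {
        // LinkedHashMap keeps the parameters in insertion order
        Map<String, String> params = new LinkedHashMap<>();
        params.put("stream.file", serverSidePath); // read by the Solr server, not the client
        params.put("literal.id", serverSidePath);  // the path doubles as the unique key
        params.put("captureAttr", "false");
        if (commit) {
            params.put("commit", "true");          // assumption: commit in the same request
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        // Drop a trailing slash so "/solr/" and "/solr" give the same result
        String base = solrBase.endsWith("/")
                ? solrBase.substring(0, solrBase.length() - 1)
                : solrBase;
        return base + "/update/extract?" + query;
    }

    public static void main(String[] args) {
        System.out.println(buildExtractUrl("http://localhost:8080/solr/",
                "C:\\work\\SolrClient\\data\\worldwartwo.txt", true));
    }
}
```

Note that backslashes and the drive colon are percent-encoded (`%5C`, `%3A`); the server decodes them back to the original path before opening the file.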

To get this to work correctly, the following server-side config was needed
(I started from a barebones Solr config):

1. Add apache-solr-cell-3.5.0.jar to the <solrhost>/lib directory (or
wherever Solr can load jars from), since this jar contains the
ExtractingRequestHandler class.
2. Add the appropriate handler for /update/extract to solrconfig.xml
(this uses the ExtractingRequestHandler class).
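Step 1 is easy to get subtly wrong if the jar lands in a directory Solr does not scan. As a hedged pre-flight check, here is a hypothetical JDK-only helper that lists a lib directory and reports whether a Solr Cell jar is present. The file-name pattern assumes the 3.x naming used in the post (`apache-solr-cell-<version>.jar`), and the default directory is an assumption you would replace with your own <solrhost>/lib path.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical pre-flight check: is a Solr Cell jar present in a lib directory?
public class CellJarCheck {

    // Matches names such as apache-solr-cell-3.5.0.jar (3.x naming, as in the post)
    private static final Pattern CELL_JAR =
            Pattern.compile("apache-solr-cell-\\d[\\w.\\-]*\\.jar");

    // Pure function over file names, so it can be tested without touching the disk
    static Optional<String> findCellJar(List<String> fileNames) {
        return fileNames.stream()
                .filter(name -> CELL_JAR.matcher(name).matches())
                .findFirst();
    }

    // Lists the entry names in a directory
    static List<String> listFileNames(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                names.add(p.getFileName().toString());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass your own lib path as the first argument
        Path lib = Paths.get(args.length > 0 ? args[0] : "lib");
        if (!Files.isDirectory(lib)) {
            System.out.println("not a directory: " + lib.toAbsolutePath());
            return;
        }
        Optional<String> jar = findCellJar(listFileNames(lib));
        System.out.println(jar.map(j -> "found " + j)
                .orElse("no apache-solr-cell jar in " + lib.toAbsolutePath()));
    }
}
```

This only checks that the file is there; whether Solr actually loads it still depends on the lib directories configured for your Solr instance.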

I'll blog about this later for the benefit of the community at large.

I'm still puzzled that there are no readily available alternatives to the
Tika-based ExtractingRequestHandler when the input data is plain UTF-8 text
files that Solr needs to ingest and index. I may need to look into defining
a custom request handler, if that's the right way to go.
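One Tika-free route for plain UTF-8 text is to read the file on the client and send it to the ordinary /update handler as a regular document, for example in Solr's XML update format (`<add><doc><field name="...">...</field></doc></add>`). The sketch below only builds that message using the JDK. The class name is hypothetical, the field names `id` and `text` are assumptions that must match your schema.xml, and unlike stream.file the content still travels over HTTP, so it does not meet the original "bypass HTTP" goal.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: turn a plain UTF-8 text file into a Solr XML <add> message.
public class PlainTextToSolrXml {

    // Escape characters that are significant in XML text and attribute values.
    // Note: XML 1.0 forbids most control characters; this sketch does not filter them.
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Field names "id" and "text" are assumptions; they must exist in your schema
    static String toAddXml(String id, String body) {
        return "<add><doc>"
                + "<field name=\"id\">" + escapeXml(id) + "</field>"
                + "<field name=\"text\">" + escapeXml(body) + "</field>"
                + "</doc></add>";
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: PlainTextToSolrXml <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        // Read the whole file as UTF-8; no Tika involved
        String body = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        // The path doubles as the unique key, mirroring literal.id in the post
        System.out.println(toAddXml(file.toString(), body));
    }
}
```

The resulting message could be POSTed to /update with a `text/xml` content type, followed by a commit. Staying within SolrJ, the same two fields could instead go into a `SolrInputDocument` sent with `server.add(doc)`.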

Thanks again

--
View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-for-SOLR-SOLRJ-to-index-files-directly-bypassing-HTTP-streaming-tp3833419p3843593.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Thanks All

Posted by Lance Norskog <go...@gmail.com>.

If you build it, they will come!

On Tue, Mar 20, 2012 at 12:59 PM, vybe3142 <vy...@gmail.com> wrote:

> I'm still puzzled that there are no readily available alternatives to using
> the Tika based ExtractingRequestHandler in the situation where the input
> data is plain UTF-8 text files that SOLR needs to ingest and index. I may
> need to look into defining a custom Request Handler if that's the right way
> to go.
>



-- 
Lance Norskog
goksron@gmail.com

Re: Thanks All

Posted by Chris Hostetter <ho...@fucit.org>.

: To get this to work correctly, the following server side config was needed
: (I started from a barebones solr config)

: 1. Add apache-solr-cell-3.5.0.jar to the <solrhost>/lib directory (or
: wherever solr can access jars) as this contains the class
: ExtractingRequestHandler
: 2. Add the appropriate handler for /update/extract in the solrconfig.xml
: (this uses the ExtractingRequestHandler class).

What barebones Solr config did you start with?

The example configs that ship with Solr have included /update/extract
since 1.4.0.


-Hoss