Posted to solr-user@lucene.apache.org by Geoffrey Young <ge...@modperlcookbook.org> on 2008/10/01 13:38:52 UTC

Re: using DataImportHandler instead of POST?


Geoffrey Young wrote:
> 
> Chris Hostetter wrote:
>> : I have a well-formed xml file, suitable for POSTting to solr.  that
>> : works just fine.  it's very large, though, and using curl in production
>> : is so very lame.  is there a very simple config that will let solr just
>> : slurp up the file via the DataImportHandler?  solr already has
>>
>> You don't even need DIH for this, just "enableRemoteStreaming" and use the 
>> stream.file param and you can load the file from local disk...
>>
>> 	http://wiki.apache.org/solr/ContentStream
> 
> this is the solution I think I'm going to go with - it seems to work
> perfectly.

well, with one exception.

I chug away at 1.5 million records in a single file, but solr never
commits.  specifically, it ignores my <autoCommit> settings.  (I can
commit separately at the end, of course :)
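
for anyone following along, a minimal sketch of the stream.file approach
with the commit requested in the same call.  the base URL and file path
are made-up placeholders, remote streaming is assumed to be enabled in
solrconfig.xml, and the commit=true parameter is an assumption about what
the /update handler accepts in your version.  the function only builds
the URL; it doesn't contact solr.

```python
from urllib.parse import urlencode


def build_update_url(solr_base, file_path, commit=True):
    """Build a /update URL asking Solr to read file_path from its own disk.

    solr_base and file_path are hypothetical values supplied by the caller.
    """
    # stream.file names a file on the Solr server's local filesystem.
    params = {"stream.file": file_path}
    if commit:
        # Assumption: the update request handler honors commit=true.
        params["commit"] = "true"
    return f"{solr_base.rstrip('/')}/update?{urlencode(params)}"


if __name__ == "__main__":
    print(build_update_url("http://localhost:8983/solr", "/data/docs.xml"))
```

the resulting URL can be fetched with whatever HTTP client is already
around, which at least keeps the big file itself out of the request body.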

but I might be misunderstanding autocommit.  I have it set as the
default solrconfig.xml does, in the updateHandler section (mapped to
DirectUpdateHandler2) but /update is mapped to XmlUpdateRequestHandler.
should I be shuffling some things around?

thanks.

--Geoff

Re: using DataImportHandler instead of POST?

Posted by Geoffrey Young <ge...@modperlcookbook.org>.

Chris Hostetter wrote:
> : I chug away at 1.5 million records in a single file, but solr never
> : commits.  specifically, it ignores my <autoCommit> settings.  (I can
> : commit separately at the end, of course :)
> 
> the way the autocommit settings work is something i always get confused by 
> -- the autocommit logic may not kick in until the <add> is 
> finished, regardless of how many docs are in it -- but i'm not certain 
> (and if i'm correct, i'm not sure if that's a bug or a feature)

ok, that makes sense.

fwiw, I tried to break the records into <add> chunks in the same file
but solr complained about multiple root entities.  I knew you couldn't
mix adds and deletes (rats ;) but I figured multiple add blocks would be
ok.  I guess not :)
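
since each request body has to be one well-formed document with a single
root, one workaround is to split the big file into several separate <add>
documents and send them one at a time.  a rough sketch, assuming the
input is a single <add> wrapping flat <doc> elements; split_add_file and
docs_per_chunk are hypothetical names, and whether autocommit then fires
between chunks is not something this code verifies.

```python
import io
import xml.etree.ElementTree as ET


def split_add_file(source, docs_per_chunk):
    """Yield XML strings, each a complete <add> document.

    source is a binary file-like object holding one <add> root; each
    yielded string contains at most docs_per_chunk <doc> elements.
    """
    chunk = ET.Element("add")
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and root is None:
            root = elem  # remember the outer <add> so it can be emptied
        elif event == "end" and elem.tag == "doc":
            chunk.append(elem)
            if len(chunk) == docs_per_chunk:
                yield ET.tostring(chunk, encoding="unicode")
                chunk = ET.Element("add")
                root.clear()  # drop already-emitted docs to bound memory
    if len(chunk):
        yield ET.tostring(chunk, encoding="unicode")


if __name__ == "__main__":
    sample = b"<add><doc><field name='id'>1</field></doc><doc><field name='id'>2</field></doc></add>"
    for piece in split_add_file(io.BytesIO(sample), 1):
        print(piece)
```

each yielded string could then be written to its own file and loaded
separately, with an explicit commit after whichever chunks you choose.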

> 
> this may be a motivating reason to use DIH in your use case even though 
> you've already got it in the XmlUpdateRequestHandler format.

yeah, I'll check.  though I don't know what I'd do with trying to figure
out which records were committed and which weren't...

> 
> : but I might be misunderstanding autocommit.  I have it set as the
> : default solrconfig.xml does, in the updateHandler section (mapped to
> : DirectUpdateHandler2) but /update is mapped to XmlUpdateRequestHandler.
> : should I be shuffling some things around?
> 
> due to some unfortunate naming decisions several years ago an "update 
> Handler" and a "Request handler" that does updates aren't the same thing 
> ... <updateHandler> (which should always be DirectUpdateHandler2) is the 
> low level internal code that is responsible for actually making the index 
> modifications -- XmlUpdateRequestHandler (or DataImportHandler) parses the 
> raw input and hands off to DirectUpdateHandler2 to make the changes.

ok, thanks.  I kind of inferred that from the wiki, but it was still
confusing, so thanks for the clarification.

--Geoff

Re: using DataImportHandler instead of POST?

Posted by Chris Hostetter <ho...@fucit.org>.
: I chug away at 1.5 million records in a single file, but solr never
: commits.  specifically, it ignores my <autoCommit> settings.  (I can
: commit separately at the end, of course :)

the way the autocommit settings work is something i always get confused by 
-- the autocommit logic may not kick in until the <add> is 
finished, regardless of how many docs are in it -- but i'm not certain 
(and if i'm correct, i'm not sure if that's a bug or a feature)

this may be a motivating reason to use DIH in your use case even though 
you've already got it in the XmlUpdateRequestHandler format.

: but I might be misunderstanding autocommit.  I have it set as the
: default solrconfig.xml does, in the updateHandler section (mapped to
: DirectUpdateHandler2) but /update is mapped to XmlUpdateRequestHandler.
: should I be shuffling some things around?

due to some unfortunate naming decisions several years ago an "update 
Handler" and a "Request handler" that does updates aren't the same thing 
... <updateHandler> (which should always be DirectUpdateHandler2) is the 
low level internal code that is responsible for actually making the index 
modifications -- XmlUpdateRequestHandler (or DataImportHandler) parses the 
raw input and hands off to DirectUpdateHandler2 to make the changes.




-Hoss