Posted to solr-user@lucene.apache.org by Geoffrey Young <ge...@modperlcookbook.org> on 2008/10/01 13:38:52 UTC
Re: using DataImportHandler instead of POST?
Geoffrey Young wrote:
>
> Chris Hostetter wrote:
>> : I have a well-formed xml file, suitable for POSTing to solr. that
>> : works just fine. it's very large, though, and using curl in production
>> : is so very lame. is there a very simple config that will let solr just
>> : slurp up the file via the DataImportHandler? solr already has
>>
>> You don't even need DIH for this, just "enableRemoteStreaming" and use the
>> stream.file param and you can load the file from local disk...
>>
>> http://wiki.apache.org/solr/ContentStream
>
> this is the solution I think I'm going to go with - it seems to work
> perfectly.
well, with one exception.
I chug away at 1.5 million records in a single file, but solr never
commits. specifically, it ignores my <autoCommit> settings. (I can
commit separately at the end, of course :)
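
For reference, a minimal sketch of the two requests involved: one that loads the file through stream.file and one that commits separately afterwards. It assumes remote streaming is enabled and that <commit/> is sent through the stream.body parameter. The host, port, and file path are hypothetical placeholders, and the helper names are made up for illustration. The code only builds the URLs and sends nothing.

```python
from urllib.parse import urlencode


def stream_file_url(update_url, path):
    """Build an /update URL that asks Solr to read a file from its local disk.

    Assumes enableRemoteStreaming is turned on in solrconfig.xml.
    """
    return update_url + "?" + urlencode({"stream.file": path})


def commit_url(update_url):
    """Build an /update URL that sends a separate <commit/> via stream.body."""
    return update_url + "?" + urlencode({"stream.body": "<commit/>"})


# Hypothetical endpoint and path, for illustration only.
UPDATE_URL = "http://localhost:8983/solr/update"
print(stream_file_url(UPDATE_URL, "/data/records.xml"))
print(commit_url(UPDATE_URL))
```

Requesting the first URL and then the second matches the "commit separately at the end" workaround described above.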
but I might be misunderstanding autocommit. I have it set as the
default solrconfig.xml does, in the updateHandler section (mapped to
DirectUpdateHandler2) but /update is mapped to XmlUpdateRequestHandler.
should I be shuffling some things around?
thanks.
--Geoff
Re: using DataImportHandler instead of POST?
Posted by Geoffrey Young <ge...@modperlcookbook.org>.
Chris Hostetter wrote:
> : I chug away at 1.5 million records in a single file, but solr never
> : commits. specifically, it ignores my <autoCommit> settings. (I can
> : commit separately at the end, of course :)
>
> the way the autocommit settings work is something i always get confused by
> -- the autocommit logic may not kick in until the <add> is
> finished, regardless of how many docs are in it -- but i'm not certain
> (and if i'm correct, i'm not sure if that's a bug or a feature)
ok, that makes sense.
fwiw, I tried to break the records into <add> chunks in the same file
but solr complained about multiple root entities. I knew you couldn't
mix adds and deletes (rats ;) but I figured multiple add blocks would be
ok. I guess not :)
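
The "multiple root entities" complaint is expected: an XML document can have only one root element, so several <add> blocks cannot share one file. One possible workaround, sketched here under assumptions and not taken from the thread, is to split the big file into several standalone <add> documents and send each one separately. The function name and chunk size are hypothetical, and the sketch assumes <doc> elements sit directly under a single <add> root.

```python
import io
import xml.etree.ElementTree as ET


def split_add_file(source, docs_per_chunk):
    """Yield standalone <add> documents (as strings), each holding at most
    docs_per_chunk <doc> elements read from one large <add> file."""
    batch = []
    # iterparse streams the input, so the whole file is never held as text.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "doc":
            batch.append(ET.tostring(elem, encoding="unicode"))
            elem.clear()  # drop the doc's children once it is serialized
            if len(batch) == docs_per_chunk:
                yield "<add>" + "".join(batch) + "</add>"
                batch = []
    if batch:
        # Emit the final, possibly smaller, chunk.
        yield "<add>" + "".join(batch) + "</add>"


# Tiny in-memory stand-in for the real file, for illustration only.
sample = (
    b'<add><doc><field name="id">1</field></doc>'
    b'<doc><field name="id">2</field></doc>'
    b'<doc><field name="id">3</field></doc></add>'
)
for chunk in split_add_file(io.BytesIO(sample), 2):
    print(chunk)
```

Each yielded string is a well-formed document on its own, so it can be written to its own file or sent as its own request. Whether autocommit then fires between chunks depends on the behaviour discussed above, which this sketch does not verify.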
>
> this may be a motivating reason to use DIH in your use case even though
> you've already got it in the XmlUpdateRequestHandler format.
yeah, I'll check. though I don't know what I'd do with trying to figure
out which records were committed and which weren't...
>
> : but I might be misunderstanding autocommit. I have it set as the
> : default solrconfig.xml does, in the updateHandler section (mapped to
> : DirectUpdateHandler2) but /update is mapped to XmlUpdateRequestHandler.
> : should I be shuffling some things around?
>
> due to some unfortunate naming decisions several years ago an "update
> Handler" and a "Request handler" that does updates aren't the same thing
> ... <updateHandler> (which should always be DirectUpdateHandler2) is the
> low level internal code that is responsible for actually making the index
> modifications -- XmlUpdateRequestHandler (or DataImportHandler) parses the
> raw input and hands off to DirectUpdateHandler2 to make the changes.
ok, thanks. I kind of inferred that from the wiki, but it was still
confusing, so thanks for the clarification.
--Geoff
Re: using DataImportHandler instead of POST?
Posted by Chris Hostetter <ho...@fucit.org>.
: I chug away at 1.5 million records in a single file, but solr never
: commits. specifically, it ignores my <autoCommit> settings. (I can
: commit separately at the end, of course :)
the way the autocommit settings work is something i always get confused by
-- the autocommit logic may not kick in until the <add> is
finished, regardless of how many docs are in it -- but i'm not certain
(and if i'm correct, i'm not sure if that's a bug or a feature)
this may be a motivating reason to use DIH in your use case even though
you've already got it in the XmlUpdateRequestHandler format.
: but I might be misunderstanding autocommit. I have it set as the
: default solrconfig.xml does, in the updateHandler section (mapped to
: DirectUpdateHandler2) but /update is mapped to XmlUpdateRequestHandler.
: should I be shuffling some things around?
due to some unfortunate naming decisions several years ago an "update
Handler" and a "Request handler" that does updates aren't the same thing
... <updateHandler> (which should always be DirectUpdateHandler2) is the
low level internal code that is responsible for actually making the index
modifications -- XmlUpdateRequestHandler (or DataImportHandler) parses the
raw input and hands off to DirectUpdateHandler2 to make the changes.
-Hoss