Posted to solr-user@lucene.apache.org by "samuele.mattiuzzo" <sa...@gmail.com> on 2011/10/17 11:00:53 UTC

Solr indexing plugin: skip single faulty document?

Hi all, as far as I know, when Solr finds a faulty document (inside an XML
file containing, let's say, 1000 docs) it skips the whole file and the indexing
process exits with an exception (am I correct?).

I'm using a custom indexing plugin, and I can trap the exception. Instead of
using "default" values when that exception is raised, I would like to skip the
document that raised the error, maybe log it somewhere, and move on to the next
one. (Example: sometimes I try to insert a string into a "string" field, but
Solr exits saying it's expecting a multiValued field... I guess it's because of
some ASCII chars within the text, something like \n or similar.) We're indexing
millions of documents, and we don't care much if we lose 10-20% of them, so the
best solution is to skip the single faulty doc and continue with the rest.

I guess I have to work on the super.processAdd() call, but I don't know
where I can find info about it. Can anybody help me? Is there a book on
advanced Solr plugin development I could read?

Thanks!

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-indexing-plugin-skip-single-faulty-document-tp3427646p3427646.html
Sent from the Solr - User mailing list archive at Nabble.com.
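
[Editor's sketch] The idea in the question, wrapping the super.processAdd()
call in a try/catch so that one bad document is logged and skipped, can be
illustrated roughly as below. This is a self-contained model, not Solr code:
Processor, StrictIndexer, SkipFaultyProcessor and the "title" check are all
hypothetical stand-ins. In a real plugin the class to extend would be Solr's
UpdateRequestProcessor, whose processAdd(AddUpdateCommand) plays the role of
Processor.processAdd here; whether a failure further down the chain can be
safely swallowed this way depends on the Solr version and on where the error
is thrown, which this sketch does not address.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SkipFaultyDocsSketch {

    /** Hypothetical stand-in for a processor chain link: delegates to the next link. */
    static class Processor {
        private final Processor next;

        Processor(Processor next) {
            this.next = next;
        }

        void processAdd(Map<String, Object> doc) throws Exception {
            if (next != null) {
                next.processAdd(doc);
            }
        }
    }

    /** Hypothetical final step: rejects a list value in the single-valued "title" field. */
    static class StrictIndexer extends Processor {
        final List<Map<String, Object>> indexed = new ArrayList<>();

        StrictIndexer() {
            super(null);
        }

        @Override
        void processAdd(Map<String, Object> doc) {
            if (doc.get("title") instanceof List) {
                throw new IllegalArgumentException(
                        "multiple values for non-multiValued field title");
            }
            indexed.add(doc);
        }
    }

    /** Traps a failure from the rest of the chain, records the doc id, and moves on. */
    static class SkipFaultyProcessor extends Processor {
        final List<Object> skippedIds = new ArrayList<>();

        SkipFaultyProcessor(Processor next) {
            super(next);
        }

        @Override
        void processAdd(Map<String, Object> doc) {
            try {
                super.processAdd(doc);
            } catch (Exception e) {
                // Log and remember the faulty document instead of aborting the batch.
                skippedIds.add(doc.get("id"));
                System.err.println("Skipping doc " + doc.get("id") + ": " + e.getMessage());
            }
        }
    }

    /** Small helper to build a two-field document. */
    static Map<String, Object> doc(String id, Object title) {
        Map<String, Object> d = new HashMap<>();
        d.put("id", id);
        d.put("title", title);
        return d;
    }

    public static void main(String[] args) {
        StrictIndexer indexer = new StrictIndexer();
        SkipFaultyProcessor skipper = new SkipFaultyProcessor(indexer);

        List<Map<String, Object>> batch = new ArrayList<>();
        batch.add(doc("1", "fine"));
        batch.add(doc("2", List.of("two", "values"))); // the faulty one
        batch.add(doc("3", "also fine"));

        for (Map<String, Object> d : batch) {
            skipper.processAdd(d);
        }
        System.out.println("indexed=" + indexer.indexed.size()
                + " skipped=" + skipper.skippedIds);
    }
}
```

With the three-document batch in main, documents 1 and 3 reach the indexer and
document 2 ends up in skippedIds. Keeping the skipped ids in a list (rather
than only logging) makes it easy to report or retry them after the run.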

Re: Solr indexing plugin: skip single faulty document?

Posted by "samuele.mattiuzzo" <sa...@gmail.com>.
OK, I'll certainly look into what I can!


Re: Solr indexing plugin: skip single faulty document?

Posted by Erick Erickson <er...@gmail.com>.
Don't get too excited, I don't know what state that patch is in. It's on my
loooong TODO list to go back and look at it some more. If you want to work on
it and bring it up to snuff, please feel free to do so and submit a modernized
patch!

Erick

On Mon, Oct 24, 2011 at 9:44 AM, samuele.mattiuzzo <sa...@gmail.com> wrote:
> Thanks Erick! I'll be reading that issue, it's pretty much everything I need!

Re: Solr indexing plugin: skip single faulty document?

Posted by "samuele.mattiuzzo" <sa...@gmail.com>.
Thanks Erick! I'll be reading that issue, it's pretty much everything I need!


Re: Solr indexing plugin: skip single faulty document?

Posted by Erick Erickson <er...@gmail.com>.
Some work has been done in this general area; see SOLR-445. That
might give you some pointers.

Best
Erick
