Posted to solr-user@lucene.apache.org by Peter Keegan <pe...@gmail.com> on 2014/11/07 17:25:01 UTC

Solr exceptions during batch indexing

How are folks handling Solr exceptions that occur during batch indexing?
Solr stops parsing the docs stream when an error occurs (e.g. a doc with a
missing mandatory field), and stops indexing the batch. The bad document is
not identified, so it would be hard for the client to recover by skipping
over it.

Peter

Re: Solr exceptions during batch indexing

Posted by Erick Erickson <er...@gmail.com>.
bq: Just trying to understand what's the challenge in returning the bad doc

Mostly, nobody has done it yet. There's some complication with
async updates, ConcurrentUpdateSolrServer for instance. I also suspect
that one has to write error-handling logic in the client anyway,
so the motivation is reduced.

And now it would need to handle SolrCloud mode.

All that said, this has bugged me for a long time, but I haven't gotten around
to it, which says something about its priority, I suspect.

FWIW,
Erick

On Sat, Nov 8, 2014 at 2:51 AM, Anurag Sharma <an...@gmail.com> wrote:
> Just trying to understand what's the challenge in returning the bad doc
> id(s)?
> Solr already knows which doc(s) failed on update and can return their id(s)
> in the response or a callback. Can we have a JIRA ticket for it if one doesn't exist?
>
> This looks like a common use case, and every Solr consumer might be writing
> their own version to handle this issue.

Re: Solr exceptions during batch indexing

Posted by Anurag Sharma <an...@gmail.com>.
Just trying to understand what's the challenge in returning the bad doc
id(s)?
Solr already knows which doc(s) failed on update and can return their id(s)
in the response or a callback. Can we have a JIRA ticket for it if one doesn't exist?

This looks like a common use case, and every Solr consumer might be writing
their own version to handle this issue.

On Sat, Nov 8, 2014 at 1:17 AM, Walter Underwood <wu...@wunderwood.org>
wrote:

> Right, that is why we batch.
>
> When a batch of 1000 fails, drop to a batch size of 1 and start the batch
> over. Then it can report the exact document with problems.
>
> If you want to continue, go back to the bigger batch size. I usually fail
> the whole batch on one error.
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/

Re: Solr exceptions during batch indexing

Posted by Walter Underwood <wu...@wunderwood.org>.
Right, that is why we batch.

When a batch of 1000 fails, drop to a batch size of 1 and start the batch over. Then it can report the exact document with problems.

If you want to continue, go back to the bigger batch size. I usually fail the whole batch on one error.
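A minimal sketch of that fallback in plain Java: the Indexer interface below is a hypothetical stand-in for a SolrJ call such as SolrClient.add(docs), assumed to throw on the first bad document in a batch. Failed batches are replayed one document at a time so the offending ids can be reported and skipped.

```java
import java.util.ArrayList;
import java.util.List;

public class Main {
    // Hypothetical stand-in for the real client call (e.g. SolrClient.add),
    // which throws when the batch contains a bad document.
    interface Indexer {
        void index(List<String> docIds) throws Exception;
    }

    // Send docs in large batches; on failure, retry that batch with a
    // batch size of 1 and collect the ids that fail individually.
    static List<String> indexWithFallback(List<String> docIds, int batchSize, Indexer indexer) {
        List<String> failed = new ArrayList<>();
        for (int i = 0; i < docIds.size(); i += batchSize) {
            List<String> batch = docIds.subList(i, Math.min(i + batchSize, docIds.size()));
            try {
                indexer.index(batch);
            } catch (Exception batchError) {
                // Fallback: replay the failed batch one document at a time.
                for (String id : batch) {
                    try {
                        indexer.index(List.of(id));
                    } catch (Exception docError) {
                        failed.add(id);  // the exact bad document is now identified
                    }
                }
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("doc1", "doc2", "bad-doc", "doc4", "doc5");
        // Simulated indexer that rejects any id containing "bad".
        Indexer fake = batch -> {
            for (String id : batch) {
                if (id.contains("bad"))
                    throw new RuntimeException("missing mandatory field: " + id);
            }
        };
        System.out.println("failed=" + indexWithFallback(ids, 3, fake));
    }
}
```

Whether to skip the reported documents or fail the whole batch on one error, as described above, is then a policy choice in the caller.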

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/


On Nov 7, 2014, at 11:44 AM, Peter Keegan <pe...@gmail.com> wrote:

> I'm seeing 9X throughput with 1000 docs/batch vs 1 doc/batch, with a single
> thread, so it's certainly worth it.
> 
> Thanks,
> Peter


Re: Solr exceptions during batch indexing

Posted by Peter Keegan <pe...@gmail.com>.
I'm seeing 9X throughput with 1000 docs/batch vs 1 doc/batch, with a single
thread, so it's certainly worth it.

Thanks,
Peter


On Fri, Nov 7, 2014 at 2:18 PM, Erick Erickson <er...@gmail.com>
wrote:

> And Walter has also been around for a _long_ time ;)
>
> (sorry, couldn't resist)....
>
> Erick

Re: Solr exceptions during batch indexing

Posted by Erick Erickson <er...@gmail.com>.
And Walter has also been around for a _long_ time ;)

(sorry, couldn't resist)....

Erick

On Fri, Nov 7, 2014 at 11:12 AM, Walter Underwood <wu...@wunderwood.org> wrote:
> Yes, I implemented exactly that fallback for Solr 1.2 at Netflix.
>
> It isn’t too hard if the code is structured for it; retry with a batch size of 1.
>
> wunder
>
> On Nov 7, 2014, at 11:01 AM, Erick Erickson <er...@gmail.com> wrote:
>
>> Yeah, this has been an ongoing issue for a _long_ time. Basically,
>> you can't. So far, people have essentially written fallback logic to
>> index the docs of a failing packet one at a time and report it.
>>
>> I'd really like better reporting back, but we haven't gotten there yet.
>>
>> Best,
>> Erick
>>
>> On Fri, Nov 7, 2014 at 8:25 AM, Peter Keegan <pe...@gmail.com> wrote:
>>> How are folks handling Solr exceptions that occur during batch indexing?
>>> Solr stops parsing the docs stream when an error occurs (e.g. a doc with a
>>> missing mandatory field), and stops indexing the batch. The bad document is
>>> not identified, so it would be hard for the client to recover by skipping
>>> over it.
>>>
>>> Peter
>

Re: Solr exceptions during batch indexing

Posted by Walter Underwood <wu...@wunderwood.org>.
Yes, I implemented exactly that fallback for Solr 1.2 at Netflix.

It isn’t too hard if the code is structured for it; retry with a batch size of 1.

wunder

On Nov 7, 2014, at 11:01 AM, Erick Erickson <er...@gmail.com> wrote:

> Yeah, this has been an ongoing issue for a _long_ time. Basically,
> you can't. So far, people have essentially written fallback logic to
> index the docs of a failing packet one at a time and report it.
> 
> I'd really like better reporting back, but we haven't gotten there yet.
> 
> Best,
> Erick


Re: Solr exceptions during batch indexing

Posted by Erick Erickson <er...@gmail.com>.
Yeah, this has been an ongoing issue for a _long_ time. Basically,
you can't. So far, people have essentially written fallback logic to
index the docs of a failing packet one at a time and report it.

I'd really like better reporting back, but we haven't gotten there yet.

Best,
Erick

On Fri, Nov 7, 2014 at 8:25 AM, Peter Keegan <pe...@gmail.com> wrote:
> How are folks handling Solr exceptions that occur during batch indexing?
> Solr stops parsing the docs stream when an error occurs (e.g. a doc with a
> missing mandatory field), and stops indexing the batch. The bad document is
> not identified, so it would be hard for the client to recover by skipping
> over it.
>
> Peter