Posted to solr-user@lucene.apache.org by Matteo Grolla <ma...@gmail.com> on 2015/09/28 23:27:38 UTC

error reporting during indexing

Hi,
    if I need fine-grained error reporting I use HttpSolrServer and send
1 doc per request using the add method.
I report errors on exceptions of the add method.
I'm using autocommit, so I'm not seeing errors related to commits.
Am I losing some errors? Is there a better way?

Thanks
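[Editor's note: the one-document-per-request pattern described above can be sketched as follows. This is an illustration, not code from the thread: the SolrJ call (server.add(doc)) is abstracted behind a hypothetical DocAdder interface so the per-document error-collection logic can be shown and run without a live Solr instance; class and method names are invented for the example.]

```java
import java.util.ArrayList;
import java.util.List;

public class PerDocIndexer {
    /** Stand-in for HttpSolrServer's add(doc); may throw on failure. */
    @FunctionalInterface
    interface DocAdder {
        void add(String docId) throws Exception;
    }

    /**
     * Sends one document per request and records the id of every document
     * whose add() call threw, giving fine-grained per-document reporting.
     */
    static List<String> indexOneByOne(List<String> docIds, DocAdder adder) {
        List<String> failed = new ArrayList<>();
        for (String id : docIds) {
            try {
                adder.add(id);        // in real code: server.add(doc)
            } catch (Exception e) {
                failed.add(id);       // report/log this document's failure
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Simulate a server that rejects doc "bad-2" (e.g. schema mismatch).
        List<String> failed = indexOneByOne(
            List.of("doc-1", "bad-2", "doc-3"),
            id -> { if (id.startsWith("bad")) throw new Exception("rejected: " + id); });
        System.out.println(failed);   // prints [bad-2]
    }
}
```

Note that, as discussed below, this only catches errors surfaced by the add call itself; commit-time failures happen later, on the Solr side.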

Re: error reporting during indexing

Posted by Erick Erickson <er...@gmail.com>.
bq: If there is a problem writing the segment, a permission error,

Highly doubtful that this'll occur. When an IndexWriter is opened,
the first thing that's (usually) done is write to the lock file to keep
other Solr instances from writing. That should fail right off the bat,
far before any docs are actually indexed, perhaps with a lock-obtain
timeout error message.

And, for that matter, when Solr first starts up it creates the ./data,
./data/index and (perhaps) the ./data/tlog directories and any
permissions errors should be hit then.

I suppose there's some "interesting" stuff possible if someone
out there is changing directory permissions while Solr is running, in
which case you should find them and slap them silly ;)

IOW I've certainly seen Solr _fail_ to start when it can't access the
right directories, but not fail part way through.

Best,
Erick

On Tue, Sep 29, 2015 at 1:55 AM, Alessandro Benedetti
<be...@gmail.com> wrote:
> Hi Matteo, at this point I would suggest this reading by Erick:
>
> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> If I am not wrong, when a document is indexed (simplifying):
> 1) The document is added to the current in-memory segment.
> 2) When a soft commit happens, we get visibility (nothing is flushed to
> disk, but the document becomes searchable).
> 3) When a hard commit happens, we get durability: the in-memory segment is
> flushed to disk and cleared. So if a problem happens here, you should see
> an error on the Solr side, but this does not mean the document indexing
> failed; only the last flush failed.
>
> Regarding point 3, I am not sure what Solr's reaction to this failure is.
> I should investigate.
>
> Cheers
>
>
>
> 2015-09-29 8:53 GMT+01:00 Matteo Grolla <ma...@gmail.com>:
>
>> Hi Erick,
>>     it's a curiosity question. When I add a document, it's buffered by
>> Solr and can be (and apparently is) parsed to verify it matches the
>> schema. But it's not written to a segment file until a commit is issued.
>> If there is a problem writing the segment, say a permission error, isn't
>> this a case where I would report everything OK when in fact the documents
>> are not there?
>>
>> thanks
>>
>> 2015-09-29 2:12 GMT+02:00 Erick Erickson <er...@gmail.com>:
>>
>> > You shouldn't be losing errors with HttpSolrServer. Are you
>> > seeing evidence that you are or is this mostly a curiosity question?
>> >
>> > Do note, it's better to batch up docs; your throughput will increase
>> > a LOT. That said, when you do batch (e.g. send 500 docs per update
>> > or whatever) and you get an error back, you're not quite sure which
>> > doc failed. So what people do is retry a failed batch one document
>> > at a time when the batch has errors and rely on Solr overwriting
>> > any docs in the batch that were indexed the first time.
>> >
>> > Best,
>> > Erick
>> >
>> > On Mon, Sep 28, 2015 at 2:27 PM, Matteo Grolla <ma...@gmail.com>
>> > wrote:
>> > > Hi,
>> > >     if I need fine-grained error reporting I use HttpSolrServer and
>> > > send 1 doc per request using the add method.
>> > > I report errors on exceptions of the add method.
>> > > I'm using autocommit, so I'm not seeing errors related to commits.
>> > > Am I losing some errors? Is there a better way?
>> > >
>> > > Thanks
>> >
>>
>
>
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England

Re: error reporting during indexing

Posted by Alessandro Benedetti <be...@gmail.com>.
Hi Matteo, at this point I would suggest this reading by Erick:

https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

If I am not wrong, when a document is indexed (simplifying):
1) The document is added to the current in-memory segment.
2) When a soft commit happens, we get visibility (nothing is flushed to
disk, but the document becomes searchable).
3) When a hard commit happens, we get durability: the in-memory segment is
flushed to disk and cleared. So if a problem happens here, you should see
an error on the Solr side, but this does not mean the document indexing
failed; only the last flush failed.

Regarding point 3, I am not sure what Solr's reaction to this failure is.
I should investigate.
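[Editor's note: the soft/hard commit behaviour discussed above is typically configured in solrconfig.xml. The fragment below is a sketch with illustrative values, not a recommendation:]

```xml
<!-- solrconfig.xml (illustrative values) -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flushes buffered docs to disk for durability.
       openSearcher=false means it does not by itself make them visible. -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: makes new docs searchable without flushing segments. -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
</updateHandler>
```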

Cheers



2015-09-29 8:53 GMT+01:00 Matteo Grolla <ma...@gmail.com>:

> Hi Erick,
>     it's a curiosity question. When I add a document, it's buffered by
> Solr and can be (and apparently is) parsed to verify it matches the
> schema. But it's not written to a segment file until a commit is issued.
> If there is a problem writing the segment, say a permission error, isn't
> this a case where I would report everything OK when in fact the documents
> are not there?
>
> thanks
>
> 2015-09-29 2:12 GMT+02:00 Erick Erickson <er...@gmail.com>:
>
> > You shouldn't be losing errors with HttpSolrServer. Are you
> > seeing evidence that you are or is this mostly a curiosity question?
> >
> > Do note, it's better to batch up docs; your throughput will increase
> > a LOT. That said, when you do batch (e.g. send 500 docs per update
> > or whatever) and you get an error back, you're not quite sure which
> > doc failed. So what people do is retry a failed batch one document
> > at a time when the batch has errors and rely on Solr overwriting
> > any docs in the batch that were indexed the first time.
> >
> > Best,
> > Erick
> >
> > On Mon, Sep 28, 2015 at 2:27 PM, Matteo Grolla <ma...@gmail.com>
> > wrote:
> > > Hi,
> > >     if I need fine-grained error reporting I use HttpSolrServer and
> > > send 1 doc per request using the add method.
> > > I report errors on exceptions of the add method.
> > > I'm using autocommit, so I'm not seeing errors related to commits.
> > > Am I losing some errors? Is there a better way?
> > >
> > > Thanks
> >
>



-- 
--------------------------

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

Re: error reporting during indexing

Posted by Matteo Grolla <ma...@gmail.com>.
Hi Erick,
    it's a curiosity question. When I add a document, it's buffered by Solr
and can be (and apparently is) parsed to verify it matches the schema. But
it's not written to a segment file until a commit is issued. If there is a
problem writing the segment, say a permission error, isn't this a case where
I would report everything OK when in fact the documents are not there?

thanks

2015-09-29 2:12 GMT+02:00 Erick Erickson <er...@gmail.com>:

> You shouldn't be losing errors with HttpSolrServer. Are you
> seeing evidence that you are or is this mostly a curiosity question?
>
> Do note, it's better to batch up docs; your throughput will increase
> a LOT. That said, when you do batch (e.g. send 500 docs per update
> or whatever) and you get an error back, you're not quite sure which
> doc failed. So what people do is retry a failed batch one document
> at a time when the batch has errors and rely on Solr overwriting
> any docs in the batch that were indexed the first time.
>
> Best,
> Erick
>
> On Mon, Sep 28, 2015 at 2:27 PM, Matteo Grolla <ma...@gmail.com>
> wrote:
> > Hi,
> >     if I need fine-grained error reporting I use HttpSolrServer and
> > send 1 doc per request using the add method.
> > I report errors on exceptions of the add method.
> > I'm using autocommit, so I'm not seeing errors related to commits.
> > Am I losing some errors? Is there a better way?
> >
> > Thanks
>

Re: error reporting during indexing

Posted by Erick Erickson <er...@gmail.com>.
You shouldn't be losing errors with HttpSolrServer. Are you
seeing evidence that you are or is this mostly a curiosity question?

Do note, it's better to batch up docs; your throughput will increase
a LOT. That said, when you do batch (e.g. send 500 docs per update
or whatever) and you get an error back, you're not quite sure which
doc failed. So what people do is retry a failed batch one document
at a time when the batch has errors and rely on Solr overwriting
any docs in the batch that were indexed the first time.
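[Editor's note: the batch-then-retry pattern described above could be sketched like this. The SolrJ calls are again abstracted behind a hypothetical BatchAdder interface so the control flow can be shown and run without a live server; Solr overwriting docs with the same uniqueKey is what makes re-sending the good documents safe.]

```java
import java.util.ArrayList;
import java.util.List;

public class BatchRetryIndexer {
    /** Stand-in for HttpSolrServer's add(Collection); throws if the batch fails. */
    @FunctionalInterface
    interface BatchAdder {
        void add(List<String> docIds) throws Exception;
    }

    /**
     * Indexes docs in batches for throughput. If a batch fails, retries its
     * documents one at a time to pinpoint the bad ones; re-sending the good
     * ones is harmless because Solr overwrites docs with the same uniqueKey.
     */
    static List<String> indexInBatches(List<String> docIds, int batchSize, BatchAdder adder) {
        List<String> failed = new ArrayList<>();
        for (int i = 0; i < docIds.size(); i += batchSize) {
            List<String> batch = docIds.subList(i, Math.min(i + batchSize, docIds.size()));
            try {
                adder.add(batch);                 // fast path: whole batch at once
            } catch (Exception batchError) {
                for (String id : batch) {         // slow path: one doc at a time
                    try {
                        adder.add(List.of(id));
                    } catch (Exception e) {
                        failed.add(id);           // this doc is the culprit
                    }
                }
            }
        }
        return failed;
    }
}
```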

Best,
Erick

On Mon, Sep 28, 2015 at 2:27 PM, Matteo Grolla <ma...@gmail.com> wrote:
> Hi,
>     if I need fine-grained error reporting I use HttpSolrServer and send
> 1 doc per request using the add method.
> I report errors on exceptions of the add method.
> I'm using autocommit, so I'm not seeing errors related to commits.
> Am I losing some errors? Is there a better way?
>
> Thanks