Posted to solr-user@lucene.apache.org by "Klostermeyer, Michael" <mk...@riskexchange.com> on 2012/07/03 01:40:52 UTC

DIH - unable to ADD individual new documents

I am not able to ADD individual documents via the DIH, but updating works as expected.  The stored procedure that is called within the DIH returns the expected data for the new document, and Solr appears to "do its thing", but the document never makes it to the Solr server, as evidenced by subsequent queries not returning it.

Is there a trick to adding new documents using the DIH?

Mike


RE: DIH - unable to ADD individual new documents

Posted by "Klostermeyer, Michael" <mk...@riskexchange.com>.
I haven't, but will consider those alternatives.  I think right now I'm going to go with a hybrid approach, meaning my scheduled and full updates will continue to use the DIH, as those seem to work really well.  My NRT indexing needs will be handled via the JSON processor.  For individual updates this will enable me to utilize an existing ORM infrastructure fairly easily (famous last words, I know).

Thanks for the help, as always.

Mike
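
For readers following the hybrid approach above, here is a minimal sketch of the JSON body for a single-document add. It assumes Solr's JSON update handler is mapped at /update/json and accepts an "add" command with an optional "commitWithin"; the field names and the helper function are hypothetical and would have to match your own schema.xml.

```python
import json


def build_add_command(doc, commit_within_ms=None):
    """Return the JSON body that adds (or replaces) one document."""
    add = {"doc": doc}
    if commit_within_ms is not None:
        # Ask Solr to commit within this many milliseconds instead of
        # issuing an explicit commit after every single add.
        add["commitWithin"] = commit_within_ms
    return json.dumps({"add": add})


# Hypothetical record, e.g. as produced by an ORM layer.
body = build_add_command({"id": "2028046", "name": "example"}, commit_within_ms=1000)
print(body)
```

The body would then be POSTed with Content-Type application/json. Because each add is an independent request, several of them can be sent at nearly the same time without one silently displacing another.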



Re: DIH - unable to ADD individual new documents

Posted by Erick Erickson <er...@gmail.com>.
Mike:

Have you considered using one (or several) SolrJ clients to do your
indexing? That can give you a finer control granularity than DIH. Or
even do your NRT with SolrJ or....

Here's an example program, you can take out the Tika stuff pretty easily..

Best
Erick


RE: DIH - unable to ADD individual new documents

Posted by "Klostermeyer, Michael" <mk...@riskexchange.com>.
Well that little bit of knowledge changes things for me, doesn't it?  I appreciate your response very much.  Without knowing that about the DIH, I attempted to have my DIH handler handle all circumstances, namely the "batch", scheduled job, and immediate/NRT indexing.  Looks like I'm going to have to severely re-think that strategy.

Thanks again...and if anyone has any further input how I can best/most efficiently accomplish all 3 above, please let me know.

Mike



RE: DIH - unable to ADD individual new documents

Posted by "Dyer, James" <Ja...@ingrambook.com>.
A DIH request handler can only process one "run" at a time.  So if DIH is still in process and you kick off a new DIH "full-import", it will silently ignore the new command.  To have more than one DIH "run" going at a time, it is necessary to configure more than one handler instance in solrconfig.xml.  But even then you'll have to be careful to find one that is free before trying to use it.

Regardless, to do what you want, you'll need to poll the DIH response screen to be sure it isn't running before starting a new one.  It would be simplest to leave it with just 1 DIH handler in solrconfig.xml.  If you've got to have an undefined # of concurrent updates going at once you're best off to not use DIH.
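
The polling described above could be sketched roughly as follows. This is only an illustration: it assumes the handler answers command=status with a JSON body (wt=json) whose "status" field reads "idle" or "busy", and the function names, poll interval, and retry limit are made up for the example.

```python
import json
import time
import urllib.request


def is_idle(status_body):
    """True when a DIH status response (wt=json) reports no running import."""
    return json.loads(status_body).get("status") == "idle"


def fetch_status(base_url):
    """Fetch the raw status response from a DIH handler URL."""
    with urllib.request.urlopen(base_url + "?command=status&wt=json") as resp:
        return resp.read().decode("utf-8")


def wait_until_idle(base_url, fetch=fetch_status, interval_s=1.0,
                    max_polls=30, sleep=time.sleep):
    """Poll until the handler is idle; return False if it never becomes idle."""
    for _ in range(max_polls):
        if is_idle(fetch(base_url)):
            return True
        sleep(interval_s)
    return False
```

A caller would start its import only when wait_until_idle(...) returns True. Two clients can still see "idle" at the same moment and both fire, so this narrows the window for a silently ignored command but does not close it.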

Perhaps a better usage pattern, and the one DIH was designed for, is to put the doc id's in an update table with a timestamp.  Have your queries join to the update table "where timestamp > ${dih.last_index_time}".  Set up crontab or whatever to kick off DIH every so often.  If the prior run is still in progress, it will just skip that run, but because we're dealing with timestamps that get written automatically when DIH finishes, you will only experience a delayed update, not a lost update.  By batching your updates like this you will also have fewer commits, which will be beneficial for performance all around.
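
To see why a skipped run only delays an update, here is a small self-contained sketch of the update-table pattern using an in-memory SQLite database. The table names, column names, and timestamps are invented for the illustration; in a real setup the query would live in the DIH configuration and the comparison value would come from the last-index-time variable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE doc_updates (doc_id INTEGER, updated_at TEXT)")


def changed_docs(conn, last_index_time):
    """Rows a scheduled import would pick up: docs touched after the last finished run."""
    return conn.execute(
        "SELECT DISTINCT d.id, d.title FROM docs d "
        "JOIN doc_updates u ON u.doc_id = d.id "
        "WHERE u.updated_at > ? ORDER BY d.id",
        (last_index_time,),
    ).fetchall()


conn.executemany("INSERT INTO docs VALUES (?, ?)", [(1, "first"), (2, "second")])
conn.executemany(
    "INSERT INTO doc_updates VALUES (?, ?)",
    [(1, "2012-07-03 10:05:00"), (2, "2012-07-03 10:20:00")],
)

# Suppose the 10:15 run was skipped because an earlier run was still busy.
# The last *finished* run is still 10:00, so the next run sees both changes.
pending = changed_docs(conn, "2012-07-03 10:00:00")
print(pending)
```

Because the cutoff only advances when a run actually finishes, the change made at 10:05 is still selected by the later run rather than dropped.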

Of course, if you're trying to do this with the near-real-time functionality, batching isn't your answer.  But DIH isn't designed at all to work well with NRT either...

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311



RE: DIH - unable to ADD individual new documents

Posted by "Klostermeyer, Michael" <mk...@riskexchange.com>.
Some interesting findings over the last few hours that may change the context of this discussion...

Due to the nature of the application, I need the ability to fire off individual "ADDs" on several different entities at basically the same time.  So, I am making 2-4 Solr ADD calls within 100ms of each other.  While troubleshooting this, I found that if I only made 1 Solr ADD call (ignoring the other entities), it updated the index as expected.  However, when all were fired off, proper indexing did not occur (at least on one of the entities) and no errors were logged.  I am still attempting to figure out if ALL of the 2-4 entities failed to ADD, or if some failed and others succeeded.

So...does this have something to do with Solr's index/message queuing (v3.5)?  How does Solr handle these types of rapid requests, and even more important, how do I get the status of an individual DIH call vs simply the status of the "latest" call at /dataimport?

Mike



Re: DIH - unable to ADD individual new documents

Posted by Gora Mohanty <go...@mimirtech.com>.
On 3 July 2012 07:54, Klostermeyer, Michael
<mk...@riskexchange.com> wrote:
> I should add that I am using the full-import command in all cases, and setting clean=false for the individual adds.

What does the data-import page report at the end of the
full-import, i.e., how many documents were indexed?
Are there any error messages in the Solr logs? Please
share with us your DIH configuration file, and Solr
schema.xml.

Regards,
Gora

RE: DIH - unable to ADD individual new documents

Posted by "Klostermeyer, Michael" <mk...@riskexchange.com>.
The URL I am using is http://localhost/solr/dataimport?commit=true&wt=json&clean=false&uniqueID=2028046&command=full%2Dimport&entity=myEntityName

uniqueID is the ID of the newly created DB record.  This ID gets passed to the stored procedure and returns the expected data when I run the SP directly.

Mike
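
For reference, the same request can be assembled programmatically so that every parameter is explicit and correctly escaped. This is just a sketch built from the values in the message above; the helper name is hypothetical, and uniqueID is a custom request parameter that the DIH configuration would presumably read through a request variable (something like ${dataimporter.request.uniqueID}, stated here as an assumption about the poster's setup).

```python
from urllib.parse import urlencode


def dih_url(base, entity, unique_id):
    """Build a DIH full-import URL for one entity and one record."""
    params = {
        "command": "full-import",
        "clean": "false",   # keep existing documents in the index
        "commit": "true",   # commit when the import finishes
        "wt": "json",
        "entity": entity,
        "uniqueID": unique_id,  # custom parameter passed through to the query
    }
    return base + "?" + urlencode(params)


url = dih_url("http://localhost/solr/dataimport", "myEntityName", 2028046)
print(url)
```

Leaving out clean=false on a full-import would, by default, clear the index first, so it is worth keeping that parameter visible in whatever code builds the URL.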




RE: DIH - unable to ADD individual new documents

Posted by "Klostermeyer, Michael" <mk...@riskexchange.com>.
I should add that I am using the full-import command in all cases, and setting clean=false for the individual adds.

Mike

