Posted to solr-user@lucene.apache.org by Juan Manuel Alvarez <na...@gmail.com> on 2010/12/03 20:33:14 UTC

Syncing 'delta-import' with 'select' query

Hello everyone! I would like to ask you a question about DIH.

I am using a database and DIH to sync against Solr, and a GUI to
display and operate on the items retrieved from Solr.
When I change the state of an item through the GUI, the following happens:
a. The item is updated in the DB.
b. A delta-import command is fired to sync the DB with Solr.
c. The GUI is refreshed by making a query to Solr.

My problem comes between (b) and (c). The delta-import operation is
executed in a new thread, so my call returns immediately and the GUI is
refreshed before the Solr index is updated, which leaves the item state
in the GUI outdated.

I had two ideas so far:
1. Query the status of the DIH after the delta-import operation and
do not return until it is "idle". The problem I see with this is that
if other users execute delta-imports, the status will be "busy" until
all operations are finished.
2. Use Zoie. The first problem is that configuring it is not as
straightforward as it seems, so I don't want to spend more time trying
it until I am sure that this will solve my issue. On the other hand, I
think that I may suffer the same problem since the delta-import is
still firing in another thread, so I can't be sure it will be called
fast enough.

Am I heading in the right direction, or is there another way to
achieve my goal?

Thanks in advance!
Juan M.
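
Idea 1 above can be sketched as a small polling loop. This is only an
illustration in Python, not code from the thread: fetch_status is a
hypothetical callable that would request
/solr/dataimport?command=status and return the reported status string
("idle" or "busy"), and the timeout and interval values are arbitrary
assumptions.

```python
import time
from typing import Callable


def wait_until_idle(fetch_status: Callable[[], str],
                    timeout_s: float = 30.0,
                    interval_s: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll fetch_status() until it returns "idle" or the timeout expires.

    fetch_status is a hypothetical helper supplied by the caller; it is
    assumed to return the DIH status as a plain string.
    Returns True if "idle" was seen, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if fetch_status() == "idle":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

As the post points out, a loop like this cannot tell one user's import
from another's: it keeps waiting for as long as any import keeps the
handler "busy".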

Re: Syncing 'delta-import' with 'select' query

Posted by Juan Manuel Alvarez <na...@gmail.com>.
Oops! That seems to be the problem, since I am using 1.4.

Thanks!
Juan M.

On Tue, Dec 14, 2010 at 8:40 PM, Alexey Serba <as...@gmail.com> wrote:
> What Solr version do you use?
>
> It seems that the synchronous flag was added to the 3.1 and 4.0 (trunk)
> branches, but not to 1.4:
> https://issues.apache.org/jira/browse/SOLR-1721
>

Re: Syncing 'delta-import' with 'select' query

Posted by Alexey Serba <as...@gmail.com>.
What Solr version do you use?

It seems that the synchronous flag was added to the 3.1 and 4.0 (trunk)
branches, but not to 1.4:
https://issues.apache.org/jira/browse/SOLR-1721

On Wed, Dec 8, 2010 at 11:21 PM, Juan Manuel Alvarez <na...@gmail.com> wrote:
> Hello everyone!
> I have been doing some tests, but it seems I can't make the
> synchronous flag work.
>
> I have made two tests:
> 1) DIH with commit=false
> 2) DIH with commit=false + commit via Solr XML update protocol
>
> And here are the log results:
> For (1) the command is
> "/solr/dataimport?command=delta-import&commit=false&synchronous=true"
> and the first part of the output is:
>
> Dec 8, 2010 4:42:51 PM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0
> Dec 8, 2010 4:42:51 PM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/dataimport
> params={schema=testproject&dbHost=127.0.0.1&dbPassword=fuz10n!&dbName=fzm&commit=false&dbUser=fzm&command=delta-import&projectId=1&synchronous=true&dbPort=5432}
> status=0 QTime=4
> Dec 8, 2010 4:42:51 PM org.apache.solr.handler.dataimport.DataImporter
> doDeltaImport
> INFO: Starting Delta Import
> Dec 8, 2010 4:42:51 PM org.apache.solr.handler.dataimport.SolrWriter
> readIndexerProperties
> INFO: Read dataimport.properties
> Dec 8, 2010 4:42:51 PM org.apache.solr.handler.dataimport.DocBuilder doDelta
> INFO: Starting delta collection.
> Dec 8, 2010 4:42:51 PM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
>
>
> For (2) the commands are
> "/solr/dataimport?command=delta-import&commit=false&synchronous=true"
> and "/solr/update?commit=true&waitFlush=true&waitSearcher=true" and
> the first part of the output is:
>
> Dec 8, 2010 4:22:50 PM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0
> Dec 8, 2010 4:22:50 PM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/dataimport
> params={schema=testproject&dbHost=127.0.0.1&dbPassword=fuz10n!&dbName=fzm&commit=false&dbUser=fzm&command=delta-import&projectId=1&synchronous=true&dbPort=5432}
> status=0 QTime=1
> Dec 8, 2010 4:22:50 PM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0
> Dec 8, 2010 4:22:50 PM org.apache.solr.handler.dataimport.DataImporter
> doDeltaImport
> INFO: Starting Delta Import
> Dec 8, 2010 4:22:50 PM org.apache.solr.handler.dataimport.SolrWriter
> readIndexerProperties
> INFO: Read dataimport.properties
> Dec 8, 2010 4:22:50 PM org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
>
> In (2) it seems like the commit is being fired before the delta-update finishes.
>
> Am I using the "synchronous" flag right?
>
> Thanks in advance!
> Juan M.
>

Re: Syncing 'delta-import' with 'select' query

Posted by Juan Manuel Alvarez <na...@gmail.com>.
Hello everyone!
I have been doing some tests, but it seems I can't make the
synchronous flag work.

I have made two tests:
1) DIH with commit=false
2) DIH with commit=false + commit via Solr XML update protocol

And here are the log results:
For (1) the command is
"/solr/dataimport?command=delta-import&commit=false&synchronous=true"
and the first part of the output is:

Dec 8, 2010 4:42:51 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0
Dec 8, 2010 4:42:51 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dataimport
params={schema=testproject&dbHost=127.0.0.1&dbPassword=fuz10n!&dbName=fzm&commit=false&dbUser=fzm&command=delta-import&projectId=1&synchronous=true&dbPort=5432}
status=0 QTime=4
Dec 8, 2010 4:42:51 PM org.apache.solr.handler.dataimport.DataImporter
doDeltaImport
INFO: Starting Delta Import
Dec 8, 2010 4:42:51 PM org.apache.solr.handler.dataimport.SolrWriter
readIndexerProperties
INFO: Read dataimport.properties
Dec 8, 2010 4:42:51 PM org.apache.solr.handler.dataimport.DocBuilder doDelta
INFO: Starting delta collection.
Dec 8, 2010 4:42:51 PM org.apache.solr.handler.dataimport.DocBuilder
collectDelta


For (2) the commands are
"/solr/dataimport?command=delta-import&commit=false&synchronous=true"
and "/solr/update?commit=true&waitFlush=true&waitSearcher=true" and
the first part of the output is:

Dec 8, 2010 4:22:50 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0
Dec 8, 2010 4:22:50 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dataimport
params={schema=testproject&dbHost=127.0.0.1&dbPassword=fuz10n!&dbName=fzm&commit=false&dbUser=fzm&command=delta-import&projectId=1&synchronous=true&dbPort=5432}
status=0 QTime=1
Dec 8, 2010 4:22:50 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dataimport params={command=status} status=0 QTime=0
Dec 8, 2010 4:22:50 PM org.apache.solr.handler.dataimport.DataImporter
doDeltaImport
INFO: Starting Delta Import
Dec 8, 2010 4:22:50 PM org.apache.solr.handler.dataimport.SolrWriter
readIndexerProperties
INFO: Read dataimport.properties
Dec 8, 2010 4:22:50 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)

In (2) it seems like the commit is being fired before the delta-update finishes.

Am I using the "synchronous" flag right?

Thanks in advance!
Juan M.
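
For reference, the two requests from test (2) can be written out as a
short Python sketch. This is an illustration under assumptions, not the
code used in the tests: SOLR_BASE and the http_get callable are
hypothetical, and the custom parameters visible in the logs (schema,
dbHost, projectId and so on) are left out. The sequence only gives the
intended ordering if the first request really blocks until the import
has finished.

```python
from typing import Callable
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed host and port


def delta_import_url(base: str = SOLR_BASE) -> str:
    """URL for a delta-import that asks DIH to run synchronously, without committing."""
    params = {"command": "delta-import", "commit": "false", "synchronous": "true"}
    return f"{base}/dataimport?{urlencode(params)}"


def commit_url(base: str = SOLR_BASE) -> str:
    """URL for an explicit commit that waits for the flush and the new searcher."""
    params = {"commit": "true", "waitFlush": "true", "waitSearcher": "true"}
    return f"{base}/update?{urlencode(params)}"


def import_then_commit(http_get: Callable[[str], object], base: str = SOLR_BASE) -> None:
    """Issue the import first, then the commit.

    http_get is a hypothetical blocking function that performs one GET request.
    """
    http_get(delta_import_url(base))
    http_get(commit_url(base))
```

If the server ignores synchronous=true, the first call returns at once
and the commit can run before the import has written anything, which
would match the log order shown for test (2).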

On Mon, Dec 6, 2010 at 6:46 PM, Juan Manuel Alvarez <na...@gmail.com> wrote:
> Thanks for all the help! It is really appreciated.
>
> For now, I can afford the parallel requests problem, but when I put
> synchronous=true in the delta import, the call still returns with
> outdated items.
> Examining the log, it seems that the commit operation is being
> executed after the operation returns, even when I am using
> commit=true.
> Is it possible to also execute the commit synchronously?
>
> Cheers!
> Juan M.
>

Re: Syncing 'delta-import' with 'select' query

Posted by Juan Manuel Alvarez <na...@gmail.com>.
Thanks for all the help! It is really appreciated.

For now, I can afford the parallel requests problem, but when I put
synchronous=true in the delta import, the call still returns with
outdated items.
Examining the log, it seems that the commit operation is being
executed after the operation returns, even when I am using
commit=true.
Is it possible to also execute the commit synchronously?

Cheers!
Juan M.

On Mon, Dec 6, 2010 at 4:29 PM, Alexey Serba <as...@gmail.com> wrote:
>> When you say "two parallel requests from two users to single DIH
>> request handler", what do you mean by "request handler"?
> I mean DIH.
>
>> Are you
>> referring to the HTTP request? Would that mean that if I make the
>> request from different HTTP sessions it would work?
> No.
>
> It means that when you have two users that simultaneously changed two
> objects in the UI then you have two HTTP requests to DIH to pull
> changes from the db into Solr index. If the second request comes when
> the first is not fully processed then the second request will be
> rejected. As a result your index would be outdated (w/o the latest
> update) until the next update.
>

Re: Syncing 'delta-import' with 'select' query

Posted by Alexey Serba <as...@gmail.com>.
> When you say "two parallel requests from two users to single DIH
> request handler", what do you mean by "request handler"?
I mean DIH.

> Are you
> referring to the HTTP request? Would that mean that if I make the
> request from different HTTP sessions it would work?
No.

It means that when you have two users that simultaneously changed two
objects in the UI then you have two HTTP requests to DIH to pull
changes from the db into Solr index. If the second request comes when
the first is not fully processed then the second request will be
rejected. As a result your index would be outdated (w/o the latest
update) until the next update.
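
The behaviour described above can be pictured with a toy model in
Python. This is not DIH's actual implementation, only an assumed
simplification: a single non-blocking lock stands in for the "one import
at a time" rule, and the class and method names are made up for the
illustration.

```python
import threading


class SingleImportGate:
    """Toy model: at most one import may run; later requests are rejected."""

    def __init__(self) -> None:
        self._lock = threading.Lock()

    def try_start(self) -> bool:
        # Non-blocking acquire: returns False instead of waiting
        # when an import is already in progress.
        return self._lock.acquire(blocking=False)

    def finish(self) -> None:
        # Called when the running import completes.
        self._lock.release()


gate = SingleImportGate()
first = gate.try_start()   # first user's request is accepted
second = gate.try_start()  # second user's request arrives too early: rejected
gate.finish()              # first import completes
third = gate.try_start()   # a later request is accepted again
```

In this model the change behind the rejected request is simply not
picked up until some later import runs, which is the "outdated until the
next update" effect.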

Re: Syncing 'delta-import' with 'select' query

Posted by Juan Manuel Alvarez <na...@gmail.com>.
Alex:

Thanks for the quick reply.

When you say "two parallel requests from two users to single DIH
request handler", what do you mean by "request handler"? Are you
referring to the HTTP request? Would that mean that if I make the
request from different HTTP sessions it would work?

Cheers!
Juan M.

On Mon, Dec 6, 2010 at 1:12 PM, Alexey Serba <as...@gmail.com> wrote:
> Hey Juan,
>
> It seems that DataImportHandler is not the right tool for your scenario
> and you'd be better off using the Solr XML update protocol.
> * http://wiki.apache.org/solr/UpdateXmlMessages
>
> You can still work around your outdated GUI view problem by calling
> DIH synchronously: add synchronous=true to your request. But that
> won't solve the problem of two parallel requests from two users to a
> single DIH request handler, because DIH doesn't support that; if a
> previous request is still running, it bounces the second request.
>
> HTH,
> Alex
>
>
>

Re: Syncing 'delta-import' with 'select' query

Posted by Alexey Serba <as...@gmail.com>.
Hey Juan,

It seems that DataImportHandler is not the right tool for your scenario
and you'd be better off using the Solr XML update protocol.
* http://wiki.apache.org/solr/UpdateXmlMessages

You can still work around your outdated GUI view problem by calling
DIH synchronously: add synchronous=true to your request. But that
won't solve the problem of two parallel requests from two users to a
single DIH request handler, because DIH doesn't support that; if a
previous request is still running, it bounces the second request.

HTH,
Alex
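
As a rough illustration of the XML update route, here is a minimal
Python sketch that builds an <add> message for one changed item. The
field names "id" and "state" are hypothetical; the real names depend on
the schema. The resulting string would be sent in the body of a POST to
/solr/update, followed by a commit.

```python
import xml.etree.ElementTree as ET
from typing import Mapping


def build_add_message(fields: Mapping[str, object]) -> str:
    """Build a Solr XML <add> message containing a single document."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(add, encoding="unicode")


# "id" and "state" are made-up field names for the example.
message = build_add_message({"id": "42", "state": "closed"})
```

Because the GUI already knows which item changed, it can push just that
document and commit, instead of asking DIH to work out the delta from
the database.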



On Fri, Dec 3, 2010 at 10:33 PM, Juan Manuel Alvarez <na...@gmail.com> wrote:
> Hello everyone! I would like to ask you a question about DIH.
>
> I am using a database and DIH to sync against Solr, and a GUI to
> display and operate on the items retrieved from Solr.
> When I change the state of an item through the GUI, the following happens:
> a. The item is updated in the DB.
> b. A delta-import command is fired to sync the DB with Solr.
> c. The GUI is refreshed by making a query to Solr.
>
> My problem comes between (b) and (c). The delta-import operation is
> executed in a new thread, so my call returns immediately and the GUI is
> refreshed before the Solr index is updated, which leaves the item state
> in the GUI outdated.
>
> I had two ideas so far:
> 1. Query the status of the DIH after the delta-import operation and
> do not return until it is "idle". The problem I see with this is that
> if other users execute delta-imports, the status will be "busy" until
> all operations are finished.
> 2. Use Zoie. The first problem is that configuring it is not as
> straightforward as it seems, so I don't want to spend more time trying
> it until I am sure that this will solve my issue. On the other hand, I
> think that I may suffer the same problem since the delta-import is
> still firing in another thread, so I can't be sure it will be called
> fast enough.
>
> Am I heading in the right direction, or is there another way to
> achieve my goal?
>
> Thanks in advance!
> Juan M.
>