Posted to solr-user@lucene.apache.org by Wenca <we...@dovolenou.cz> on 2012/03/07 09:41:42 UTC

How to stop processing of DataImportHandler in EventListener

Hi,

I have two DataImportHandlers configured. The first one loads data into
a Berkeley DB-backed cache (SOLR-2382, SOLR-2613), and the second one
then indexes documents, reading sub-entity data from that cache.

I need some way to prevent the second handler from running while the
first one is still running, so that it does not read inconsistent data.
I haven't found a clean way to achieve this yet.

I thought I could use an EventListener on the second handler that checks
whether the cache data import is running and, if so, sets a flag telling
the import not to continue.

Or is there another way to block a data import handler while another one
is running?

In solrconfig.xml I have:

<requestHandler name="/dataimport"
   class="org.apache.solr.handler.dataimport.DataImportHandler">
     <lst name="defaults">
       <str name="config">db-data-config.xml</str>
       <str name="persistCacheBaseDir">...</str>
     </lst>
</requestHandler>

<requestHandler name="/dih-cache"
   class="org.apache.solr.handler.dataimport.DataImportHandler">
     <lst name="defaults">
         <str name="config">cache-db-data-config.xml</str>
         <str name="writerImpl">org.apache.solr.handler.dataimport.DIHCacheWriter</str>
         <str name="persistCacheImpl">org.apache.solr.handler.dataimport.BerkleyBackedCache</str>
         <str name="persistCacheBaseDir">...</str>
         <str name="persistCacheName">data_cache</str>
         <str name="cachePk">id</str>
     </lst>
</requestHandler>

Thanks, Wenca

RE: How to stop processing of DataImportHandler in EventListener

Posted by "Dyer, James" <Ja...@ingrambook.com>.
Wenca,

I have an app with requirements similar to yours. We have maybe 40 caches that need to be built; when they're done (and only if they all succeed), the main indexing runs. For this I wrote some quick-n-squirrelly code that executes a configurable number of cache-building handlers at a time. When one finishes, another starts, until they're all done. When they have all finished, the main indexing DIH starts. I just run this in a separate JVM on the master Solr node. It keeps track of which handlers are running and polls them over HTTP every few seconds to see if they're done (scraping that "experimental/subject-to-change with typos" page to get the status).
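
A rough sketch of the kind of driver described above, not the actual code: it runs at most a fixed number of cache-building handlers at a time, polls them until each one finishes, and starts the main import only if none of them failed. Everything named here is hypothetical: the function name, the handler paths, and the three callbacks (start, is_idle, succeeded), which would wrap whatever HTTP calls and status scraping you use.

```python
import time


def run_caches_then_index(cache_handlers, main_handler, start, is_idle,
                          succeeded, max_parallel=2, poll_seconds=5.0,
                          sleep=time.sleep):
    """Run cache handlers in batches, then the main handler.

    start(handler)      triggers an import on that handler (hypothetical callback)
    is_idle(handler)    returns True once that handler has finished
    succeeded(handler)  returns True if the finished import had no errors

    Returns the list of failed cache handlers. An empty list means every
    cache was built and the main import was started.
    """
    pending = list(cache_handlers)
    running = []
    failed = []
    while pending or running:
        # Check the handlers started earlier and drop the finished ones.
        still_running = []
        for handler in running:
            if is_idle(handler):
                if not succeeded(handler):
                    failed.append(handler)
            else:
                still_running.append(handler)
        running = still_running

        # Fill the free slots with handlers that have not run yet.
        while pending and len(running) < max_parallel:
            handler = pending.pop(0)
            start(handler)
            running.append(handler)

        # Wait before polling again, which also gives a freshly started
        # import time to begin reporting itself as busy.
        if running:
            sleep(poll_seconds)

    if failed:
        return failed  # do not index against incomplete caches
    start(main_handler)
    return []
```

The callbacks keep the scheduling logic separate from the status scraping, which is the fragile part. Whether a just-started import reports itself as busy right away depends on the Solr version, so the polling interval should not be set too low.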

So this is similar to Mikhail's advice. Possibly you can script this simply if you have just one or a few caches that need to be built. You might even be able to monitor your container's log output to know when the first one finishes and the next one starts, if you don't want to scrape the HTTP output (I forget whether DIHCacheWriter logs anything useful you could use).

In my opinion this is a real missing feature in DIH. However, I would shy away from adding more stuff like this until we can clean up some of DIH's more fundamental shortcomings. (DIH is great for many use cases, but the code has suffered neglect and needs a facelift, in my opinion.)

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311



Re: How to stop processing of DataImportHandler in EventListener

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Hello,

It seems you have some app which triggers these DIH requests. Can't you add
a precondition in that app? Before running the second DIH, check the status
of the first one to see whether it is running or idle.
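
As a minimal sketch of that precondition, the Python snippet below asks the cache handler for its status and starts the main import only if the cache handler reports idle. It uses the handler paths from the original post (/dih-cache and /dataimport). The base URL and the function names are hypothetical. It also assumes that the status response is XML with a top-level <str name="status"> element whose value is idle or busy, so check what your Solr version actually returns before relying on it.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOLR_BASE = "http://localhost:8983/solr"  # assumed base URL, adjust as needed


def parse_dih_status(xml_text):
    """Return the text of the top-level <str name="status"> element, or None."""
    root = ET.fromstring(xml_text)
    node = root.find("./str[@name='status']")
    if node is None or node.text is None:
        return None
    return node.text.strip()


def http_get(url):
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def run_main_import_if_cache_idle(fetch=http_get, base=SOLR_BASE):
    """Start /dataimport only when /dih-cache is not importing.

    Returns True if the main import was triggered, False otherwise.
    """
    status = parse_dih_status(fetch(base + "/dih-cache?command=status"))
    if status != "idle":
        # The cache import is still running (or the status is unreadable).
        return False
    fetch(base + "/dataimport?command=full-import")
    return True
```

This check is not atomic: a cache import started between the status request and the full-import request would go unnoticed, so it only works if the same app triggers both handlers.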

Regards


-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>