Posted to solr-user@lucene.apache.org by Jonathan Lee <jo...@comcast.com> on 2008/07/21 19:43:11 UTC

DIH: post-import action & independent SQL statements

Hello,

I have been using the DataImportHandler successfully to import documents
from MySQL, and it has worked very well.  However I have three questions
about the handler:

1. Is it possible to execute a command post-import? Specifically, I would
like to run snapshooter after both full & delta imports. The postOptimize &
postCommit listeners do not quite work here since I do not want to optimize
after delta imports, and I do not want to run snapshooter for each auto
commit during a full import.

2. Is it possible to execute independent SQL statements before, during, or
after importing? I would like to create some intermediate temporary tables
and also set specific settings relevant to the import (e.g. "SET
@@group_concat_max_len=...").

3. It would be great to have a way to chain together multiple Transformers.
For instance, I'd like to perform regex operations, then template the output
and finally add a custom document boost based on a column.  This could be
done by chaining the RegexTransformer, TemplateTransformer and a custom
Transformer.

Thanks for your help!

Jonathan Lee


Re: DIH: post-import action & independent SQL statements

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
Yes, it will be there in the next patch.
The EntityProcessor interface will have an extra destroy() method, so
you can extend SqlEntityProcessor and override the init()/destroy()
methods to perform pre/post actions. init() is already there.

Another addition is getSolrCore() in Context, which can help you invoke
methods on Solr directly.
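
A minimal sketch of the init()/destroy() idea, for illustration only. It does not use the real Solr classes: SketchEntityProcessor, HookedProcessor and HookSketch are hypothetical stand-ins, and the sketch assumes init() runs before the first row and destroy() after the last one. The logged strings are placeholders for real pre/post actions.

```java
import java.util.ArrayList;
import java.util.List;

class HookSketch {
    // Drives one import the way this sketch assumes the handler would.
    static List<String> runImport(HookedProcessor p) {
        p.init();
        while (p.nextRow() != null) {
            // a real import would build and index a document here
        }
        p.destroy();
        return p.log;
    }

    public static void main(String[] args) {
        System.out.println(runImport(new HookedProcessor()));
    }
}

// Hypothetical stand-in for an entity processor with the lifecycle described
// above. This is NOT the real Solr class; only the init()/destroy() names
// come from the thread.
abstract class SketchEntityProcessor {
    void init() {}                 // assumed to run before the first row
    abstract String nextRow();     // returns null when there are no more rows
    void destroy() {}              // assumed to run after the last row
}

// Subclass that records a pre-import and a post-import action.
class HookedProcessor extends SketchEntityProcessor {
    final List<String> log = new ArrayList<>();
    private int remaining = 2;     // pretend the query returns two rows

    @Override
    void init() {
        log.add("pre-import action");   // e.g. create temporary tables
    }

    @Override
    String nextRow() {
        if (remaining == 0) {
            return null;
        }
        remaining--;
        String row = "row " + (2 - remaining);
        log.add(row);
        return row;
    }

    @Override
    void destroy() {
        log.add("post-import action");  // e.g. trigger a snapshot script
    }
}
```

Keeping the hooks in a subclass leaves the import configuration unchanged apart from the processor class name.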


On Tue, Jul 22, 2008 at 12:09 AM, Jonathan Lee <jo...@comcast.com> wrote:
> Thanks for the solutions to #2 & #3. I assume from your last comment that the
> callback hooks are not yet in the DIH but are features that will be released in
> the future as patches, correct?

-- 
--Noble Paul

Re: DIH: post-import action & independent SQL statements

Posted by Jonathan Lee <jo...@comcast.com>.
Thanks for the solutions to #2 & #3. I assume from your last comment that the
callback hooks are not yet in the DIH but are features that will be released in
the future as patches, correct?


> From: Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>
> Reply-To: <so...@lucene.apache.org>
> Date: Mon, 21 Jul 2008 23:25:15 +0530
> To: <so...@lucene.apache.org>
> Subject: Re: DIH: post-import action & independent SQL statements
> 
> Over time we have realized that this is a common pattern of requirements.
> Pre-import and post-import callback hooks are something I can think of.
> 
> 
> 
> On Mon, Jul 21, 2008 at 11:13 PM, Jonathan Lee <jo...@comcast.com>
> wrote:
>> Hello,
>> 
>> I have been using the DataImportHandler successfully to import documents
>> from MySQL, and it has worked very well.  However I have three questions
>> about the handler:
>> 
>> 1. Is it possible to execute a command post-import? Specifically, I would
>> like to run snapshooter after both full & delta imports. The postOptimize &
>> postCommit listeners do not quite work here since I do not want to optimize
>> after delta imports, and I do not want to run snapshooter for each auto
>> commit during a full import.
>> 
>> 2. Is it possible to execute independent SQL statements before, during, or
>> after importing? I would like to create some intermediate temporary tables
>> and also set specific settings relevant to the import (e.g. "SET
>> @@group_concat_max_len=...").
> During the import it is definitely possible. Any transformer can
> obtain a DataSource via context.getDataSource(<name>) and invoke any
> of its methods. getData(String query) can actually execute anything.
> 
>> 
>> 3. It would be great to have a way to chain together multiple Transformers.
>> For instance, I'd like to perform regex operations, then template the output
>> and finally add a custom document boost based on a column.  This could be
>> done by chaining the RegexTransformer, TemplateTransformer and a custom
>> Transformer.
> I guess chaining is already possible.
> transformer="RegexTransformer,TemplateTransformer,my.CustomTransformer"
> chains the three transformers.
>> 
>> Thanks for your help!
>> 
>> Jonathan Lee
>> 
>> 
> There are a bunch of features we have in mind. We do not want to make
> the patch bigger than it already is, and we are waiting for it to get
> committed so that we can provide incremental patches for these.
> 
> -- 
> --Noble Paul


Re: DIH: post-import action & independent SQL statements

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
Over time we have realized that this is a common pattern of requirements.
Pre-import and post-import callback hooks are something I can think of.



On Mon, Jul 21, 2008 at 11:13 PM, Jonathan Lee <jo...@comcast.com> wrote:
> Hello,
>
> I have been using the DataImportHandler successfully to import documents
> from MySQL, and it has worked very well.  However I have three questions
> about the handler:
>
> 1. Is it possible to execute a command post-import? Specifically, I would
> like to run snapshooter after both full & delta imports. The postOptimize &
> postCommit listeners do not quite work here since I do not want to optimize
> after delta imports, and I do not want to run snapshooter for each auto
> commit during a full import.
>
> 2. Is it possible to execute independent SQL statements before, during, or
> after importing? I would like to create some intermediate temporary tables
> and also set specific settings relevant to the import (e.g. "SET
> @@group_concat_max_len=...").
During the import it is definitely possible. Any transformer can
obtain a DataSource via context.getDataSource(<name>) and invoke any
of its methods. getData(String query) can actually execute anything.
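
A rough sketch of that pattern, for illustration only. SketchDataSource and SketchContext are hypothetical stand-ins, not the real Solr interfaces; only the method names getDataSource(name) and getData(query) are taken from the reply above. The data source name "db" and the SQL statement are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SetupTransformerSketch {
    // Hypothetical stand-ins, NOT the real Solr interfaces.
    interface SketchDataSource {
        Object getData(String query);
    }

    interface SketchContext {
        SketchDataSource getDataSource(String name);
    }

    // Data source that only records the statements it is asked to run.
    static class RecordingDataSource implements SketchDataSource {
        final List<String> executed = new ArrayList<>();

        @Override
        public Object getData(String query) {
            executed.add(query);
            return null; // a real data source would return rows here
        }
    }

    // Transformer-like class that issues one extra statement before the
    // first row it sees, then passes every row through unchanged.
    static class SetupTransformer {
        private boolean done = false;

        Map<String, Object> transformRow(Map<String, Object> row, SketchContext context) {
            if (!done) {
                // "db" is a made-up data source name; the statement is a placeholder.
                context.getDataSource("db").getData("CREATE TEMPORARY TABLE tmp_import (id INT)");
                done = true;
            }
            return row;
        }
    }

    static List<String> demo() {
        RecordingDataSource ds = new RecordingDataSource();
        SketchContext context = name -> ds;  // every name resolves to the same source
        SetupTransformer t = new SetupTransformer();
        Map<String, Object> row = new HashMap<>();
        row.put("id", 1);
        t.transformRow(row, context);
        t.transformRow(row, context);  // second row: no extra statement
        return ds.executed;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The flag in SetupTransformer keeps the statement from being re-issued for every row.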

>
> 3. It would be great to have a way to chain together multiple Transformers.
> For instance, I'd like to perform regex operations, then template the output
> and finally add a custom document boost based on a column.  This could be
> done by chaining the RegexTransformer, TemplateTransformer and a custom
> Transformer.
I guess chaining is already possible.
transformer="RegexTransformer,TemplateTransformer,my.CustomTransformer"
chains the three transformers.
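
To illustrate what such a chain means, here is a small self-contained sketch. It assumes the comma-separated names are applied left to right, each one receiving the row produced by the previous one. The three lambdas are toy stand-ins registered under the names from the example; they do not reproduce the behavior of the real transformers, and TransformerChainSketch and applyChain are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

class TransformerChainSketch {
    // Applies the named transformers to a row, left to right.
    static Map<String, Object> applyChain(String spec,
            Map<String, UnaryOperator<Map<String, Object>>> registry,
            Map<String, Object> row) {
        for (String name : spec.split(",")) {
            UnaryOperator<Map<String, Object>> t = registry.get(name.trim());
            if (t == null) {
                throw new IllegalArgumentException("Unknown transformer: " + name);
            }
            row = t.apply(row);
        }
        return row;
    }

    static Map<String, Object> demo() {
        // Toy stand-ins; they do NOT reproduce what the real transformers do.
        Map<String, UnaryOperator<Map<String, Object>>> registry = new LinkedHashMap<>();
        registry.put("RegexTransformer", r -> {
            // collapse runs of whitespace in the title column
            r.put("title", ((String) r.get("title")).replaceAll("\\s+", " "));
            return r;
        });
        registry.put("TemplateTransformer", r -> {
            // build a new column from the cleaned title
            r.put("label", "doc-" + r.get("title"));
            return r;
        });
        registry.put("my.CustomTransformer", r -> {
            // derive a boost from a column written by the previous step
            r.put("boost", r.get("label").toString().length() > 5 ? 2.0 : 1.0);
            return r;
        });

        Map<String, Object> row = new LinkedHashMap<>();
        row.put("title", "Solr   import");
        return applyChain("RegexTransformer,TemplateTransformer,my.CustomTransformer", registry, row);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because each step reads columns written by the one before it, the order of names in the list matters in this sketch.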
>
> Thanks for your help!
>
> Jonathan Lee
>
>
There are a bunch of features we have in mind. We do not want to make
the patch bigger than it already is, and we are waiting for it to get
committed so that we can provide incremental patches for these.

-- 
--Noble Paul