Posted to solr-user@lucene.apache.org by Dileepa Jayakody <di...@gmail.com> on 2013/10/29 21:47:50 UTC

Need additional data processing in Data Import Handler prior to indexing

Hi All,

I'm new to Solr, and I need to import data from a MySQL database, enhance
the imported content to identify the persons mentioned in it, and index
them as a separate field in Solr along with the other fields defined by
the original DB query.

I'm using Apache Stanbol [1] for the content enhancement. Its enhancement
results include the 'Person'-type entities found in the content.

The data flow will be:
mysql-db > Solr data-import handler > Stanbol enhancer > Solr index

For this, the Data Import Handler needs to do some additional processing
before indexing: send each record's content to Stanbol and process the
enhancement response. I found some related examples that customize the
MySQL query results in db-data-config.xml using a transformer script, but
I'm not sure whether sending a request to Stanbol and processing its
response can be done in a simple JavaScript transformer.
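For reference, the Stanbol call itself is straightforward from Java. The sketch below only shows the HTTP request that any of these approaches would need to make. It assumes the Java 11 `java.net.http` client, a Stanbol enhancer reachable at `http://localhost:8080/enhancer`, and that asking for `application/json` returns JSON-LD; all three are assumptions to check against your Stanbol installation.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Minimal sketch of sending one record's text to a Stanbol enhancer (Java 11+).
// ASSUMPTION: the enhancer endpoint below and JSON-LD via "application/json".
final class StanbolClient {
    static final URI DEFAULT_ENDPOINT = URI.create("http://localhost:8080/enhancer");

    private final HttpClient http = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(10))
            .build();
    private final URI endpoint;

    StanbolClient(URI endpoint) {
        this.endpoint = endpoint;
    }

    // Builds, but does not send, a POST carrying the raw text of one record.
    HttpRequest buildRequest(String content) {
        return HttpRequest.newBuilder(endpoint)
                .timeout(Duration.ofSeconds(30))
                .header("Content-Type", "text/plain; charset=UTF-8")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(content)) // UTF-8 encoded
                .build();
    }

    // Sends the request and returns the raw enhancement response body.
    String enhance(String content) throws IOException, InterruptedException {
        HttpResponse<String> response =
                http.send(buildRequest(content), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Stanbol returned HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

Extracting the person entities from the JSON-LD response is not shown; the response structure depends on the Stanbol version and the enhancement chain in use.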

Is there a better way to achieve this? Maybe by writing a custom filter
in Solr? Please share your thoughts. I'd appreciate any pointers, as I'm
a Solr beginner.

Thanks,
Dileepa


[1] https://stanbol.apache.org

Re: Need additional data processing in Data Import Handler prior to indexing

Posted by Dileepa Jayakody <di...@gmail.com>.
Thanks, guys, for your ideas.

I will go through them and come back with questions.

Regards,
Dileepa


On Wed, Oct 30, 2013 at 7:00 AM, Erick Erickson <er...@gmail.com> wrote:

Re: Need additional data processing in Data Import Handler prior to indexing

Posted by Erick Erickson <er...@gmail.com>.
Third time tonight I've been able to paste this link....

Also, you can consider just moving to SolrJ and
taking DIH out of the process, see:
http://searchhub.org/2012/02/14/indexing-with-solrj/

Whichever approach fits your needs of course.
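A rough sketch of the loop a SolrJ-based indexer might run once DIH is out of the picture: read each row, enrich it, and send documents to Solr in batches rather than one at a time. The row representation, the enricher, and the sink are hypothetical stand-ins so the example runs on its own; with SolrJ the sink would build `SolrInputDocument`s and pass each batch to `SolrServer.add(Collection<SolrInputDocument>)` (`SolrClient.add` in later releases).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

// Buffers enriched rows and hands them to a sink in fixed-size batches.
final class BatchIndexer {
    private final int batchSize;
    private final UnaryOperator<Map<String, Object>> enricher; // hypothetical: e.g. the Stanbol call
    private final Consumer<List<Map<String, Object>>> sink;    // hypothetical: e.g. wraps server.add(docs)
    private final List<Map<String, Object>> buffer = new ArrayList<>();

    BatchIndexer(int batchSize,
                 UnaryOperator<Map<String, Object>> enricher,
                 Consumer<List<Map<String, Object>>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.batchSize = batchSize;
        this.enricher = enricher;
        this.sink = sink;
    }

    // Enriches one row (e.g. read from a JDBC ResultSet) and flushes when the batch is full.
    void add(Map<String, Object> row) {
        buffer.add(enricher.apply(row));
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Sends any buffered documents; call once more after the last row.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // hand over a copy, then reuse the buffer
        buffer.clear();
    }
}
```

With SolrJ you would usually commit once after the final `flush()` rather than after every batch.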

Best,
Erick


On Tue, Oct 29, 2013 at 7:15 PM, Alexandre Rafalovitch <ar...@gmail.com> wrote:

Re: Need additional data processing in Data Import Handler prior to indexing

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
It's also possible to combine an Update Request Processor (URP) with DIH.
That way, if a debug entry needs to be inserted, it can go through the
same Stanbol process.

Just define a processing chain for the DIH handler and write a custom URP
that calls out to the Stanbol web service. A URP has access to the full
record, so it can add, delete, or change fields at will.
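A minimal sketch of the chain idea, using stand-in types so it runs without Solr. In Solr the enrichment step would subclass `UpdateRequestProcessor` (created by an `UpdateRequestProcessorFactory`), override `processAdd(AddUpdateCommand cmd)`, edit `cmd.getSolrInputDocument()`, and then call `super.processAdd(cmd)` so the rest of the chain indexes the document. Here a document is a plain `Map`, the Stanbol call is a hypothetical `Function`, and the field names `content` and `person` are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Stand-in for UpdateRequestProcessor: each step may modify the doc, then passes it on.
abstract class DocProcessor {
    private final DocProcessor next;

    DocProcessor(DocProcessor next) {
        this.next = next;
    }

    // Default behaviour: hand the document to the next step unchanged.
    void processAdd(Map<String, Object> doc) {
        if (next != null) {
            next.processAdd(doc);
        }
    }
}

// Enrichment step: sees the whole record, so it can add, change, or delete fields.
final class StanbolEnrichProcessor extends DocProcessor {
    private final Function<String, List<String>> findPersons; // hypothetical Stanbol wrapper

    StanbolEnrichProcessor(Function<String, List<String>> findPersons, DocProcessor next) {
        super(next);
        this.findPersons = findPersons;
    }

    @Override
    void processAdd(Map<String, Object> doc) {
        Object text = doc.get("content"); // hypothetical source field
        if (text != null) {
            doc.put("person", findPersons.apply(text.toString())); // hypothetical target field
        }
        super.processAdd(doc); // continue down the chain with the modified doc
    }
}

// Stand-in for the final step that would actually index the document.
final class CollectingProcessor extends DocProcessor {
    final List<Map<String, Object>> indexed = new ArrayList<>();

    CollectingProcessor() {
        super(null);
    }

    @Override
    void processAdd(Map<String, Object> doc) {
        indexed.add(new LinkedHashMap<>(doc));
    }
}
```

Because every document sent to a handler using that chain passes through the same step, a document posted by hand for debugging is enriched exactly like one coming from DIH. As far as I know, the chain is attached to the DIH handler through the `update.chain` parameter in its defaults in solrconfig.xml; check this against your Solr version.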

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)


On Wed, Oct 30, 2013 at 4:09 AM, Michael Della Bitta <michael.della.bitta@appinions.com> wrote:

Re: Need additional data processing in Data Import Handler prior to indexing

Posted by Michael Della Bitta <mi...@appinions.com>.
Hi Dileepa,

You can write your own Transformers in Java. If it doesn't make sense to
call Stanbol from a Transformer, you could instead set up a web service
that grabs a record out of MySQL, sends the data to Stanbol, and serves
the results, and then use it with HttpDataSource rather than
JdbcDataSource.

http://wiki.apache.org/solr/DIHCustomTransformer
http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2FHTTP_Datasource
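A sketch of the per-row logic such a Java Transformer could contain. A real one would extend `org.apache.solr.handler.dataimport.Transformer` and implement `transformRow(Map<String, Object> row, Context context)`; the `Context` argument is dropped here so the example compiles without Solr. `PersonExtractor`, the `content` column, and the `person` field are hypothetical, and the assumption that DIH indexes a `List` value as multiple values of a multi-valued field should be checked against your version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Stanbol call (e.g. wrapping an HTTP client).
interface PersonExtractor {
    List<String> extractPersons(String text) throws Exception;
}

// Per-row logic of a custom DIH Transformer (Solr types omitted for the sketch).
final class StanbolPersonTransformer {
    // Hypothetical names: the DB column holding the text, and the multi-valued
    // Solr field that receives the persons. Match them to your config and schema.
    static final String SOURCE_COLUMN = "content";
    static final String TARGET_FIELD = "person";

    private final PersonExtractor extractor;

    StanbolPersonTransformer(PersonExtractor extractor) {
        this.extractor = extractor;
    }

    Object transformRow(Map<String, Object> row) {
        Object content = row.get(SOURCE_COLUMN);
        if (content == null) {
            return row; // nothing to enhance; index the row as-is
        }
        try {
            List<String> persons = extractor.extractPersons(content.toString());
            if (!persons.isEmpty()) {
                // Assumed: DIH treats a List value as multiple field values.
                row.put(TARGET_FIELD, new ArrayList<>(persons));
            }
        } catch (Exception e) {
            // Keep the record indexable even if Stanbol is down or slow.
            System.err.println("Stanbol enhancement failed: " + e.getMessage());
        }
        return row;
    }
}
```

The transformer would then be named, by its fully qualified class name, in the `transformer` attribute of the entity in db-data-config.xml.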

Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions <https://twitter.com/Appinions> | g+:
plus.google.com/appinions<https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts>
w: appinions.com <http://www.appinions.com/>


On Tue, Oct 29, 2013 at 4:47 PM, Dileepa Jayakody <dileepajayakody@gmail.com> wrote:

RE: Need additional data processing in Data Import Handler prior to indexing

Posted by "Dyer, James" <Ja...@ingramcontent.com>.
Would an "onImportEnd" event listener serve your needs?

See http://wiki.apache.org/solr/DataImportHandler#EventListeners
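One thing to keep in mind with this approach: an `onImportEnd` listener (a class implementing `org.apache.solr.handler.dataimport.EventListener`, whose `onEvent(Context)` method DIH calls) runs once, after the whole import has finished, so it cannot alter documents before they are indexed. It could, however, start a second pass that enriches the already-indexed documents with atomic updates. The sketch below only builds those update documents; the map of indexed ids to content and the Stanbol wrapper are hypothetical, and atomic updates assume a Solr 4 schema that meets their requirements (stored fields and a `_version_` field).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Second-pass enrichment an onImportEnd listener could trigger (sketch only).
final class PostImportEnrichment {
    // Builds Solr atomic-update documents of the form {"id": ..., "person": {"set": [...]}}.
    static List<Map<String, Object>> buildUpdates(Map<String, String> indexedContentById,
                                                  Function<String, List<String>> findPersons) {
        List<Map<String, Object>> updates = new ArrayList<>();
        for (Map.Entry<String, String> e : indexedContentById.entrySet()) {
            List<String> persons = findPersons.apply(e.getValue()); // hypothetical Stanbol wrapper
            if (persons.isEmpty()) {
                continue; // nothing to add for this document
            }
            Map<String, Object> setOp = new LinkedHashMap<>();
            setOp.put("set", persons); // replace the field's values
            Map<String, Object> update = new LinkedHashMap<>();
            update.put("id", e.getKey());
            update.put("person", setOp); // hypothetical multi-valued field
            updates.add(update);
        }
        return updates;
    }
}
```

Each map corresponds to a JSON atomic update such as `{"id":"1","person":{"set":["Bob Marley"]}}`, which would then be sent to Solr's update handler and committed.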

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Dileepa Jayakody [mailto:dileepajayakody@gmail.com] 
Sent: Tuesday, October 29, 2013 3:48 PM
To: solr-user@lucene.apache.org
Subject: Need additional data processing in Data Import Handler prior to indexing
