Posted to solr-user@lucene.apache.org by Dileepa Jayakody <di...@gmail.com> on 2013/11/02 22:51:18 UTC

Writing a Solr custom analyzer to post content to Stanbol {was: Need additional data processing in Data Import Handler prior to indexing}

Hi All,

I went through the possible solutions for my requirement of triggering a
Stanbol enhancement during Solr indexing, and I have simplified the
requirement.

I only need to process the field named "content" to perform the Stanbol
enhancement and extract Person and Organization entities.
So I think it will be easier to send the Stanbol request while indexing the
"content" field, after the data has been imported by the DIH.

I think the best solution will be to write a custom Analyzer that processes
the content and posts it to Stanbol.
The analyzer also needs to process the Stanbol enhancement response, treating
it as a new document, so that the identified Person and Organization entities
are indexed and stored in a field called "extractedEntities".
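As a rough illustration of the Stanbol round trip described above (whichever Solr component ends up making it), here is a minimal Java 11+ sketch using `java.net.http`. The class name `StanbolRequests`, the endpoint `http://localhost:8080/enhancer` (where a default local Stanbol launcher is usually reached) and the JSON-LD `Accept` type are assumptions to adjust for your own setup.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/**
 * Sketch of posting a document's "content" text to a Stanbol enhancer.
 * Assumed (not from the thread): the endpoint URL and the Accept type.
 */
public class StanbolRequests {

    /** Builds the POST request; kept separate so it can be checked without a server. */
    public static HttpRequest buildEnhanceRequest(URI enhancerEndpoint, String content) {
        return HttpRequest.newBuilder(enhancerEndpoint)
                .header("Content-Type", "text/plain; charset=UTF-8")
                // Assumption: ask for JSON-LD; RDF/XML or Turtle are alternatives.
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(content, StandardCharsets.UTF_8))
                .build();
    }

    /** Sends the text to Stanbol and returns the raw enhancement response body. */
    public static String enhance(HttpClient client, URI enhancerEndpoint, String content)
            throws IOException, InterruptedException {
        HttpResponse<String> response = client.send(
                buildEnhanceRequest(enhancerEndpoint, content),
                HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Stanbol returned HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

A caller would run something like `enhance(HttpClient.newHttpClient(), URI.create("http://localhost:8080/enhancer"), text)` and then pick the Person and Organization entities out of the returned body.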

So my current idea is as follows.

In schema.xml:

<copyField source="content" dest="stanbolRequest" />

<field name="stanbolRequest" type="stanbolRequestType" indexed="true"
stored="true" required="false"/>

 <fieldType name="stanbolRequestType" class="solr.TextField">
  <analyzer class="MyCustomAnalyzer"/>
 </fieldType>

In the MyCustomAnalyzer class, the content will be posted to Stanbol and
enhanced. The Person and Organization entities in the response should then
be indexed into the Solr field "extractedEntities".
Am I on the right path for this requirement? Please share your ideas.
I'd appreciate any pointers to relevant samples or documentation.
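One caveat worth checking before committing to MyCustomAnalyzer: a Solr analyzer only produces the indexed tokens of the field it is attached to (stanbolRequest here), so on its own it cannot add stored values to a different field such as extractedEntities. Wherever the Stanbol call ends up, the step of picking Person and Organization entities out of the enhancement result might look like the sketch below. It assumes the response has already been parsed into text annotations, each with its `dc:type` and `fise:selected-text` (modelled by the hypothetical `TextAnnotation` record), and that the types are the DBpedia ontology URIs Stanbol's named-entity engines typically emit (note the British spelling "Organisation"); verify both against your own Stanbol output.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Sketch: keep only Person and Organisation annotations from an already-parsed
 * Stanbol enhancement result and turn them into "extractedEntities" values.
 * RDF parsing is out of scope; TextAnnotation is a hypothetical holder.
 */
public class EntityFilter {

    // Assumed DBpedia ontology type URIs used by Stanbol's NER engines.
    public static final String PERSON = "http://dbpedia.org/ontology/Person";
    public static final String ORGANISATION = "http://dbpedia.org/ontology/Organisation";

    /** Hypothetical value object for one text annotation: its type and selected text. */
    public record TextAnnotation(String type, String selectedText) {}

    /** Returns distinct entity names, in first-seen order, for the wanted types. */
    public static List<String> extractedEntities(List<TextAnnotation> annotations) {
        Set<String> names = new LinkedHashSet<>();
        for (TextAnnotation a : annotations) {
            if (PERSON.equals(a.type()) || ORGANISATION.equals(a.type())) {
                names.add(a.selectedText().trim());
            }
        }
        return new ArrayList<>(names);
    }
}
```

De-duplicating keeps a name that Stanbol reports several times in the same content from being stored repeatedly in the multi-valued field.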

Thanks,
Dileepa

On Wed, Oct 30, 2013 at 11:26 AM, Dileepa Jayakody <dileepajayakody@gmail.com> wrote:

> Thanks guys for your ideas.
>
> I will go through them and come back with questions.
>
> Regards,
> Dileepa
>
>
> On Wed, Oct 30, 2013 at 7:00 AM, Erick Erickson <er...@gmail.com> wrote:
>
>> Third time tonight I've been able to paste this link....
>>
>> Also, you can consider just moving to SolrJ and
>> taking DIH out of the process, see:
>> http://searchhub.org/2012/02/14/indexing-with-solrj/
>>
>> Whichever approach fits your needs of course.
>>
>> Best,
>> Erick
>>
>>
>> On Tue, Oct 29, 2013 at 7:15 PM, Alexandre Rafalovitch <ar...@gmail.com> wrote:
>>
>> > It's also possible to combine an Update Request Processor with DIH. That
>> > way, if a debug entry needs to be inserted, it can go through the same
>> > Stanbol process.
>> >
>> > Just define an update processing chain for the DIH handler and write a
>> > custom URP to call out to the Stanbol web service. You have access to the
>> > full record in a URP, so you can add, delete or change fields at will.
>> >
>> > Regards,
>> >    Alex.
>> >
>> >
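Alex's update-processor route could be sketched as below. In Solr 4.x the real class would extend `UpdateRequestProcessor` (created by an `UpdateRequestProcessorFactory`) and do this work in `processAdd(AddUpdateCommand cmd)` on `cmd.getSolrInputDocument()` before calling `super.processAdd(cmd)`, with the chain referenced from the DIH request handler (for example through the `update.chain` parameter). To keep the sketch runnable without Solr on the classpath, the document is modelled as a `Map` from field name to values and the Stanbol round trip as a hypothetical `Function` from text to entity names; both are stand-ins, not Solr APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Runnable sketch of the per-document work a Stanbol update processor would do.
 * Assumptions: a document is a Map of field name -> values (standing in for
 * SolrInputDocument) and the Stanbol call is a hypothetical Function that turns
 * text into Person/Organization names.
 */
public class StanbolEntityStep {

    private final Function<String, List<String>> enhancer;

    public StanbolEntityStep(Function<String, List<String>> enhancer) {
        this.enhancer = enhancer;
    }

    /** Mirrors processAdd: read "content", append to "extractedEntities", keep other fields. */
    public void process(Map<String, List<Object>> doc) {
        List<Object> contents = doc.get("content");
        if (contents == null || contents.isEmpty()) {
            return; // nothing to enhance; the document passes through unchanged
        }
        List<Object> entities = new ArrayList<>();
        for (Object value : contents) {
            entities.addAll(enhancer.apply(String.valueOf(value)));
        }
        if (!entities.isEmpty()) {
            doc.computeIfAbsent("extractedEntities", k -> new ArrayList<>()).addAll(entities);
        }
    }
}
```

Because the processor sees the whole document after DIH has built it, the same step would apply to documents arriving through DIH, SolrJ or a manual debug post, which is the advantage Alex points out.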
>> >
>> > On Wed, Oct 30, 2013 at 4:09 AM, Michael Della Bitta <michael.della.bitta@appinions.com> wrote:
>> >
>> > > Hi Dileepa,
>> > >
>> > > You can write your own Transformers in Java. If it doesn't make sense to
>> > > run Stanbol calls in a Transformer, you could set up a web service that
>> > > grabs a record out of MySQL, sends the data to Stanbol, and displays the
>> > > results, and then use it with HttpDataSource rather than JdbcDataSource.
>> > >
>> > > http://wiki.apache.org/solr/DIHCustomTransformer
>> > >
>> > > http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2FHTTP_Datasource
>> > >
>> > > Michael Della Bitta
>> > >
>> > > Applications Developer
>> > > appinions inc.
>> > >
>> > >
>> > > On Tue, Oct 29, 2013 at 4:47 PM, Dileepa Jayakody <dileepajayakody@gmail.com> wrote:
>> > >
>> > > > Hi All,
>> > > >
>> > > > I'm a newbie to Solr, and I have a requirement to import data from a
>> > > > MySQL database, enhance the imported content to identify the Persons
>> > > > mentioned, and index them as a separate field in Solr along with the
>> > > > other fields defined for the original DB query.
>> > > >
>> > > > I'm using Apache Stanbol [1] for the content enhancement requirement.
>> > > > I can get 'Person' type data in the content as enhancement results.
>> > > >
>> > > > The data flow will be:
>> > > > mysql-db > Solr data-import handler > Stanbol enhancer > Solr index
>> > > >
>> > > > For the above requirement I need to perform additional processing in
>> > > > the data-import handler prior to indexing: send a request to Stanbol
>> > > > and process the enhancement response. I found some related examples of
>> > > > customizing the MySQL query results in db-data-config.xml by using a
>> > > > transformer script.
>> > > > In my case, the data-import handler needs to send a request to Stanbol
>> > > > and process the response prior to indexing, and I'm not sure whether
>> > > > this can be achieved with a simple JavaScript transformer.
>> > > >
>> > > > Is there a better way to achieve this? Maybe writing a custom filter in
>> > > > Solr?
>> > > > Please share your thoughts. I'd appreciate any pointers, as I'm a
>> > > > beginner with Solr.
>> > > >
>> > > > Thanks,
>> > > > Dileepa
>> > > >
>> > > >
>> > > > [1] https://stanbol.apache.org
>> > > >
>> > >
>> >
>>
>
>