Posted to solr-user@lucene.apache.org by meghana <me...@amultek.com> on 2011/12/23 10:24:55 UTC
PlainTexttransformer and RegexTransformer in DataImport Handler
Hi all,
I need to import data from a text file that contains HTML, and I need to
apply some formatting to it. I want the text inside every <p> tag, and I want
each piece of text to be preceded by one attribute of its <p> tag (myvar) in
my output, as shown below.
Original Text
------------------------------------------------------------------------------------------
<div><p myvar="12" myvar1="xyz">Hello World!!</p><p myvar="14"
myvar1="abc">Welcome to Solr.</p><p myvar="15" myvar1="def">Enjoy</p></div>
Needed Text After Formatting
------------------------------------------------------------------------------------------
12 : Hello World!!
14 : Welcome to Solr.
15 : Enjoy
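To make the transformation I am after concrete, here is a minimal Python sketch of it, run outside of Solr. It is only an illustration of the expected input and output, not a DataImportHandler configuration. The names P_TAG and format_paragraphs are made up for this example, and it assumes every <p> tag carries a myvar attribute and contains no nested tags.

```python
import re

# Sample input copied from the "Original Text" example above.
html = ('<div><p myvar="12" myvar1="xyz">Hello World!!</p><p myvar="14" '
        'myvar1="abc">Welcome to Solr.</p><p myvar="15" myvar1="def">Enjoy</p></div>')

# Group 1: value of the myvar attribute; group 2: text inside the <p> element.
# Assumption: myvar is present on every <p> and the text has no nested tags.
P_TAG = re.compile(r'<p\b[^>]*?\bmyvar="([^"]*)"[^>]*>([^<]*)</p>')

def format_paragraphs(text):
    """Return one 'myvar : text' line per <p> element, joined into one string."""
    return "\n".join(f"{m.group(1)} : {m.group(2)}" for m in P_TAG.finditer(text))

print(format_paragraphs(html))
```

The result is a single string with one line per <p> tag, which is what I want to end up with in one single-valued field.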
I have applied a combination of PlainTextTransformer, RegexTransformer and
TemplateTransformer for this, as shown below, but I receive a
ConfigurationError when I set it up.
<entity name="xx" onError="continue"
processor="PlainTextEntityProcessor,TemplateTransformer,RegexTransformer"
url="${URL.MyTxtFile}" dataSource="MDataSource">
<field column="plainText" name="FullText" />
<field column="FullText"
template="${xx.FullText}" regex='<p (?:\s+[^>]+)?
myvar="([^<"]*)" (?:\s+[^>]+)?>([^<]*)</p>' replaceWith="$2 : $4"/>
</entity>
I would like to add that I am able to do this using TemplateTransformer and a
multivalued field, but I need the above format in a single-valued field, and
that is where I have failed.
Can anybody help me get my desired result, or tell me what I am doing wrong
in the transformer configuration above?
Thanks
Meghana