Posted to solr-user@lucene.apache.org by Ahmed Hammad <ah...@gmail.com> on 2008/11/05 19:18:33 UTC

Regex Transformer Error

Hi,

I am using the Solr 1.3 DataImportHandler. One of my table fields contains
HTML tags, and I want to strip them from the field text. So obviously I need
the RegexTransformer.

I added the transformer="RegexTransformer" attribute to my entity and a new
field with:

<field sourceColName="content" column="content" regex="English"
replaceWith="XXXXX"/>

Everything works fine. The text is replaced without any problem. The problem
happens with my regular expression to strip HTML tags. I use
regex="<(.|\n)*?>". Of course the characters '<' and '>' are not allowed in
XML. I tried the following:
regex="&lt;(.|\n)*?&gt;" and regex="&#3C;(.|\n)*?&#3E;" but I get the
following error:

The value of attribute "regex" associated with an element type "field" must
not contain the '<' character. at
com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
...

The full stack trace follows:

FATAL: Could not create importer. DataImporter config invalid
org.apache.solr.common.SolrException: FATAL: Could not create importer. DataImporter config invalid
    at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:114)
    at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:206)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:857)
    at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:565)
    at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1509)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context Processing Document #
    at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:176)
    at org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.java:93)
    at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106)
    ... 17 more
Caused by: org.xml.sax.SAXParseException: The value of attribute "regex" associated with an element type "field" must not contain the '<' character.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:166)
    ... 19 more

I appreciate your help.

Regards,
ahmd
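
For readers who hit the same SAXParseException, here is a minimal sketch (not
part of the original post) of how a standard JAXP parser treats the different
spellings of the regex attribute. A literal '<' is not permitted inside an XML
attribute value, the entity references &lt; and &gt; decode to '<' and '>',
and a hexadecimal character reference needs the "x" prefix (&#x3C;), so the
&#3C; form quoted above is itself not well-formed. The class and method names
are made up for illustration, and the one-element document only stands in for
a real data-config.xml.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of Solr: checks how an XML parser reads a
// candidate value for the "regex" attribute of a <field> element.
final class RegexAttributeCheck {

    // Returns the decoded attribute value, or null if the XML is not well-formed.
    static String parseRegexAttribute(String rawAttributeValue) {
        String xml = "<field column=\"content\" regex=\"" + rawAttributeValue + "\"/>";
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getAttribute("regex");
        } catch (SAXException e) {
            // Raised for a literal '<' in the value or a malformed character
            // reference; the JDK's default parser may also log this to stderr.
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "<(.|\\n)*?>",            // literal '<': rejected by the parser
            "&lt;(.|\\n)*?&gt;",      // entity references: accepted
            "&#x3C;(.|\\n)*?&#x3E;",  // hex character references: accepted
            "&#3C;(.|\\n)*?&#3E;"     // missing 'x': rejected by the parser
        };
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + parseRegexAttribute(candidate));
        }
    }
}
```

In the accepted cases the application receives the already decoded pattern
<(.|\n)*?>, so the regular expression itself does not need to change.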

Re: Regex Transformer Error

Posted by Ahmed Hammad <ah...@gmail.com>.
OK, I contributed it at:
https://issues.apache.org/jira/browse/SOLR-887

I changed it to use the Solr class org.apache.solr.analysis.HTMLStripReader.

Thank you all.

Ahmed



On Tue, Nov 18, 2008 at 5:49 AM, Noble Paul നോബിള്‍ नोब्ळ् <
noble.paul@gmail.com> wrote:

> On Tue, Nov 18, 2008 at 2:49 AM, Ahmed Hammad <ah...@gmail.com> wrote:
> > Hi All,
> >
> > Although the HTMLStripStandardTokenizerFactory will remove HTML tags, they
> > will still be stored in the index and need to be removed while searching.
> > In my case the HTML tags are not needed at all. So I created an
> > HTMLStripTransformer for the DIH to remove the HTML tags and save space in
> > the index. I used the HTML parser included with Lucene
> > (org.apache.lucene.demo.html). It performs well and worked for me (while
> > working with Lucene before moving to Solr).
> >
> > What do you think? Is it worth contributing?
> Yes. You can contribute this new transformer as an enhancement.
> >
> > My best wishes,
> >
> > Regards,
> > Ahmed
> >
> > On Thu, Nov 6, 2008 at 2:39 AM, Norskog, Lance <la...@divvio.com> wrote:
> >
> >> There is a nice HTML stripper inside Solr.
> >> "solr.HTMLStripStandardTokenizerFactory"
> >>
> >> -----Original Message-----
> >> From: Ahmed Hammad [mailto:ahm507@gmail.com]
> >> Sent: Wednesday, November 05, 2008 10:43 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Regex Transformer Error
> >>
> >> Hi,
> >>
> >> It works with the attribute regex="&lt;(.|\n)*?&gt;"
> >>
> >> Sorry for the disturbance.
> >>
> >> Regards,
> >>
> >> ahmd
> >
>
>
>
> --
> --Noble Paul
>

Re: Regex Transformer Error

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
On Tue, Nov 18, 2008 at 2:49 AM, Ahmed Hammad <ah...@gmail.com> wrote:
> Hi All,
>
> Although the HTMLStripStandardTokenizerFactory will remove HTML tags, they
> will still be stored in the index and need to be removed while searching. In
> my case the HTML tags are not needed at all. So I created an
> HTMLStripTransformer for the DIH to remove the HTML tags and save space in
> the index. I used the HTML parser included with Lucene
> (org.apache.lucene.demo.html). It performs well and worked for me (while
> working with Lucene before moving to Solr).
>
> What do you think? Is it worth contributing?
Yes. You can contribute this new transformer as an enhancement.
>
> My best wishes,
>
> Regards,
> Ahmed
>
> On Thu, Nov 6, 2008 at 2:39 AM, Norskog, Lance <la...@divvio.com> wrote:
>
>> There is a nice HTML stripper inside Solr.
>> "solr.HTMLStripStandardTokenizerFactory"
>>
>> -----Original Message-----
>> From: Ahmed Hammad [mailto:ahm507@gmail.com]
>> Sent: Wednesday, November 05, 2008 10:43 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Regex Transformer Error
>>
>> Hi,
>>
>> It works with the attribute regex="&lt;(.|\n)*?&gt;"
>>
>> Sorry for the disturbance.
>>
>> Regards,
>>
>> ahmd
>



-- 
--Noble Paul

Re: Regex Transformer Error

Posted by Ahmed Hammad <ah...@gmail.com>.
Hi All,

Although the HTMLStripStandardTokenizerFactory will remove HTML tags, they
will still be stored in the index and need to be removed while searching. In
my case the HTML tags are not needed at all. So I created an
HTMLStripTransformer for the DIH to remove the HTML tags and save space in
the index. I used the HTML parser included with Lucene
(org.apache.lucene.demo.html). It performs well and worked for me (while
working with Lucene before moving to Solr).

What do you think? Is it worth contributing?

My best wishes,

Regards,
Ahmed

On Thu, Nov 6, 2008 at 2:39 AM, Norskog, Lance <la...@divvio.com> wrote:

> There is a nice HTML stripper inside Solr.
> "solr.HTMLStripStandardTokenizerFactory"
>
> -----Original Message-----
> From: Ahmed Hammad [mailto:ahm507@gmail.com]
> Sent: Wednesday, November 05, 2008 10:43 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Regex Transformer Error
>
> Hi,
>
> It works with the attribute regex="&lt;(.|\n)*?&gt;"
>
> Sorry for the disturbance.
>
> Regards,
>
> ahmd

Re: Regex Transformer Error

Posted by Ahmed Hammad <ah...@gmail.com>.
It worked after replacing < with &lt; and > with &gt;.

Thank you for your support,
ahmd
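
As a rough illustration of what the decoded pattern does to a field value,
the sketch below applies <(.|\n)*?> with plain java.util.regex. It assumes
the transformer replaces every match with the replaceWith string (empty
here); the class and method names are hypothetical and nothing in it is
Solr code.

```java
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch: applies the tag-stripping pattern from this
// thread the way a replace-all style transformer presumably would.
final class HtmlTagStripSketch {

    // The pattern as the application sees it after XML decoding. The group
    // (.|\n) lets a single tag span a line break, and *? keeps each match as
    // short as possible so that only one tag is consumed at a time.
    static final Pattern TAG = Pattern.compile("<(.|\\n)*?>");

    // Removes everything that looks like a tag and keeps the text in between.
    static String stripTags(String text) {
        return TAG.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<p>Hello <b>Solr</b> world</p>"));
        System.out.println(stripTags("<a\nhref='x'>link</a>"));
    }
}
```

A pattern like this is only a heuristic: it also removes ordinary text that
happens to sit between a '<' and a later '>', for example in "1 < 2 and 3 > 2".
The alternation inside the repetition can also be costly on very long field
values, where <[^>]*> is a commonly used alternative. A real HTML parser, as
discussed elsewhere in this thread, avoids both problems.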

On Thu, Nov 6, 2008 at 2:39 AM, Norskog, Lance <la...@divvio.com> wrote:

> There is a nice HTML stripper inside Solr.
> "solr.HTMLStripStandardTokenizerFactory"
>
> -----Original Message-----
> From: Ahmed Hammad [mailto:ahm507@gmail.com]
> Sent: Wednesday, November 05, 2008 10:43 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Regex Transformer Error
>
> Hi,
>
> It works with the attribute regex="&lt;(.|\n)*?&gt;"
>
> Sorry for the disturbance.
>
> Regards,
>
> ahmd
>
>
> On Wed, Nov 5, 2008 at 8:18 PM, Ahmed Hammad <ah...@gmail.com> wrote:
>
> > Hi,
> >
> > I am using Solr 1.3 data import handler. One of my table fields has
> > html tags, I want to strip it of the field text. So obviously I need
> > the Regex Transformer.
> >
> > I added transformer="RegexTransformer" attribute to my entity and a
> > new field with:
> >
> > <field sourceColName="content" column="content" regex="English"
> > replaceWith="XXXXX"/>
> >
> > Every thing works fine. The text is replace without any problem. The
> > provlem happend with my regular experession to strip html tags. So I
> > use regex="<(.|\n)*?>". Of course the charecters '<' and '>' are not
> > allowed in XML. I tried the following regex="&lt;(.|\n)*?&gt;" and
> > regex="&#3C;(.|\n)*?&#3E;" but I get the following error:
> >
> > The value of attribute "regex" associated with an element type "field"
>
> > must not contain the '<' character. at
> > com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown
> > Source) ...
> >
> > The full stack trace is following:
> >
> > *FATAL: Could not create importer. DataImporter config invalid
> > org.apache.solr.common.SolrException: FATAL: Could not create
> importer.
> > DataImporter config invalid at
> > org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImport
> > Handler.java:114)
> > at
> > org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody
> > (DataImportHandler.java:206)
> > at
> > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandle
> > rBase.java:131) at
> > org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at
> > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.
> > java:303)
> > at
> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter
> > .java:232)
> > at
> > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Appli
> > cationFilterChain.java:235)
> > at
> > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFi
> > lterChain.java:206)
> > at
> > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperVa
> > lve.java:233)
> > at
> > org.apache.catalina.core.StandardContextValve.invoke(StandardContextVa
> > lve.java:191)
> > at
> > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.ja
> > va:128)
> > at
> > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.ja
> > va:102)
> > at
> > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValv
> > e.java:109)
> > at
> > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java
> > :286)
> > at
> > org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor
> > .java:857)
> > at
> > org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.pro
> > cess(Http11AprProtocol.java:565) at
> > org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:150
> > 9) at java.lang.Thread.run(Unknown Source) Caused by:
> > org.apache.solr.handler.dataimport.DataImportHandlerException:
> > Exception occurred while initializing context Processing Document # at
> > org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImp
> > orter.java:176)
> > at
> > org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.ja
> > va:93)
> > at
> > org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImport
> > Handler.java:106) ... 17 more Caused by:
> > org.xml.sax.SAXParseException: The value of attribute "regex"
> > associated with an element type "field" must not contain the '<'
> > character. at
> > com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown
> > Source) at
> > com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unkn
> > own
> > Source) at
> > org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImp
> > orter.java:166)
> > ... 19 more *
> >
> > *description* *The server encountered an internal error (FATAL: Could
> > not create importer. DataImporter config invalid
> > org.apache.solr.common.SolrException: FATAL: Could not create
> importer.
> > DataImporter config invalid at
> > org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImport
> > Handler.java:114)
> > at
> > org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody
> > (DataImportHandler.java:206)
> > at
> > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandle
> > rBase.java:131) at
> > org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at
> > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.
> > java:303)
> > at
> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter
> > .java:232)
> > at
> > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Appli
> > cationFilterChain.java:235)
> > at
> > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFi
> > lterChain.java:206)
> > at
> > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperVa
> > lve.java:233)
> > at
> > org.apache.catalina.core.StandardContextValve.invoke(StandardContextVa
> > lve.java:191)
> > at
> > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.ja
> > va:128)
> > at
> > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.ja
> > va:102)
> > at
> > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValv
> > e.java:109)
> > at
> > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java
> > :286)
> > at
> > org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor
> > .java:857)
> > at
> > org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.pro
> > cess(Http11AprProtocol.java:565) at
> > org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:150
> > 9) at java.lang.Thread.run(Unknown Source) Caused by:
> > org.apache.solr.handler.dataimport.DataImportHandlerException:
> > Exception occurred while initializing context Processing Document # at
> > org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImp
> > orter.java:176)
> > at
> > org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.ja
> > va:93)
> > at
> > org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImport
> > Handler.java:106) ... 17 more Caused by:
> > org.xml.sax.SAXParseException: The value of attribute "regex"
> > associated with an element type "field" must not contain the '<'
> > character. at
> > com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown
> > Source) at
> > com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unkn
> > own
> > Source) at
> > org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImp
> > orter.java:166) ... 19 more ) that prevented it from fulfilling this
> > request.*
> >
> > I appreciate your help.
> >
> > Regards,
> > ahmd
> >
> >
>

RE: Regex Transformer Error

Posted by "Norskog, Lance" <la...@divvio.com>.
There is a nice HTML stripper inside Solr.
"solr.HTMLStripStandardTokenizerFactory" 


Re: Regex Transformer Error

Posted by Ahmed Hammad <ah...@gmail.com>.
Hi,

It works with the attribute regex="&lt;(.|\n)*?&gt;"

Sorry for the disturbance.

Regards,

ahmd
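
Editorial sketch, not part of the original message: the working attribute can be checked outside Solr. This is not DataImportHandler code. It assumes only the JDK's XML parser and java.util.regex, and further assumes that the transformer compiles the attribute value as a Java regex. The class name EscapedRegexDemo and the sample input are hypothetical. The sketch parses a <field> element, shows that the parser returns the pattern with &lt; and &gt; already unescaped, and applies it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class EscapedRegexDemo {
    // Parses a single <field> element and returns its regex attribute as the
    // application sees it, i.e. after the XML parser has resolved entities.
    static String readRegexAttribute(String fieldXml) {
        try {
            Element field = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(fieldXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return field.getAttribute("regex");
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + fieldXml, e);
        }
    }

    // Removes every match of the pattern from the text.
    static String strip(String pattern, String text) {
        return text.replaceAll(pattern, "");
    }

    public static void main(String[] args) {
        String fieldXml =
                "<field column=\"content\" regex=\"&lt;(.|\\n)*?&gt;\" replaceWith=\"\"/>";
        String pattern = readRegexAttribute(fieldXml);
        System.out.println(pattern);                                    // prints <(.|\n)*?>
        System.out.println(strip(pattern, "<p>Hello <b>Solr</b></p>")); // prints Hello Solr
    }
}
```

The escaping exists only in the XML file; by the time the pattern reaches the regex engine it is the plain <(.|\n)*?> again.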



Re: Regex Transformer Error

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
Did you try without escaping the '<' characters?




-- 
--Noble Paul
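
Editorial sketch, not part of the original message: on the question of leaving '<' unescaped, XML does not allow a literal '<' inside an attribute value, which matches the SAXParseException reported in the original post. The check below assumes only the JDK's XML parser; the class name RegexAttributeCheck and the method parses are hypothetical. It also tries the &#3C; form from the original post, which is not a well-formed character reference because a decimal reference may contain only digits (the hexadecimal form needs an x, as in &#x3C;).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class RegexAttributeCheck {
    // Returns true if a <field> element carrying the given raw attribute text
    // is well-formed XML, false if the parser rejects it.
    static boolean parses(String rawRegexAttribute) {
        String xml = "<field column=\"content\" regex=\"" + rawRegexAttribute + "\"/>";
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // the document is not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<(.|\\n)*?>"));       // false: literal '<'
        System.out.println(parses("&lt;(.|\\n)*?&gt;")); // true
        System.out.println(parses("&#3C;(.|\\n)*?>"));   // false: malformed reference
        System.out.println(parses("&#x3C;(.|\\n)*?>"));  // true: a literal '>' is allowed
    }
}
```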