Posted to solr-user@lucene.apache.org by Prasi S <pr...@gmail.com> on 2014/03/20 10:23:30 UTC
Solr dih to read Clob contents
Hi,
I have a requirement to index a database table with CLOB content. Each row
in my table has a column that holds XML stored as a CLOB. I want to read the
contents of the XML through DIH and map each XML tag to a separate Solr
field.
Below is my clob content.
<root>
<author>A</author>
<date>02-Dec-2013</date>
.
.
.
</root>
I want to read the contents of the CLOB and map author to author_solr and
date to date_solr. Is this possible with a ClobTransformer or a
ScriptTransformer?
Thanks,
Prasi
Re: Solr dih to read Clob contents
Posted by Prasi S <pr...@gmail.com>.
The column in my database is of XML datatype. If I do not use
XMLSERIALIZE(SMRY as CLOB(1M)) as SMRY, and instead select the SMRY field
directly with
select ID,SMRY from BOOK_REC, I get the error below:
Exception while processing: x document : SolrInputDocument(fields:
[id=45768734]):org.apache.solr.handler.dataimport.DataImportHandlerException:
Parsing failed for xml, url:null rows processed:0 Processing Document # 1
Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected
character 'c' (code 99) in prolog; expected '<'
at javax.xml.stream.SerializableLocation@5780578
Thanks,
Prasi
On Mon, Mar 24, 2014 at 3:51 PM, Prasi S <pr...@gmail.com> wrote:
> Below is my full configuration,
>
> <dataConfig>
> <dataSource driver="com.ibm.db2.jcc.DB2Driver"
> url="jdbc:db2://IP:port/dbname" user="" password="" />
>
> <dataSource name="xmldata" type="FieldReaderDataSource"/>
>
> <document>
>
> <entity name="x" query="SELECT ID, XMLSERIALIZE(SMRY as CLOB(1M)) as SMRY
> FROM BOOK_REC fetch first 40 rows only"
> transformer="ClobTransformer" >
> <field column="MBR" name="mbr" />
> <entity name="y" dataSource="xmldata" dataField="x.SMRY"
> processor="XPathEntityProcessor"
> forEach="/*:summary" rootEntity="true" >
> <field column="card_no" xpath="/cardNo" />
>
> </entity>
> </entity>
> </document>
> </dataConfig>
>
> And this is my xml data
>
> <ns:summary xmlns:ns="***">
> <cardNo>ZAYQ5181</cardNo>
> <firstName>Sam</firstName>
> <lastName>Mathews</lastName>
> <date>2013-01-18T23:29:04.492</date>
> </ns:summary>
>
> Thanks,
> Prasi
Re: Solr dih to read Clob contents
Posted by Prasi S <pr...@gmail.com>.
Below is my full configuration,
<dataConfig>
  <dataSource driver="com.ibm.db2.jcc.DB2Driver"
              url="jdbc:db2://IP:port/dbname" user="" password="" />
  <dataSource name="xmldata" type="FieldReaderDataSource"/>
  <document>
    <entity name="x"
            query="SELECT ID, XMLSERIALIZE(SMRY as CLOB(1M)) as SMRY FROM BOOK_REC fetch first 40 rows only"
            transformer="ClobTransformer">
      <field column="MBR" name="mbr" />
      <entity name="y" dataSource="xmldata" dataField="x.SMRY"
              processor="XPathEntityProcessor"
              forEach="/*:summary" rootEntity="true">
        <field column="card_no" xpath="/cardNo" />
      </entity>
    </entity>
  </document>
</dataConfig>
And this is my xml data
<ns:summary xmlns:ns="***">
<cardNo>ZAYQ5181</cardNo>
<firstName>Sam</firstName>
<lastName>Mathews</lastName>
<date>2013-01-18T23:29:04.492</date>
</ns:summary>
Thanks,
Prasi
On Mon, Mar 24, 2014 at 3:23 PM, Shalin Shekhar Mangar <
shalinmangar@gmail.com> wrote:
> 1. I don't see the definition of a datasource named 'xmldata' in your
> data-config.
> 2. You have forEach="/*:summary" but I don't think that is a syntax
> supported by XPathRecordReader.
>
> If you can give a sample of the xml stored as Clob in your database,
> then we can help you write the right xpaths.
Re: Solr dih to read Clob contents
Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
1. I don't see the definition of a datasource named 'xmldata' in your
data-config.
2. You have forEach="/*:summary" but I don't think that is a syntax
supported by XPathRecordReader.
If you can give a sample of the xml stored as Clob in your database,
then we can help you write the right xpaths.
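For the <ns:summary> sample posted elsewhere in this thread, one thing to try is sketched below. It rests on assumptions that have not been verified against this DB2 setup: that XPathRecordReader matches elements by name without the namespace prefix (the stock DIH RSS example addresses an rdf:RDF root as /RDF), and that field xpaths must be written as full paths from the root. The SMRY field with clob="true" is also an addition here; as far as I know, ClobTransformer only converts columns marked that way.

```xml
<!-- Sketch only: the forEach and xpath values are assumptions, not tested -->
<entity name="x"
        query="SELECT ID, XMLSERIALIZE(SMRY as CLOB(1M)) as SMRY FROM BOOK_REC"
        transformer="ClobTransformer">
  <!-- clob="true" marks the column that ClobTransformer should turn into a String -->
  <field column="SMRY" clob="true" />
  <entity name="y" dataSource="xmldata" dataField="x.SMRY"
          processor="XPathEntityProcessor"
          forEach="/summary">
    <!-- full path from the root element, namespace prefix dropped (assumed) -->
    <field column="card_no" xpath="/summary/cardNo" />
  </entity>
</entity>
```

If card_no still shows up as an object reference such as org.......@1c8e807, the CLOB conversion is the first thing to check.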
On Mon, Mar 24, 2014 at 12:55 PM, Prasi S <pr...@gmail.com> wrote:
> My database configuration is as below
>
> <entity name="x" query="SELECT ID, XMLSERIALIZE(SMRY as CLOB(1M)) as SMRY
> FROM BOOK_REC fetch first 40 rows only"
> transformer="ClobTransformer" >
> <field column="MBR" name="mbr" />
> <entity name="y" dataSource="xmldata" dataField="x.SMRY"
> processor="XPathEntityProcessor"
> forEach="/*:summary" rootEntity="true" >
> <field column="card_no" xpath="/cardNo" />
>
> </entity>
> </entity>
>
> And I get the following response from Solr:
>
> <doc>
> <str name="card_no">org.......@1c8e807</str>
>
> Am I missing anything?
>
>
>
> Thanks,
> Prasi
--
Regards,
Shalin Shekhar Mangar.
Re: Solr dih to read Clob contents
Posted by Prasi S <pr...@gmail.com>.
My database configuration is as below
<entity name="x"
        query="SELECT ID, XMLSERIALIZE(SMRY as CLOB(1M)) as SMRY FROM BOOK_REC fetch first 40 rows only"
        transformer="ClobTransformer">
  <field column="MBR" name="mbr" />
  <entity name="y" dataSource="xmldata" dataField="x.SMRY"
          processor="XPathEntityProcessor"
          forEach="/*:summary" rootEntity="true">
    <field column="card_no" xpath="/cardNo" />
  </entity>
</entity>
And I get the following response from Solr:
<doc>
<str name="card_no">org.......@1c8e807</str>
Am I missing anything?
Thanks,
Prasi
On Thu, Mar 20, 2014 at 4:25 PM, Gora Mohanty <go...@mimirtech.com> wrote:
> You will need to use a FieldReaderDataSource and an XPathEntityProcessor
> along with the ClobTransformer. You did not provide details of your DIH data
> configuration file, but it should look something like:
>
> <dataSource name="xmldata" type="FieldReaderDataSource"/>
> ...
> <document>
> <entity name="x" query="..." transformer="ClobTransformer">
> <entity name="y" dataSource="xmldata" dataField="x.clob_column"
> processor="XPathEntityProcessor" forEach="/root">
> <field column="author_solr" xpath="/author" />
> <field column="date_solr" xpath="/date" />
> </entity>
> </entity>
> </document>
>
> Regards,
> Gora
>
Re: Solr dih to read Clob contents
Posted by Gora Mohanty <go...@mimirtech.com>.
On 20 March 2014 14:53, Prasi S <pr...@gmail.com> wrote:
>
> Hi,
> I have a requirement to index a database table with CLOB content. Each row
> in my table has a column that holds XML stored as a CLOB. I want to read the
> contents of the XML through DIH and map each XML tag to a separate Solr
> field.
>
> Below is my clob content.
> <root>
> <author>A</author>
> <date>02-Dec-2013</date>
> .
> .
> .
> </root>
>
> I want to read the contents of the CLOB and map author to author_solr and
> date to date_solr. Is this possible with a ClobTransformer or a
> ScriptTransformer?
You will need to use a FieldReaderDataSource and an XPathEntityProcessor
along with the ClobTransformer. You did not provide details of your DIH data
configuration file, but it should look something like:
<dataSource name="xmldata" type="FieldReaderDataSource"/>
...
<document>
  <entity name="x" query="..." transformer="ClobTransformer">
    <entity name="y" dataSource="xmldata" dataField="x.clob_column"
            processor="XPathEntityProcessor" forEach="/root">
      <field column="author_solr" xpath="/author" />
      <field column="date_solr" xpath="/date" />
    </entity>
  </entity>
</document>
Regards,
Gora
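Filling in the outline above for the <root> sample from the original question, a fuller sketch might read as follows. The JDBC settings, the table name my_table and the column name clob_column are placeholders, not values from this thread. Two details are assumptions added here rather than part of the reply above: field xpaths are spelled out as full paths (/root/author rather than /author), which is how the DIH examples I know of write them, and the CLOB column carries clob="true" so that ClobTransformer knows which column to convert. None of this has been run against a real database.

```xml
<dataConfig>
  <!-- placeholder JDBC settings; substitute your own driver, URL and credentials -->
  <dataSource driver="..." url="..." user="" password="" />
  <!-- reads XML from a field of the parent entity instead of from a file or URL -->
  <dataSource name="xmldata" type="FieldReaderDataSource" />
  <document>
    <!-- my_table and clob_column are hypothetical names -->
    <entity name="x" query="SELECT id, clob_column FROM my_table"
            transformer="ClobTransformer">
      <!-- convert the CLOB to a String before the child entity parses it -->
      <field column="clob_column" clob="true" />
      <entity name="y" dataSource="xmldata" dataField="x.clob_column"
              processor="XPathEntityProcessor" forEach="/root">
        <!-- xpaths given as full paths from the document root (assumed) -->
        <field column="author_solr" xpath="/root/author" />
        <field column="date_solr" xpath="/root/date" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

The named data source is what connects the two entities: dataField="x.clob_column" hands the string produced for the outer row to the XPathEntityProcessor of the inner one.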