Posted to solr-user@lucene.apache.org by Prasi S <pr...@gmail.com> on 2014/03/20 10:23:30 UTC

Solr dih to read Clob contents

Hi,
I have a requirement to index a database table with CLOB content. Each row
in my table has a column containing an XML document stored as a CLOB. I want
to read the XML contents through DIH and map each XML tag to a separate Solr
field.

Below is my CLOB content:
<root>
   <author>A</author>
   <date>02-Dec-2013</date>
   .
   .
   .
</root>

I want to read the contents of the CLOB and map author to author_solr and
date to date_solr. Is this possible with a ClobTransformer or a
ScriptTransformer?


Thanks,
Prasi

Re: Solr dih to read Clob contents

Posted by Prasi S <pr...@gmail.com>.
The column in my database is of the XML data type. If I do not use
XMLSERIALIZE(SMRY as CLOB(1M)) as SMRY and instead select the SMRY field
directly, as in

select ID,SMRY from BOOK_REC

I get the error below:

Exception while processing: x document : SolrInputDocument(fields:
[id=45768734]):org.apache.solr.handler.dataimport.DataImportHandlerException:
Parsing failed for xml, url:null rows processed:0 Processing Document # 1

Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected
character 'c' (code 99) in prolog; expected '<'
 at javax.xml.stream.SerializableLocation@5780578


Thanks,
Prasi


In reply to Prasi S (Mon, Mar 24, 2014 at 3:51 PM).
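A possible way around the prolog error, sketched under stated assumptions. The error shows the XML parser received something whose first character was 'c' rather than '<'. A plausible but unconfirmed explanation is that, without XMLSERIALIZE, the DB2 driver returns an XML-typed object whose string form is not the document text. If so, keeping the cast is the simpler route. The sketch below keeps it and also flags the serialized column with clob="true", because according to the DIH documentation ClobTransformer leaves a field untouched unless that attribute is present. The Solr field name id is an assumption. Note also that the posted configuration maps a column MBR that the query does not select.

```xml
<!-- Sketch: outer JDBC entity for BOOK_REC; the Solr field "id" is assumed -->
<entity name="x"
        query="SELECT ID, XMLSERIALIZE(SMRY AS CLOB(1M)) AS SMRY FROM BOOK_REC FETCH FIRST 40 ROWS ONLY"
        transformer="ClobTransformer">
  <!-- only columns returned by the query can be mapped -->
  <field column="ID" name="id" />
  <!-- without clob="true", ClobTransformer skips this column -->
  <field column="SMRY" clob="true" />
  <!-- the nested XPathEntityProcessor entity reading x.SMRY goes here -->
</entity>
```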

Re: Solr dih to read Clob contents

Posted by Prasi S <pr...@gmail.com>.
Below is my full configuration:

<dataConfig>
  <dataSource driver="com.ibm.db2.jcc.DB2Driver"
              url="jdbc:db2://IP:port/dbname" user="" password="" />
  <dataSource name="xmldata" type="FieldReaderDataSource"/>

  <document>
    <entity name="x" query="SELECT ID, XMLSERIALIZE(SMRY as CLOB(1M)) as SMRY FROM BOOK_REC fetch first 40 rows only"
            transformer="ClobTransformer">
      <field column="MBR" name="mbr" />
      <entity name="y" dataSource="xmldata" dataField="x.SMRY"
              processor="XPathEntityProcessor"
              forEach="/*:summary" rootEntity="true">
        <field column="card_no" xpath="/cardNo" />
      </entity>
    </entity>
  </document>
</dataConfig>

And this is my XML data:

<ns:summary xmlns:ns="***">
<cardNo>ZAYQ5181</tripId>
<firstName>Sam</firstName>
<lastName>Mathews</lastName>
<date>2013-01-18T23:29:04.492</date>
</ns:summary>

Thanks,
Prasi


In reply to Shalin Shekhar Mangar (Mon, Mar 24, 2014 at 3:23 PM).
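On the forEach="/*:summary" question, here is a sketch of the inner entity for the summary sample. It rests on two points from the DIH documentation rather than from this thread:

* XPathEntityProcessor does not support namespaces, but it can read namespaced XML if the prefix is dropped from the path. For example, a <dc:subject> element is addressed as subject.
* Its field xpaths are full paths from the document root, not paths relative to forEach.

The first_name, last_name and summary_date field names are hypothetical. Also, the sample as posted closes <cardNo> with </tripId>. If the stored XML really looks like that, the parser will reject it.

```xml
<!-- Sketch: inner entity; assumes x.SMRY already holds the XML text -->
<entity name="y" dataSource="xmldata" dataField="x.SMRY"
        processor="XPathEntityProcessor" forEach="/summary">
  <!-- prefix dropped: <ns:summary> is addressed as /summary -->
  <field column="card_no" xpath="/summary/cardNo" />
  <!-- first_name, last_name and summary_date are hypothetical field names -->
  <field column="first_name" xpath="/summary/firstName" />
  <field column="last_name" xpath="/summary/lastName" />
  <field column="summary_date" xpath="/summary/date" />
</entity>
```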

Re: Solr dih to read Clob contents

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
1. I don't see the definition of a data source named 'xmldata' in your
data-config.
2. You have forEach="/*:summary", but I don't think that syntax is
supported by XPathRecordReader.

If you can give a sample of the XML stored as a CLOB in your database,
we can help you write the right XPaths.

In reply to Prasi S (Mon, Mar 24, 2014 at 12:55 PM).



-- 
Regards,
Shalin Shekhar Mangar.

Re: Solr dih to read Clob contents

Posted by Prasi S <pr...@gmail.com>.
My database configuration is as follows:

<entity name="x" query="SELECT ID, XMLSERIALIZE(SMRY as CLOB(1M)) as SMRY FROM BOOK_REC fetch first 40 rows only"
        transformer="ClobTransformer">
  <field column="MBR" name="mbr" />
  <entity name="y" dataSource="xmldata" dataField="x.SMRY"
          processor="XPathEntityProcessor"
          forEach="/*:summary" rootEntity="true">
    <field column="card_no" xpath="/cardNo" />
  </entity>
</entity>

and I get the following response from Solr:

<doc>
<str name="card_no">org.......@1c8e807</str>

Am I missing anything?



Thanks,
Prasi


In reply to Gora Mohanty (Thu, Mar 20, 2014 at 4:25 PM).

Re: Solr dih to read Clob contents

Posted by Gora Mohanty <go...@mimirtech.com>.
In reply to Prasi S (20 March 2014 14:53):

You will need to use a FieldReaderDataSource and an XPathEntityProcessor
along with the ClobTransformer. You do not provide details of your DIH data
configuration file, but it should look something like:

<dataSource name="xmldata" type="FieldReaderDataSource"/>
...
<document>
  <entity name="x" query="..." transformer="ClobTransformer">
     <entity name="y" dataSource="xmldata" dataField="x.clob_column"
processor="XPathEntityProcessor" forEach="/root">
       <field column="author_solr" xpath="/author" />
       <field column="date_solr" xpath="/date" />
     </entity>
  </entity>
</document>

Regards,
Gora
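The same pattern applied to the <root>/<author>/<date> example from the original question. It includes two adjustments taken from the DIH documentation rather than from this thread:

* Field xpaths are written as full paths from the document root, so /root/author instead of /author.
* The CLOB column is flagged with clob="true" so that ClobTransformer actually converts it.

MY_TABLE, ID and CLOB_COLUMN are hypothetical names, and the Solr field id is assumed to be the uniqueKey. This is a sketch, not a verified configuration.

```xml
<!-- Fragment: these elements sit inside <dataConfig>, after the JDBC dataSource -->
<dataSource name="xmldata" type="FieldReaderDataSource" />
<document>
  <!-- MY_TABLE, ID and CLOB_COLUMN are hypothetical names -->
  <entity name="x" query="SELECT ID, CLOB_COLUMN FROM MY_TABLE"
          transformer="ClobTransformer">
    <field column="ID" name="id" />
    <!-- convert the CLOB into a String before it is read as XML -->
    <field column="CLOB_COLUMN" clob="true" />
    <!-- FieldReaderDataSource feeds the text of x.CLOB_COLUMN to the parser -->
    <entity name="y" dataSource="xmldata" dataField="x.CLOB_COLUMN"
            processor="XPathEntityProcessor" forEach="/root">
      <!-- xpaths are absolute, starting at the document root -->
      <field column="author_solr" xpath="/root/author" />
      <field column="date_solr" xpath="/root/date" />
    </entity>
  </entity>
</document>
```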