You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by ravi1984 <ra...@gmail.com> on 2013/12/05 10:30:18 UTC

Importing/Indexing the DB2 XML FieldType in SOLR

I'm using DB2 9.x and I have a column named DTL_XML of type XML. Following
are the snippets of data-config.xml & schema.xml within my SOLR instance.

data-config.xml:
    <dataSource driver="com.ibm.db2.jcc.DB2Driver"
url="jdbc:db2://myIP:myPort/DBName" user="testUsr" password="testPwd" />
    <document>
    <entity name="cust_data" query="select DTL_XML from testSch.SUMMRY_DTL
FETCH FIRST 50 ROWS ONLY">
        <field column="DTL_XML" name="cust_data" />
        </entity>
    </document>

Snippets from schema.xml:
<field name="cust_data" type="text" indexed="true" stored="true"/>
<uniqueKey>cust_data</uniqueKey>
<copyField source="cust_data" dest="text"/>

When I query the SOLR, my requirement is to retrieve the cust_data as XML
string, but instead I was getting the data as following:
"cust_data": "com.ibm.db2.jcc.c.fc@628f628f"

Can somebody help me to find a way to retrieve the cust_data as XML String
(instead of above object representation) from SOLR? Thanks in advance!



--
View this message in context: http://lucene.472066.n3.nabble.com/Importing-Indexing-the-DB2-XML-FieldType-in-SOLR-tp4105079.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Importing/Indexing the DB2 XML FieldType in SOLR

Posted by ravi1984 <ra...@gmail.com>.
Thanks again Shawn. The CLOB transformer helped.

Thanks
Ravi



--
View this message in context: http://lucene.472066.n3.nabble.com/Importing-Indexing-the-DB2-XML-FieldType-in-SOLR-tp4105079p4106307.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Importing/Indexing the DB2 XML FieldType in SOLR

Posted by Shawn Heisey <so...@elyograg.org>.
On 12/11/2013 3:09 AM, ravi1984 wrote:
> Thanks much for the reply! We have this working through Java layer and
> facing the issues when integrated with SOLR layer. 
> 
> We tried to tweak the Query itself to return the String - following are the
> options we tried, and still there was no luck. Do we need to tweak anything
> in the Schema.xml, with regard to the field: cust_data? 
> 
> select XMLSERIALIZE(DTL_XML as CLOB(1M)) as DTL_XML from testSch.SUMMRY_DTL
> where DTL_XML IS NOT NULL FETCH FIRST 50 ROWS ONLY 
> 
> Select XMLCAST(DTL_XML AS CLOB(1M)) as DTL_XML from testSch.SUMMRY_DTL where
> DTL_XML IS NOT NULL FETCH FIRST 50 ROWS ONLY 

Upgrading the JDBC driver and talking to IBM are your best options to
try first.  In a newer driver version, IBM might have implemented a
toString() method on the XML object type.  I have no idea whether this
is the case or not -- I'm completely unfamiliar with DB2.

You could try returning the XML column as a CLOB and then using the
ClobTransformer.  Reading what I can find on XMLCAST, it seems as though
this probably is returning as CLOB, which Solr is unable to deal with
natively -- requiring the transformer.

http://wiki.apache.org/solr/DataImportHandler#ClobTransformer

If those don't yield something useful, you might need to write some
custom code.  That might be a straight modification of DIH to use the
IBM data types directly, or you might write a custom transformer or a
custom update handler.

Thanks,
Shawn


Re: Importing/Indexing the DB2 XML FieldType in SOLR

Posted by ravi1984 <ra...@gmail.com>.
Hi Shawn, 
Thanks much for the reply! We have this working through Java layer and
facing the issues when integrated with SOLR layer. 

We tried to tweak the Query itself to return the String - following are the
options we tried, and still there was no luck. Do we need to tweak anything
in the Schema.xml, with regard to the field: cust_data? 

select XMLSERIALIZE(DTL_XML as CLOB(1M)) as DTL_XML from testSch.SUMMRY_DTL
where DTL_XML IS NOT NULL FETCH FIRST 50 ROWS ONLY 

Select XMLCAST(DTL_XML AS CLOB(1M)) as DTL_XML from testSch.SUMMRY_DTL where
DTL_XML IS NOT NULL FETCH FIRST 50 ROWS ONLY 

Thanks 
Ravi



--
View this message in context: http://lucene.472066.n3.nabble.com/Importing-Indexing-the-DB2-XML-FieldType-in-SOLR-tp4105079p4106111.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Importing/Indexing the DB2 XML FieldType in SOLR

Posted by ravi1984 <ra...@gmail.com>.
Hi Shawn,
Thanks much for the reply! We have this working through Java layer and
facing the issues when integrated with SOLR layer.

We tried to tweak the Query itself to return the String - following are the
options we tried, and still there was no luck. Do we need to tweak anything
in the Schema.xml, with regard to the field: cust_data?

select XMLSERIALIZE(ITNRY_DTL_TX as CLOB(1M)) as ITNRY_DTL_TX from
TITIN00.ITNRY_DTL where ITNRY_DTL_TX IS NOT NULL FETCH FIRST 50 ROWS ONLY

Select XMLCAST(ITNRY_DTL_TX AS CLOB(1M)) as ITNRY_DTL_TX from
TITIN00.ITNRY_DTL where ITNRY_DTL_TX IS NOT NULL FETCH FIRST 50 ROWS ONLY

Thanks
Ravi



--
View this message in context: http://lucene.472066.n3.nabble.com/Importing-Indexing-the-DB2-XML-FieldType-in-SOLR-tp4105079p4106110.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Importing/Indexing the DB2 XML FieldType in SOLR

Posted by Shawn Heisey <so...@elyograg.org>.
On 12/5/2013 8:20 AM, Shawn Heisey wrote:
> one with a proper toString() method.  If not, you'll need to write your
> own indexing application or modify the dataimport handler source code to
> handle the XML object and recompile it.

I just noticed something on the IBM URL.

http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=%2Fcom.ibm.db2z10.doc.java%2Fsrc%2Ftpc%2Fimjcc_c0021817.htm

It says "You need to use this technique if you are not using a version
of the IBM® Data Server Driver for JDBC and SQLJ that supports JDBC 4.0."

This MIGHT mean that if you just obtain the latest JDBC driver from IBM,
it might automatically work right.  I could be wrong about this.

It looks like you need a login to IBM's website to download JDBC
drivers, so I can only give you this URL as a starting point:

http://www-01.ibm.com/support/docview.wss?uid=swg21363866

Thanks,
Shawn


Re: Importing/Indexing the DB2 XML FieldType in SOLR

Posted by Shawn Heisey <so...@elyograg.org>.
On 12/5/2013 2:30 AM, ravi1984 wrote:
> I'm using DB2 9.x and I have a column named DTL_XML of type XML. Following
> are the snippets of data-config.xml & schema.xml within my SOLR instance.
> 
> data-config.xml:
>     <dataSource driver="com.ibm.db2.jcc.DB2Driver"
> url="jdbc:db2://myIP:myPort/DBName" user="testUsr" password="testPwd" />
>     <document>
>     <entity name="cust_data" query="select DTL_XML from testSch.SUMMRY_DTL
> FETCH FIRST 50 ROWS ONLY">
>         <field column="DTL_XML" name="cust_data" />
>         </entity>
>     </document>
> 
> Snippets from schema.xml:
> <field name="cust_data" type="text" indexed="true" stored="true"/>
> <uniqueKey>cust_data</uniqueKey>
> <copyField source="cust_data" dest="text"/>
> 
> When I query the SOLR, my requirement is to retrieve the cust_data as XML
> string, but instead I was getting the data as following:
> "cust_data": "com.ibm.db2.jcc.c.fc@628f628f"

That value tells me that IBM's JDBC driver does not have a proper
toString() method for the XML object, so you get the typical Java object
identifier.  Research indicates that in order to use the XML column type
in a JDBC application, special code is required.

http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=%2Fcom.ibm.db2z10.doc.java%2Fsrc%2Ftpc%2Fimjcc_c0021817.htm
http://stackoverflow.com/questions/4451156/how-to-read-xml-column-from-db2-using-jdbc

If I were you I'd call IBM.  Since you are using their database,
hopefully you also have a support contract.  Perhaps they can give you a
JDBC option for your dataimport config that changes the XML object to
one with a proper toString() method.  If not, you'll need to write your
own indexing application or modify the dataimport handler source code to
handle the XML object and recompile it.

Thanks,
Shawn