You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by geeky2 <ge...@hotmail.com> on 2013/05/16 20:32:21 UTC

having trouble storing large text blob fields - returns binary address in search results

hello 

environment: solr 3.5

can someone help me with the correct configuration for some large text blob
fields?

we have two fields in informix tables that are of type text. 

when we do a search the results for these fields come back looking like
this: 

<str name="attributes">[B@17c232ee</str>

i have tried setting them up as clob fields - but this is not working (see
details below)

i have also tried treating them as plain string fields (removing the
references to clob in the DIH) - but this does not work either.


DIH configuration:


      <entity transformer=&quot;TemplateTransformer,ClobTransformer&quot;
name=&quot;core1-parts&quot; query=&quot;select 
    summ.*, 
    1 as item_type, 
    1 as part_cnt, 
    '' as brand, 
...

 &lt;field column=&quot;attr_val&quot;       name=&quot;attributes&quot;
clob=&quot;true&quot; />
            <field column="rsr_val"        name="restrictions" clob="true"
/>


Schema.xml

  <field name="attributes" type="string" indexed="false" stored="true"/>
    <field name="restrictions" type="string" indexed="false" stored="true"/>

thx
mark





--
View this message in context: http://lucene.472066.n3.nabble.com/having-trouble-storing-large-text-blob-fields-returns-binary-address-in-search-results-tp4063979.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: having trouble storing large text blob fields - returns binary address in search results

Posted by geeky2 <ge...@hotmail.com>.
hello

your comment made me think - so i decided to double check myself.

i opened up the schema in squirrel and made sure that the two columns in
question were actually of type TEXT in the schema - check

i went in to the db-config.xml and removed all references to
ClobTransformer, removed the cast directives from the fields as well as the
clob="true" on the two fields - i pasted the db-config.xml below for
reference - check

i restarted jboss - thus restarting solr - check

i went in to the solr dataimport admin screen and did a clean import - check

after the import was complete - i queried a part that i knew would have one
of the clob fields - results are pasted below as well - you can see the
binary address in the field.


<?xml version="1.0"?>
<result name="response" numFound="1" start="0">
  <doc>
    <str name="accessoryIndicator">N</str>
 *   <str name="attributes">[B@5b372219</str>*
    <str name="availabilityStatus">PIA</str>
    <arr name="divProductTypeDesc">
      <str>Refrigerators and Freezers</str>
    </arr>
    <str name="divProductTypeId">0046</str>
    <str name="id">12001892,0046,464</str>
    <str name="itemModelDesc">VALVE, WATER</str>
    <str name="itemModelNo">12001892</str>
    <str name="itemModelNoExactMatchStr">12001892</str>
    <int name="itemType">1</int>
    <str name="otcStockIndicator">Y</str>
    <int name="partCnt">1</int>
    <str name="partCondition">N</str>
    <arr name="plsBrandDesc">
      <str/>
    </arr>
    <str name="plsBrandId">464</str>
    <str name="productIndicator">N</str>
    <int name="rankNo">13</int>
    <float name="sellingPrice">53.54</float>
    <str name="sourceOrderNo">464 </str>
    <str name="subbedFlag">Y</str>
  </doc>
</result>








        <document>
                <entity transformer="TemplateTransformer" name="core1-parts"
query="select 
        summ.*, 
        1 as item_type, 
        1 as part_cnt, 
        '' as brand, 
        mst.acy_prt_fl,
        mst.dil_tx,
        mst.hzd_mtl_typ_cd,
        mst.otc_cre_stk_fl,
        mst.prd_fl,
        mst.prt_cmt_tx,
        mst.prt_cnd_cd,
        mst.prt_inc_qt,
        mst.prt_made_by,
        mst.sug_qt,
        att.attr_val,
        rsr.rsr_val,
        case when sub.orb_itm_id is null then 'N' else 'Y' end as
subbed_flag
from 
        prtxtps_prt_summ as summ
        left outer join prtxtpm_prt_mast as mst on mst.orb_itm_id =
summ.orb_itm_id and mst.prd_gro_id = summ.prd_gro_id and mst.spp_id =
summ.spp_id
    left outer join tmpxtpa_prt_attr as att on att.orb_itm_id =
summ.orb_itm_id and att.prd_gro_id = summ.prd_gro_id and att.spp_id =
summ.spp_id 
    left outer join tmpxtpr_prt_rsr as rsr on rsr.orb_itm_id =
summ.orb_itm_id and rsr.prd_gro_id = summ.prd_gro_id and rsr.spp_id =
summ.spp_id 
    left outer join tmpxtps_prt_sub as sub on sub.orb_itm_id =
summ.orb_itm_id and sub.prd_gro_id = summ.prd_gro_id and sub.spp_id =
summ.spp_id
where 
        summ.spp_id = '464'"> 
                        
                        <field column ="id" name="id" 
template="${core1-parts.orb_itm_id},${core1-parts.prd_gro_id},${core1-parts.spp_id}"/>
                        <field column="orb_itm_id"     name="itemModelNo"/>              
                        <field column="prd_gro_id"    
name="divProductTypeId"/> 
                        <field column="ds_tx"         
name="itemModelDesc"/> 
                        <field column="spp_id"         name="plsBrandId"/> 
                        <field column="rnk_no"         name="rankNo"/> 
                        <field column="item_type"      name="itemType"/> 
                        <field column="brand"          name="plsBrandDesc"/> 
                        <field column="prd_gro_ds"    
name="divProductTypeDesc"/> 
                        <field column="part_cnt"       name="partCnt"/> 
                        <field column="avail"         
name="availabilityStatus"/> 
                        <field column="price"          name="sellingPrice"/> 
                        <field column="prt_son"       
name="sourceOrderNo"/> 
                        <field column="prt_src_cd"     name="sourceIdCode"/> 
                        <field column="rte_cd"        
name="sourceRouteCode"/> 
                        
                        <field column="acy_prt_fl"    
name="accessoryIndicator"/> 
                        <field column="dil_tx"         name="disclosure"/> 
                        <field column="hzd_mtl_typ_cd"
name="hazardousMaterialCode"/> 
                        <field column="otc_cre_stk_fl"
name="otcStockIndicator"/> 
                        <field column="prd_fl"        
name="productIndicator"/> 
                        <field column="prt_cmt_tx"     name="comment"/> 
                        <field column="prt_cnd_cd"    
name="partCondition"/> 
                        <field column="prt_inc_qt"     name="qtyIncluded"/> 
                        <field column="prt_made_by"    name="madeBy"/> 
                        <field column="sug_qt"         name="suggestedQty"/> 
                        
                        <field column="attr_val"       name="attributes"/> 
                        <field column="rsr_val"        name="restrictions"/> 
                        
                        <field column="subbed_flag"    name="subbedFlag"/> 
                        <entity name="partImages" query="SELECT
pi.prt_img_url, pi.orb_itm_id || ',' || pi.prd_gro_id || ',' || pi.spp_id as
pk FROM prtxtpi_prt_img pi" processor="CachedSqlEntityProcessor"
cacheKey="pk" cacheLookup="core1-parts.id">
                                <field column="prt_img_url"
name="partImageURL"/>
                        </entity>
                </entity>
                <entity transformer="TemplateTransformer" name="core1-model"
query="select prtxtms_mdl_summ.prd_typ_id, prtxtms_mdl_summ.bnd_id,
prtxtms_mdl_summ.mdl_nm, prtxtms_mdl_summ.mdl_desc, 0 as rnk_no, 2 as
item_type, prtxtms_mdl_summ.bnd_desc as brand, prtxtms_mdl_summ.prd_desc as
div_product_type_ds, prtxtms_mdl_summ.pg_cnt, case when
prtxtms_mdl_summ.part_cnt is null then 0 else prtxtms_mdl_summ.part_cnt end
as part_cnt from prtxtms_mdl_summ"> 
                        <field column ="id" name="id"
template="${core1-model.prd_typ_id},${core1-model.bnd_id},${core1-model.mdl_nm}"/>
                        <field column="mdl_nm"         name="itemModelNo"/>              
                        <field column="prd_typ_id"    
name="divProductTypeId"/> 
                        <field column="mdl_desc"      
name="itemModelDesc"/>
                        <field column="bnd_id"         name="plsBrandId"/> 
                        <field column="rnk_no"         name="rankNo"/> 
                        <field column="item_type"      name="itemType"/> 
                        <field column="brand"          name="plsBrandDesc"/> 
                        <field column="div_product_type_ds"  
name="divProductTypeDesc"/> 
                        <field column="part_cnt"       name="partCnt"/> 
                        <field column="pg_cnt"       name="pageCnt"/> 
                </entity>
   </document>

mark



--
View this message in context: http://lucene.472066.n3.nabble.com/having-trouble-storing-large-text-blob-fields-returns-binary-address-in-search-results-tp4063979p4064407.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: having trouble storing large text blob fields - returns binary address in search results

Posted by Gora Mohanty <go...@mimirtech.com>.
On 18 May 2013 02:24, geeky2 <ge...@hotmail.com> wrote:
> Hello Gora,
>
>
> thank you for the reply -
>
> i did finally get this to work.  i had to cast the column in the DIH to a
> clob - like this.
>
>     cast(att.attr_val AS clob) as attr_val,
>     cast(rsr.rsr_val AS clob) as rsr_val,
>
> once this was done, the ClobTransformer worked.
[...]

This would seem to indicate that the column was not a
blob to start with. Did you try to index directly without
the ClobTransformer?

Regards,
Gora

Re: having trouble storing large text blob fields - returns binary address in search results

Posted by geeky2 <ge...@hotmail.com>.
Hello Gora,


thank you for the reply - 

i did finally get this to work.  i had to cast the column in the DIH to a
clob - like this.

    cast(att.attr_val AS clob) as attr_val,
    cast(rsr.rsr_val AS clob) as rsr_val,

once this was done, the ClobTransformer worked.

to my knowledge - this particular use case and the need for the cast is not
documented anywhere.  i checked the solr wiki and searched the threads on
this forum for things like clobtransformer, informix and blob without luck. 
i also did quite a few google searches as well but no luck (but maybe i
missed something ;)

maybe this is just some "edge case".  i also realize that informix is not
that common.

i have a question in to the solr developers list - just so i can better
understand what actually is happening, why it was necessary for the "cast",
and the limitations / parameters of the ClobTransformer.  

the thread on the developers list is located here:

http://lucene.472066.n3.nabble.com/have-developer-question-about-ClobTransformer-and-DIH-td4064256.html

thx
mark






--
View this message in context: http://lucene.472066.n3.nabble.com/having-trouble-storing-large-text-blob-fields-returns-binary-address-in-search-results-tp4063979p4064286.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: having trouble storing large text blob fields - returns binary address in search results

Posted by Gora Mohanty <go...@mimirtech.com>.
On 17 May 2013 00:02, geeky2 <ge...@hotmail.com> wrote:
[...]
> i have tried setting them up as clob fields - but this is not working (see
> details below)
>
> i have also tried treating them as plain string fields (removing the
> references to clob in the DIH) - but this does not work either.
>
>
> DIH configuration:
>
>
>       <entity transformer="TemplateTransformer,ClobTransformer"
> name="core1-parts" query="select
>     summ.*,
>     1 as item_type,
>     1 as part_cnt,
>     '' as brand,
> ...
>
>  <field column="attr_val"       name="attributes"
> clob="true" />
>             <field column="rsr_val"        name="restrictions" clob="true"
> />
[...]

This should work, provided that the SELECT statement is
getting the correct columns from the database. Are you sure
that the entity is enclosing the field in question? Could you
share the entire DIH data configuration file?

You should also try setting sourceColName for the ClobTransformer
field. Have never tried ClobTransformer where the field name differs
from the database column name. Please see:
http://wiki.apache.org/solr/DataImportHandler#ClobTransformer

I am not familiar with Informix, and the final possibility is that
there is some bug with ClobTransformer in translating blobs
from Informix. It should be possible to cajole the database to
convert a blob into a VARCHAR, e.g., something like
  SELECT SUBSTR( attr_val, 0 )
Not sure if the syntax is database dependent.

Regards,
Gora