Posted to solr-user@lucene.apache.org by Pranav Prakash <pr...@gmail.com> on 2012/07/18 21:37:32 UTC

How To apply transformation in DIH for multivalued numeric field?

I have a multivalued integer field and a multivalued string field defined
in my schema as follows:

<field name="community_tag_ids"
        type="integer"
        indexed="true"
        stored="true"
        multiValued="true"
        omitNorms="true" />
<field name="community_tags"
        type="text"
        indexed="true"
        termVectors="true"
        stored="true"
        multiValued="true"
        omitNorms="true" />


The DIH entity and field definitions for them are:

<entity name="document"
      dataSource="app"
      onError="skip"
      transformer="RegexTransformer"
      query="...">

 <entity name="community_tags"
        transformer="RegexTransformer"
        query="SELECT
        group_concat(a.id SEPARATOR ',') AS community_tag_ids,
        group_concat(a.title SEPARATOR ',') AS community_tags
        FROM tags a JOIN tag_dets b ON a.id = b.tag_id
        WHERE b.doc_id = ${document.id}" >
        <field column="community_tag_ids" name="community_tag_ids"/>
        <field column="community_tags" splitBy="," />
      </entity>

</entity>

The community_tags field is correctly populated as an array of strings.
However, the value of the community_tag_ids field is wrong:

<arr name="community_tag_ids">
<int>[B@390c0a18</int>
</arr>
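The value [B@390c0a18 has the form of Java's default toString() for an array: "[B" is the JVM name for a byte array, followed by "@" and a hex hash code. That suggests the community_tag_ids column reached DIH as raw bytes rather than text. One commonly cited cause, which is an assumption here and not confirmed in this thread, is that MySQL/Connector/J returns GROUP_CONCAT over numeric columns as a binary string. This minimal sketch (the class and method names and the sample ids 3,17,42 are hypothetical) contrasts printing such bytes with decoding them, assuming UTF-8:

```java
import java.nio.charset.StandardCharsets;

public class ByteArrayDemo {
    // Default Object.toString() on an array: "[B@" + hex hash code.
    static String naive(byte[] raw) {
        return String.valueOf((Object) raw);
    }

    // Decode the bytes as text (assumes the driver used UTF-8).
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical column value: ids that arrived as bytes, not as a String.
        byte[] raw = "3,17,42".getBytes(StandardCharsets.UTF_8);
        System.out.println(naive(raw));  // e.g. [B@1b6d3586 (hash differs per run)
        System.out.println(decode(raw)); // 3,17,42
    }
}
```

If binary output is the cause, converting the column to text in SQL (for example with MySQL's CAST(... AS CHAR)) is one possible workaround. Dropping GROUP_CONCAT, as was done in this thread, avoids the issue entirely.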

I tried chaining NumberFormatTransformer with formatStyle="number", but that
throws DataImportHandlerException: Failed to apply NumberFormat on column.
Could this be caused by NULL values from the database, or is the value itself
malformed? How should NULL be handled in this case?
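On the NULL question: MySQL's GROUP_CONCAT returns NULL when there are no non-NULL values to concatenate, such as a document with no tags. Any per-value conversion therefore has to tolerate a missing column. To illustrate what splitting plus numeric conversion needs to do, here is a hypothetical null-safe helper in plain Java. It is not part of DIH's API. It decodes a byte[] if necessary (assuming UTF-8), splits on commas, and parses each id:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitIds {
    /**
     * Hypothetical helper, not a DIH class: converts a GROUP_CONCAT-style
     * value such as "3,17,42" into one Integer per multivalued entry.
     */
    static List<Integer> splitIds(Object value) {
        List<Integer> ids = new ArrayList<>();
        if (value == null) {
            return ids; // NULL column (no tags): index nothing instead of failing
        }
        // Decode raw bytes first; toString() on a byte[] would give "[B@...".
        String text = (value instanceof byte[])
                ? new String((byte[]) value, StandardCharsets.UTF_8)
                : value.toString();
        for (String part : text.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                ids.add(Integer.parseInt(trimmed));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(splitIds("3,17,42")); // [3, 17, 42]
        System.out.println(splitIds(null));      // []
    }
}
```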


*Pranav Prakash*

"temet nosce"

Re: How To apply transformation in DIH for multivalued numeric field?

Posted by jmlucjav <jm...@gmail.com>.
I have seen that issue several times. In my case it was always with an id
field, a MySQL database, and Linux; the same config on Windows did not show
the issue.

I never got to the bottom of it. Since it was an id field, the value was
still unique, so everything kept working.

Re: How To apply transformation in DIH for multivalued numeric field?

Posted by Pranav Prakash <pr...@gmail.com>.
I had tried splitBy on the numeric field, but that did not work for me
either. However, once I got rid of group_concat, everything worked.

Thanks a lot! I really had a hard time understanding this behavior.

On Thu, Jul 19, 2012 at 1:34 AM, Dyer, James <Ja...@ingrambook.com> wrote:

> Don't you want to specify "splitBy" for the integer field too?
>
> Actually though, you shouldn't need to use GROUP_CONCAT and
> RegexTransformer at all.  DIH is designed to handle "1>many" relations
> between parent and child entities by populating all the child fields as
> multi-valued automatically.  I guess your approach leads to a lot fewer
> rows getting sent from your db to Solr though.
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311

RE: How To apply transformation in DIH for multivalued numeric field?

Posted by "Dyer, James" <Ja...@ingrambook.com>.
Don't you want to specify "splitBy" for the integer field too?

Actually, though, you shouldn't need GROUP_CONCAT or RegexTransformer at all. DIH is designed to handle one-to-many relations between parent and child entities by automatically populating all of the child fields as multi-valued. I guess your approach does result in far fewer rows being sent from your db to Solr, though.
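To make the one-to-many behavior concrete, here is a small plain-Java model of how child rows end up as multivalued fields on the parent document. It illustrates the behavior described in this message and is not DIH's actual source. It assumes the child query is rewritten without GROUP_CONCAT, along the lines of SELECT a.id AS community_tag_ids, a.title AS community_tags FROM tags a JOIN tag_dets b ON a.id = b.tag_id WHERE b.doc_id = ${document.id}. That rewrite is an assumption; the thread does not show the final query. The tag ids and titles are made-up sample data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OneToManyDemo {
    /**
     * Conceptual model only: merges every child row into the parent document,
     * appending each column's value to a multivalued field of the same name.
     */
    static Map<String, List<Object>> mergeChildRows(List<Map<String, Object>> childRows) {
        Map<String, List<Object>> doc = new LinkedHashMap<>();
        for (Map<String, Object> row : childRows) {
            for (Map.Entry<String, Object> col : row.entrySet()) {
                doc.computeIfAbsent(col.getKey(), k -> new ArrayList<>()).add(col.getValue());
            }
        }
        return doc;
    }

    // Builds one hypothetical child row: one tag per row, no GROUP_CONCAT.
    static Map<String, Object> row(int id, String title) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("community_tag_ids", id);
        r.put("community_tags", title);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = Arrays.asList(row(3, "java"), row(17, "solr"));
        System.out.println(mergeChildRows(rows));
        // {community_tag_ids=[3, 17], community_tags=[java, solr]}
    }
}
```

The trade-off mentioned above shows up in the row counts. With the rewritten query, a document with k tags produces k child rows, whereas GROUP_CONCAT produces a single row per document.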

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
