Posted to solr-user@lucene.apache.org by Briggs Thompson <w....@gmail.com> on 2011/12/01 18:46:12 UTC

DataImportHandler w/ multivalued fields

Hello Solr Community!

I am implementing a data connection to Solr through the Data Import Handler
and non-multivalued fields are working correctly, but multivalued fields
are not getting indexed properly.

I am new to DataImportHandler, but from what I could find, a nested entity is
the way to go for a multivalued field. The weird thing is that data is being
indexed for only one row, meaning only the first raw_tag gets populated.


Anyone have any ideas?
Thanks,
Briggs

This is the relevant part of the schema:

   <field name ="raw_tag" type="text_en_lessAggressive" indexed="true"
stored="false" multivalued="true"/>
   <field name ="raw_tag_string" type="string" indexed="false"
stored="true" multivalued="true"/>
   <copyField source="raw_tag" dest="raw_tag_string"/>

And the relevant part of data-import.xml:

<document name="merchant">
        <entity name="site"
                  query="select * from site ">
            <field column="siteId" name="siteId" />
            <field column="domain" name="domain" />
            <field column="aliasFor" name="aliasFor" />
            <field column="title" name="title" />
            <field column="description" name="description" />
            <field column="requests" name="requests" />
            <field column="requiresModeration" name="requiresModeration" />
            <field column="blocked" name="blocked" />
            <field column="affiliateLink" name="affiliateLink" />
            <field column="affiliateTracker" name="affiliateTracker" />
            <field column="affiliateNetwork" name="affiliateNetwork" />
            <field column="cjMerchantId" name="cjMerchantId" />
            <field column="thumbNail" name="thumbNail" />
            <field column="updateRankings" name="updateRankings" />
            <field column="couponCount" name="couponCount" />
            <field column="category" name="category" />
            <field column="adult" name="adult" />
            <field column="rank" name="rank" />
            <field column="redirectsTo" name="redirectsTo" />
            <field column="wwwRequired" name="wwwRequired" />
            <field column="avgSavings" name="avgSavings" />
            <field column="products" name="products" />
            <field column="nameChecked" name="nameChecked" />
            <field column="tempFlag" name="tempFlag" />
            <field column="created" name="created" />
            <field column="enableSplitTesting" name="enableSplitTesting" />
            <field column="affiliateLinklock" name="affiliateLinklock" />
            <field column="hasMobileSite" name="hasMobileSite" />
            <field column="blockSite" name="blockSite" />
            <entity name="merchant_tags" pk="siteId"
                    query="select raw_tag, freetags.id,
                                  freetagged_objects.object_id as siteId
                           from freetags
                           inner join freetagged_objects
                             on freetags.id=freetagged_objects.tag_id
                           where freetagged_objects.object_id='${site.siteId}'">
                <field column="raw_tag" name="raw_tag"/>
            </entity>
        </entity>
    </document>

Re: DataImportHandler w/ multivalued fields

Posted by Briggs Thompson <w....@gmail.com>.
Hey Rahul,

Thanks for the response. I actually just figured it out, thankfully :). To
answer your question, raw_tag is indexed (tokenized) and not stored,
and there is a copyField from raw_tag to "raw_tag_string", which would
be used for facets. The raw_tag_string values *should have* been displayed
in the results.

The silly mistake I made was not camel-casing "multiValued" in the schema
(I had written "multivalued"), which is clearly the source of the problem.
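
For reference, a minimal sketch of the corrected schema lines, assuming the
casing of the attribute is the only change needed; every other value is
copied from the original post:

```xml
<!-- Same fields as in the original post; only the attribute casing changes
     from "multivalued" to "multiValued". -->
<field name="raw_tag" type="text_en_lessAggressive" indexed="true"
       stored="false" multiValued="true"/>
<field name="raw_tag_string" type="string" indexed="false"
       stored="true" multiValued="true"/>
<copyField source="raw_tag" dest="raw_tag_string"/>
```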

The second email I sent, which changed the query and used splitBy for the
multivalued field, also had an error: the entity declaration was missing
the line
transformer="RegexTransformer"
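
A sketch of how the second attempt might look with that line added. This
assumes the rest of the entity stays as posted; the one deliberate change
(an assumption, not something confirmed in this thread) is the group_concat
separator, switched from ', ' to ',' so that it matches the splitBy pattern
exactly and the split values carry no leading space:

```xml
<!-- Sketch only. transformer="RegexTransformer" is the line reported missing.
     The separator ',' is an assumption made here to match splitBy. -->
<entity name="site"
        transformer="RegexTransformer"
        query="select group_concat(freetags.raw_tag separator ',') as raw_tag, site.*
               from site
               left outer join (freetags inner join freetagged_objects)
                 on (freetags.id = freetagged_objects.tag_id
                     and site.siteId = freetagged_objects.object_id)
               group by site.siteId">
    <!-- splitBy is a regular expression; each piece becomes one raw_tag value -->
    <field column="raw_tag" name="raw_tag" splitBy="," />
    <!-- remaining site columns as in the original configuration -->
</entity>
```

With this approach the nested merchant_tags entity is no longer needed,
since the tags arrive with each site row.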

Anyhow, thanks for the quick response!

Briggs


On Thu, Dec 1, 2011 at 12:57 PM, Rahul Warawdekar <
rahul.warawdekar@gmail.com> wrote:

> Hi Briggs,
>
> By saying "multivalued fields are not getting indexed properly", do you mean
> that you are not able to search on those fields?
> Have you tried actually searching your Solr index for those multivalued
> terms to check whether it returns search results?
>
> One possibility could be that the multivalued fields are getting indexed
> correctly and are searchable.
> However, since your schema.xml has a "raw_tag" field whose "stored"
> attribute is set to false, you may not be able to see its values in the
> results.

Re: DataImportHandler w/ multivalued fields

Posted by Rahul Warawdekar <ra...@gmail.com>.
Hi Briggs,

By saying "multivalued fields are not getting indexed properly", do you mean
that you are not able to search on those fields?
Have you tried actually searching your Solr index for those multivalued
terms to check whether it returns search results?

One possibility could be that the multivalued fields are getting indexed
correctly and are searchable.
However, since your schema.xml has a "raw_tag" field whose "stored"
attribute is set to false, you may not be able to see its values in the
results.



On Thu, Dec 1, 2011 at 1:43 PM, Briggs Thompson <w.briggs.thompson@gmail.com
> wrote:

> In addition, I tried a query like the one below and changed the column
> definition to
>            <field column="raw_tag" name="raw_tag" splitBy="," />
> and still no luck. It is indexing the full content now, but as one value
> rather than as multiple values.
> It seems like the "splitBy" isn't working properly.
>
>    select group_concat(freetags.raw_tag separator ', ') as raw_tag, site.*
> from site
> left outer join
>  (freetags inner join freetagged_objects)
>     on (freetags.id = freetagged_objects.tag_id
>       and site.siteId = freetagged_objects.object_id)
> group  by site.siteId
>
> Am I doing something wrong?
> Thanks,
> Briggs Thompson



-- 
Thanks and Regards
Rahul A. Warawdekar

Re: DataImportHandler w/ multivalued fields

Posted by Briggs Thompson <w....@gmail.com>.
In addition, I tried a query like the one below and changed the column
definition to
            <field column="raw_tag" name="raw_tag" splitBy="," />
and still no luck. It is indexing the full content now, but as one value
rather than as multiple values.
It seems like the "splitBy" isn't working properly.

    select group_concat(freetags.raw_tag separator ', ') as raw_tag, site.*
    from site
    left outer join
      (freetags inner join freetagged_objects)
        on (freetags.id = freetagged_objects.tag_id
          and site.siteId = freetagged_objects.object_id)
    group by site.siteId

Am I doing something wrong?
Thanks,
Briggs Thompson
