Posted to solr-user@lucene.apache.org by Briggs Thompson <w....@gmail.com> on 2011/12/01 18:46:12 UTC
DataImportHandler w/ multivalued fields
Hello Solr Community!
I am implementing a data connection to Solr through the Data Import Handler
and non-multivalued fields are working correctly, but multivalued fields
are not getting indexed properly.
I am new to DataImportHandler, but from what I could find, a nested entity is
the way to go for multivalued fields. The weird thing is that data is being
indexed for only one row, meaning only the first raw_tag gets populated.
Anyone have any ideas?
Thanks,
Briggs
This is the relevant part of the schema:
<field name="raw_tag" type="text_en_lessAggressive" indexed="true"
       stored="false" multivalued="true"/>
<field name="raw_tag_string" type="string" indexed="false"
       stored="true" multivalued="true"/>
<copyField source="raw_tag" dest="raw_tag_string"/>
And the relevant part of data-import.xml:
<document name="merchant">
  <entity name="site"
          query="select * from site ">
    <field column="siteId" name="siteId" />
    <field column="domain" name="domain" />
    <field column="aliasFor" name="aliasFor" />
    <field column="title" name="title" />
    <field column="description" name="description" />
    <field column="requests" name="requests" />
    <field column="requiresModeration" name="requiresModeration" />
    <field column="blocked" name="blocked" />
    <field column="affiliateLink" name="affiliateLink" />
    <field column="affiliateTracker" name="affiliateTracker" />
    <field column="affiliateNetwork" name="affiliateNetwork" />
    <field column="cjMerchantId" name="cjMerchantId" />
    <field column="thumbNail" name="thumbNail" />
    <field column="updateRankings" name="updateRankings" />
    <field column="couponCount" name="couponCount" />
    <field column="category" name="category" />
    <field column="adult" name="adult" />
    <field column="rank" name="rank" />
    <field column="redirectsTo" name="redirectsTo" />
    <field column="wwwRequired" name="wwwRequired" />
    <field column="avgSavings" name="avgSavings" />
    <field column="products" name="products" />
    <field column="nameChecked" name="nameChecked" />
    <field column="tempFlag" name="tempFlag" />
    <field column="created" name="created" />
    <field column="enableSplitTesting" name="enableSplitTesting" />
    <field column="affiliateLinklock" name="affiliateLinklock" />
    <field column="hasMobileSite" name="hasMobileSite" />
    <field column="blockSite" name="blockSite" />
    <entity name="merchant_tags" pk="siteId"
            query="select raw_tag, freetags.id,
                   freetagged_objects.object_id as siteId
                   from freetags
                   inner join freetagged_objects
                   on freetags.id=freetagged_objects.tag_id
                   where freetagged_objects.object_id='${site.siteId}'">
      <field column="raw_tag" name="raw_tag"/>
    </entity>
  </entity>
</document>
Re: DataImportHandler w/ multivalued fields
Posted by Briggs Thompson <w....@gmail.com>.
Hey Rahul,
Thanks for the response. I actually just figured it out, thankfully :). To
answer your question, raw_tag is indexed (tokenized) and not stored,
and there is a copyField from raw_tag to "raw_tag_string", which is meant to
be used for facets. That *should have* been displayed in the results.
The silly mistake I made was not camel-casing "multiValued" in the schema
(I had written "multivalued"), which is clearly the source of the problem.
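For reference, a sketch of the corrected field definitions, assuming the rest
of schema.xml stays as posted. The only change from the original snippet is
the attribute name: schema attribute names are case-sensitive, so the
lowercase "multivalued" was apparently not recognized and the fields behaved
as single-valued.

```xml
<!-- multiValued (capital V) lets each field hold several values per document -->
<field name="raw_tag" type="text_en_lessAggressive" indexed="true"
       stored="false" multiValued="true"/>
<field name="raw_tag_string" type="string" indexed="false"
       stored="true" multiValued="true"/>
<copyField source="raw_tag" dest="raw_tag_string"/>
```

One thing worth checking: faceting generally operates on indexed terms, so a
field used for facets would normally need indexed="true". The thread does not
say whether that was changed.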
The second email I sent, which changed the query and used splitBy for the
multivalued field, also had an error. It was missing this line
transformer="RegexTransformer"
in the entity declaration.
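A sketch of how the splitBy variant might look with the transformer declared,
assuming the query from the second email and MySQL's group_concat. Two details
are assumptions rather than anything confirmed in this thread: the other field
mappings are left out for brevity, and splitBy is written as ",\s*" because
RegexTransformer treats it as a regular expression and the query joins the
tags with ', ' (comma plus space).

```xml
<document name="merchant">
  <!-- transformer="RegexTransformer" is what enables splitBy on the field below -->
  <entity name="site"
          transformer="RegexTransformer"
          query="select group_concat(freetags.raw_tag separator ', ') as raw_tag, site.*
                 from site
                 left outer join
                 (freetags inner join freetagged_objects)
                 on (freetags.id = freetagged_objects.tag_id
                 and site.siteId = freetagged_objects.object_id)
                 group by site.siteId">
    <!-- other field mappings omitted for brevity -->
    <!-- splitBy is a regex: split on a comma and any whitespace after it -->
    <field column="raw_tag" name="raw_tag" splitBy=",\s*" />
  </entity>
</document>
```

With this layout each site row produces one document, and the single
concatenated raw_tag column is split into separate values before indexing, so
the nested merchant_tags entity is no longer needed.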
Anyhow, thanks for the quick response!
Briggs
On Thu, Dec 1, 2011 at 12:57 PM, Rahul Warawdekar <
rahul.warawdekar@gmail.com> wrote:
> Hi Briggs,
>
> By saying "multivalued fields are not getting indexed properly", do you mean
> that you are not able to search on those fields?
> Have you tried actually searching your Solr index for those multivalued
> terms to make sure it returns search results?
>
> One possibility is that the multivalued fields are getting indexed
> correctly and are searchable.
> However, since your schema.xml has a "raw_tag" field whose "stored"
> attribute is set to false, you may not be able to see those values in the results.
>
> --
> Thanks and Regards
> Rahul A. Warawdekar
>
Re: DataImportHandler w/ multivalued fields
Posted by Rahul Warawdekar <ra...@gmail.com>.
Hi Briggs,
By saying "multivalued fields are not getting indexed properly", do you mean
that you are not able to search on those fields?
Have you tried actually searching your Solr index for those multivalued
terms to make sure it returns search results?
One possibility is that the multivalued fields are getting indexed
correctly and are searchable.
However, since your schema.xml has a "raw_tag" field whose "stored"
attribute is set to false, you may not be able to see those values in the results.
On Thu, Dec 1, 2011 at 1:43 PM, Briggs Thompson <w.briggs.thompson@gmail.com
> wrote:
> In addition, I tried a query like the one below and changed the column
> definition to
> <field column="raw_tag" name="raw_tag" splitBy="," />
> and still no luck. It is indexing the full content now, but not as multiple values.
> It seems like "splitBy" isn't working properly.
>
> select group_concat(freetags.raw_tag separator ', ') as raw_tag, site.*
> from site
> left outer join
> (freetags inner join freetagged_objects)
> on (freetags.id = freetagged_objects.tag_id
> and site.siteId = freetagged_objects.object_id)
> group by site.siteId
>
> Am I doing something wrong?
> Thanks,
> Briggs Thompson
--
Thanks and Regards
Rahul A. Warawdekar
Re: DataImportHandler w/ multivalued fields
Posted by Briggs Thompson <w....@gmail.com>.
In addition, I tried a query like the one below and changed the column
definition to
<field column="raw_tag" name="raw_tag" splitBy="," />
and still no luck. It is indexing the full content now, but not as multiple values.
It seems like "splitBy" isn't working properly.
select group_concat(freetags.raw_tag separator ', ') as raw_tag, site.*
from site
left outer join
(freetags inner join freetagged_objects)
on (freetags.id = freetagged_objects.tag_id
and site.siteId = freetagged_objects.object_id)
group by site.siteId
Am I doing something wrong?
Thanks,
Briggs Thompson