You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by Avlesh Singh <av...@gmail.com> on 2009/11/01 02:55:58 UTC
Re: best way to model 1-N
>
> what am I missing?
>
Change your <entity name="category" query="select cfcr.feedId ..."> to
<entity name="category" *transformer="RegexTransformer"* query="select
cfcr.feedId .."> The "splitBy" directive is understood by this transformer
and in your case the attribute was simply ignored.
Don't forget to re-index once you have changed.
Cheers
Avlesh
On Fri, Oct 30, 2009 at 9:33 PM, Joel Nylund <jn...@yahoo.com> wrote:
> Thanks Chantal, I will keep that in mind for tuning,
>
> for sql I figured way to combine them into one row using concat, but I
> still seem to be having an issue splitting them:
>
> Db now returns as one column categoryType:
> TOPIC,LANGUAGE
>
> but my solr result, if you note the item in categoryType all seem to be
> within one str, I would expect it to be in multiple strings within the
> array, is this assumption wrong?
>
> <doc>
> -
> <arr name="categoryType">
> <str>TOPIC,LANGUAGE</str>
> </arr>
> <str name="id">40</str>
> <str name="title">feed title</str>
> </doc>
>
>
> Here is my import:
> <document name="doc">
> <entity name="item"
> query="SELECT f.id, f.title
> FROM Feed f
> <field column="id" name="id" />
> <field column="title" name="title" />
> <entity name="category" query="select cfcr.feedId,
> group_concat(cfcr.categoryType) as categoryType
> from CFR cfcr
> where
> cfcr.feedId = '${item.id}'
> AND
> group by cfcr.feedId">
> <field column="categoryType"
> name="categoryType" splityBy="," />
> </entity>
>
> </entity>
>
> In schema:
> <field name="categoryType" type="text" indexed="true" stored="true"
> required="false" multiValued="true"/>
> <field name="categoryName" type="text" indexed="true" stored="true"
> required="false" multiValued="true"/>
>
>
> what am I missing?
>
> thanks
> Joel
>
>
>
> On Oct 30, 2009, at 10:00 AM, Chantal Ackermann wrote:
>
> That depends a bit on your database, but it is tricky and might not be
>> performant.
>>
>> If you are more of a Java developer, you might prefer retrieving mutliple
>> rows per SOLR document from your dataSource (join on your category and main
>> table), and aggregate them in your custom EntityProcessor. I got a far(!)
>> better performance retrieving everything in one query and doing the
>> aggregation in Java. But this is, of course, depending on your table
>> structure and data.
>>
>> Noble Paul helped me with the custom EntityProcessor, and it turned out
>> quite easy. Have a look at the thread with the heading from this mailing
>> list (SOLR-USER):
>> DataImportHandler / Import from DB : one data set comes in multiple rows
>>
>> Cheers,
>> Chantal
>>
>>
>> Joel Nylund schrieb:
>>
>>> thanks, but im confused how I can aggregate across rows, I dont know
>>> of any easy way to get my db to return one row for all the categories
>>> (given the hint from your other email), I have split the category
>>> query into a separate entity, but its returning multiple rows, how do
>>> I combine multiple rows into 1 index entity?
>>> thanks
>>> Joel
>>> On Oct 29, 2009, at 8:58 PM, Avlesh Singh wrote:
>>>
>>>> In the database this is modeled a a 1-N where category table has the
>>>>> mapping of feed to category
>>>>> I need to be able to query , give me all the feeds in any given
>>>>> category.
>>>>> How can I best model this in solr?
>>>>> Seems like multiValued field might help, but how would I populate
>>>>> it, and
>>>>> would the query above work?.
>>>>>
>>>>> Yes you are right. A multivalued field for "categories" is the answer.
>>>>
>>>> For populating in the index -
>>>>
>>>> 1. If you use DIH to populate your indexes and your datasource is a
>>>> database then you can use DIH's RegexTransformer on an aggregated
>>>> list of
>>>> categories. e.g. if your database query retruns "a,b,c,d" in a
>>>> column called
>>>> "db_categories", this is how you would put it in DIH's data-config
>>>> file -
>>>> <field column="db_categories" name="categories" splityBy="," />.
>>>> 2. If you "add" documents to Solr yourself multiple values for
>>>> the field
>>>> can be specified as an array or list of values in the
>>>> SolrInputDocument.
>>>>
>>>> A multivalued field provides the same faceting and searching
>>>> capabilites
>>>> like regular fields. There is no special syntax.
>>>>
>>>> Cheers
>>>> Avlesh
>>>>
>>>> On Fri, Oct 30, 2009 at 4:55 AM, Joel Nylund <jn...@yahoo.com>
>>>> wrote:
>>>>
>>>> Hi,
>>>>>
>>>>> I have one index so far which contains feeds. I have been able to
>>>>> de-normalize several tables and map this data onto the feed entity.
>>>>> There is
>>>>> one tricky problem that I need help on.
>>>>>
>>>>> Feeds have 1 - many categories.
>>>>>
>>>>> So Lets say we have Category1, Category2 and Category3
>>>>>
>>>>> Feed 1 - is in Category 1
>>>>> Feed 2 is in category2 and category3
>>>>> Feed 3 is in category2
>>>>> Feed 4 has no category
>>>>>
>>>>> In the database this is modeled a a 1-N where category table has the
>>>>> mapping of feed to category
>>>>>
>>>>> I need to be able to query , give me all the feeds in any given
>>>>> category.
>>>>>
>>>>> How can I best model this in solr?
>>>>>
>>>>> Seems like multiValued field might help, but how would I populate
>>>>> it, and
>>>>> would the query above work?.
>>>>>
>>>>> thanks
>>>>> Joel
>>>>>
>>>>>
>>>>>
>