You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by Avlesh Singh <av...@gmail.com> on 2009/11/01 02:55:58 UTC

Re: best way to model 1-N

>
> what am I missing?
>
Change your <entity name="category" query="select cfcr.feedId ..."> to
<entity name="category" *transformer="RegexTransformer"* query="select
cfcr.feedId .."> The "splitBy" directive is understood by this transformer
and in your case the attribute was simply ignored.

Don't forget to re-index once you have changed.

Cheers
Avlesh

On Fri, Oct 30, 2009 at 9:33 PM, Joel Nylund <jn...@yahoo.com> wrote:

> Thanks Chantal, I will keep that in mind for tuning,
>
> for sql I figured  way to combine them into one row using concat, but I
> still seem to be having an issue splitting them:
>
> Db now returns as one column categoryType:
> TOPIC,LANGUAGE
>
> but my solr result, if you note the item in categoryType  all seem to be
> within one str, I would expect it to be in multiple strings within the
> array, is this assumption wrong?
>
> <doc>
> -
> <arr name="categoryType">
> <str>TOPIC,LANGUAGE</str>
> </arr>
> <str name="id">40</str>
> <str name="title">feed title</str>
> </doc>
>
>
> Here is my import:
>  <document name="doc">
>        <entity name="item"
>       query="SELECT f.id, f.title
>                FROM Feed f
>            <field column="id" name="id" />
>            <field column="title" name="title" />
>                        <entity name="category" query="select cfcr.feedId,
> group_concat(cfcr.categoryType) as categoryType
>                                                from CFR cfcr
>                                                where
>                                                cfcr.feedId = '${item.id}'
> AND
>                                                group by cfcr.feedId">
>                                        <field column="categoryType"
> name="categoryType" splityBy="," />
>                    </entity>
>
>     </entity>
>
> In schema:
>        <field name="categoryType" type="text" indexed="true" stored="true"
> required="false" multiValued="true"/>
>        <field name="categoryName" type="text" indexed="true" stored="true"
> required="false" multiValued="true"/>
>
>
> what am I missing?
>
> thanks
> Joel
>
>
>
> On Oct 30, 2009, at 10:00 AM, Chantal Ackermann wrote:
>
>  That depends a bit on your database, but it is tricky and might not be
>> performant.
>>
>> If you are more of a Java developer, you might prefer retrieving mutliple
>> rows per SOLR document from your dataSource (join on your category and main
>> table), and aggregate them in your custom EntityProcessor. I got a far(!)
>> better performance retrieving everything in one query and doing the
>> aggregation in Java. But this is, of course, depending on your table
>> structure and data.
>>
>> Noble Paul helped me with the custom EntityProcessor, and it turned out
>> quite easy. Have a look at the thread with the heading from this mailing
>> list (SOLR-USER):
>> DataImportHandler / Import from DB : one data set comes in multiple rows
>>
>> Cheers,
>> Chantal
>>
>>
>> Joel Nylund schrieb:
>>
>>> thanks, but im confused how I can aggregate across rows, I dont know
>>> of any easy way to get my db to return one row for all the categories
>>> (given the hint from your other email), I have split the category
>>> query into a separate entity, but its returning multiple rows, how do
>>> I combine multiple rows into 1 index entity?
>>> thanks
>>> Joel
>>> On Oct 29, 2009, at 8:58 PM, Avlesh Singh wrote:
>>>
>>>> In the database this is modeled a a 1-N where category table has the
>>>>> mapping of feed to category
>>>>> I need to be able to query , give me all the feeds in any given
>>>>> category.
>>>>> How can I best model this in solr?
>>>>> Seems like multiValued field might help, but how would I populate
>>>>> it, and
>>>>> would the query above work?.
>>>>>
>>>>>  Yes you are right. A multivalued field for "categories" is the answer.
>>>>
>>>> For populating in the index -
>>>>
>>>>  1. If you use DIH to populate your indexes and your datasource is a
>>>>  database then you can use DIH's RegexTransformer on an aggregated
>>>> list of
>>>>  categories. e.g. if your database query retruns "a,b,c,d" in a
>>>> column called
>>>>  "db_categories", this is how you would put it in DIH's data-config
>>>> file -
>>>>  <field column="db_categories" name="categories" splityBy="," />.
>>>>  2. If you "add" documents to Solr yourself  multiple values for
>>>> the field
>>>>  can be specified as an array or list of values in the
>>>> SolrInputDocument.
>>>>
>>>> A multivalued field provides the same faceting and searching
>>>> capabilites
>>>> like regular fields. There is no special syntax.
>>>>
>>>> Cheers
>>>> Avlesh
>>>>
>>>> On Fri, Oct 30, 2009 at 4:55 AM, Joel Nylund <jn...@yahoo.com>
>>>> wrote:
>>>>
>>>>  Hi,
>>>>>
>>>>> I have one index so far which contains feeds.  I have been able to
>>>>> de-normalize several tables and map this data onto the feed entity.
>>>>> There is
>>>>> one tricky problem that I need help on.
>>>>>
>>>>> Feeds have 1 - many categories.
>>>>>
>>>>> So Lets say we have Category1, Category2 and Category3
>>>>>
>>>>> Feed 1 - is in Category 1
>>>>> Feed 2 is in category2 and category3
>>>>> Feed 3 is in category2
>>>>> Feed 4 has no category
>>>>>
>>>>> In the database this is modeled a a 1-N where category table has the
>>>>> mapping of feed to category
>>>>>
>>>>> I need to be able to query , give me all the feeds in any given
>>>>> category.
>>>>>
>>>>> How can I best model this in solr?
>>>>>
>>>>> Seems like multiValued field might help, but how would I populate
>>>>> it, and
>>>>> would the query above work?.
>>>>>
>>>>> thanks
>>>>> Joel
>>>>>
>>>>>
>>>>>
>