Posted to java-user@lucene.apache.org by Kumaran Ramasubramanian <ku...@gmail.com> on 2016/08/30 06:38:39 UTC

parent-child relationship in lucene - to avoid reindexing if parent information changes

Hi All,


I am building a sample application in which a group of members can interact as
in a chat room. I am trying to enable search at the message level.


If I denormalize group_name and group_members into every Lucene document, then
the following cases will force a large number of Lucene documents to be reindexed:

1. editing group name
2. adding / deleting a member


So I am trying to index group_name and group_members (member ids as CSV) as the
parent, and every text message with its message_id as a child.
By using parent and child documents, I am trying to solve the 1 * m cases.
If there are 1 lakh (100,000) messages under one parent, how can I delete a
member id or edit a group name without reindexing its children?


Is it possible to avoid reindexing? Which Lucene class is the best fit for
this?

Related Article:
http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html




--
Kumaran R

Re: parent-child relationship in lucene - to avoid reindexing if parent information changes

Posted by Ralph Soika <ra...@imixs.com>.
Hi Kumaran,

I don't think this would take much time. If you store only the GroupID in
each message and first query all group IDs that a specific user is a member
of, you can then easily select all messages in a second step. We do
something similar in our own project.

1) query - get all groups the user [YOUR-USER-ID] is a member of:
(doctype:"group" AND groupmember:[YOUR-USER-ID])

You receive a list of group IDs - I would guess typically not more than
10 per user. So the query is quite fast.

2) query - search all messages for the selected groups:
(doctype:"message" AND (messagegroup:[ID-1] OR messagegroup:[ID-2] OR
messagegroup:[ID-3] ...))


If you add a user to a group or remove one from it, this does not affect your messages.
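
The two queries above can be sketched in plain Java. This is only an
in-memory stand-in, not Lucene code: the class TwoStepSearchSketch, its
method names, and the substring match on the message text are assumptions
made for illustration. It is meant to show why a membership change touches
only the group entry, while messages keep nothing but their group id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory stand-in for the index; plain collections, not the Lucene API.
class TwoStepSearchSketch {

    // A "message" document carries only the id of its group, never the member list.
    static final class Message {
        final String id;
        final String groupId;
        final String text;

        Message(String id, String groupId, String text) {
            this.id = id;
            this.groupId = groupId;
            this.text = text;
        }
    }

    // "group" documents: groupId -> member ids.
    final Map<String, Set<String>> groups = new HashMap<>();
    final List<Message> messages = new ArrayList<>();

    // Changing membership updates one group entry and no message.
    void addMember(String groupId, String userId) {
        groups.computeIfAbsent(groupId, k -> new HashSet<>()).add(userId);
    }

    void addMessage(String id, String groupId, String text) {
        messages.add(new Message(id, groupId, text));
    }

    // Step 1: corresponds to doctype:"group" AND groupmember:<userId>
    Set<String> groupsOf(String userId) {
        Set<String> ids = new HashSet<>();
        for (Map.Entry<String, Set<String>> e : groups.entrySet()) {
            if (e.getValue().contains(userId)) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    // Step 2: corresponds to doctype:"message" AND (messagegroup:<id1> OR ...),
    // with a simple substring match standing in for the text query.
    List<String> search(String userId, String term) {
        Set<String> groupIds = groupsOf(userId);
        List<String> hits = new ArrayList<>();
        for (Message m : messages) {
            if (groupIds.contains(m.groupId) && m.text.contains(term)) {
                hits.add(m.id);
            }
        }
        return hits;
    }
}
```

In this sketch, calling addMember("g1", "bob") makes every message already
stored under g1 searchable for bob, although no message entry is modified.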

Best regards
Ralph






-- 
*Imixs*...extends the way people work together
We are an open source company, read more at: www.imixs.org 
<http://www.imixs.org>
------------------------------------------------------------------------
Imixs Software Solutions GmbH
Agnes-Pockels-Bogen 1, 80992 München
*Web:* www.imixs.com <http://www.imixs.com>
*Office:* +49 (0)89-452136 16 *Mobil:* +49-177-4128245
Registergericht: Amtsgericht Muenchen, HRB 136045
Geschaeftsfuehrer: Gaby Heinle u. Ralph Soika


Re: parent-child relationship in lucene - to avoid reindexing if parent information changes

Posted by Kumaran Ramasubramanian <ku...@gmail.com>.
Hi Ralph

        Thank you for the response. Yes, that is one possible workaround.
But what you suggest is costly at search time, and it takes more time when
the number of groups is large (could we use a query-time join?).

Also, my second problem remains the same (adding a member to a group).
I want to make all existing messages in a group visible to any new member,
so I would need to reindex all messages with the newly added member id.

Would an index-time join (for the second case) or a query-time join (for the
first case) be the best fit?

--
Kumaran R







Re: parent-child relationship in lucene - to avoid reindexing if parent information changes

Posted by Ralph Soika <ra...@imixs.com>.
Hi,

I think this is more a problem of the data model.
You should not link a message to a group by the group name. Instead, use
a GroupID (which is unique) to refer to the group. The GroupID is a
'non-analyzed' and 'not-stored' field in your Lucene document.

Then, when you want to search for all messages assigned to the groups the
user is a member of, first search for the groups the user is a member of
to get their ids, and then search all messages with those ids.

So there should be no need to reindex.
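
As a rough sketch, the two searches could be phrased as query strings in the
classic Lucene query syntax. The field names doctype, groupmember, and
messagegroup are the ones used in the follow-up reply in this thread; the
class GroupQuerySketch and its methods are hypothetical, and the code
assumes that ids contain no double quotes or other characters that would
need escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that only builds query strings; it does not call Lucene.
class GroupQuerySketch {

    // Step 1: find the group documents that list the user as a member.
    static String groupsForUser(String userId) {
        return "(doctype:\"group\" AND groupmember:\"" + userId + "\")";
    }

    // Step 2: restrict message documents to the group ids returned by step 1.
    static String messagesInGroups(List<String> groupIds) {
        if (groupIds.isEmpty()) {
            // No groups means no visible messages; the caller can skip the second query.
            throw new IllegalArgumentException("no group ids given");
        }
        List<String> clauses = new ArrayList<>();
        for (String id : groupIds) {
            clauses.add("messagegroup:\"" + id + "\"");
        }
        return "(doctype:\"message\" AND (" + String.join(" OR ", clauses) + "))";
    }
}
```

Because the message documents reference only the GroupID, renaming a group
or changing its members would change the result of the first query only.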

===
Ralph


