Posted to dev@lucene.apache.org by Varun Thacker <va...@vthacker.in> on 2018/06/15 11:22:31 UTC

Do we need the MODIFYCOLLECTION Api?

Today the MODIFYCOLLECTION API supports modifying the following properties:

   1. maxShardsPerNode
   2. rule
   3. snitch
   4. policy
   5. collection.configName
   6. autoAddReplicas
   7. replicationFactor

Items 1-4 seem like something we should get rid of, since we have the
AutoScaling Policy framework?

5> Can anyone point out the use case for this?

6> autoAddReplicas can be changed both as a clusterprop and via the
MODIFYCOLLECTION API? Hmm. Which one is supposed to win?

7> We need to allow a user to change replicationFactor, but how does this
help? We have nrtReplicas / pullReplicas / tlogReplicas, so changing this
just sounds confusing. Or should we allow changing all replica types?
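For reference, a v1 MODIFYCOLLECTION call sends action=MODIFYCOLLECTION, the collection name, and each attribute to change as a plain request parameter on /admin/collections. The Python sketch below only builds such a URL and refuses attributes other than the seven listed above. The base URL and collection name are placeholders, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# The seven attributes the message above lists as modifiable.
MODIFIABLE = frozenset({
    "maxShardsPerNode",
    "rule",
    "snitch",
    "policy",
    "collection.configName",
    "autoAddReplicas",
    "replicationFactor",
})


def modify_collection_url(base_url, collection, attrs):
    """Build (but do not send) a v1 Collections API MODIFYCOLLECTION URL.

    attrs maps attribute names to new values, e.g. {"replicationFactor": 3}.
    """
    if not attrs:
        raise ValueError("MODIFYCOLLECTION needs at least one attribute")
    unknown = set(attrs) - MODIFIABLE
    if unknown:
        raise ValueError(f"not modifiable via MODIFYCOLLECTION: {sorted(unknown)}")
    params = {"action": "MODIFYCOLLECTION", "collection": collection}
    for name, value in attrs.items():
        # Send booleans as lowercase true/false, the form used in Solr's docs.
        params[name] = str(value).lower() if isinstance(value, bool) else str(value)
    return f"{base_url}/admin/collections?{urlencode(params)}"
```

For example, modify_collection_url("http://localhost:8983/solr", "xyz", {"replicationFactor": 3}) returns http://localhost:8983/solr/admin/collections?action=MODIFYCOLLECTION&collection=xyz&replicationFactor=3.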

Re: Do we need the MODIFYCOLLECTION Api?

Posted by Varun Thacker <va...@vthacker.in>.
So let's keep *collection.configName* and *replicationFactor*.

If we were designing this API today, would MODIFYCOLLECTION still be where
we put it?

It almost feels like a collection setting. Maybe Collection Properties
(SOLR-11960) is where it should live?


On Fri, Jun 15, 2018 at 4:58 PM, Erick Erickson <er...@gmail.com>
wrote:

> re: collection.configName
>
> bq. Right, and then basically we are giving users a way to shoot
> themselves in the foot :)
>
> They can also delete their index files...
>
> Seriously though, what if I have a bunch of collections sharing a
> configset and I need to specialize only one of them by _adding_ fields?
> I'd like to copy the configset to a new one and then point my collection
> at it. And with the UninvertingMergePolicy, adding DV (docValues) would be
> one such specialization.
>
> I've also seen time-series collections (let's say 30 days) where you
> _cannot_ reindex, but you want to modify your schema anyway. People
> have:
> 1> defined a new field that's a variant of the old field
> 2> had their indexing program index to _both_ for 30 days
> 3> changed the app to use the new field
> 4> changed the indexing program to stop indexing to the old field
>
> Sure, the metadata for the field is still carried along, but that's not
> a problem for a few fields.
>
> The point is, it's dangerous to go changing the configset for an existing
> collection, sure. But I find the API a better option than having to
> manually edit your ZK nodes.
>
> FWIW
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

Re: Do we need the MODIFYCOLLECTION Api?

Posted by Erick Erickson <er...@gmail.com>.
re: collection.configName

bq. Right, and then basically we are giving users a way to shoot
themselves in the foot :)

They can also delete their index files...

Seriously though, what if I have a bunch of collections sharing a
configset and I need to specialize only one of them by _adding_ fields?
I'd like to copy the configset to a new one and then point my collection
at it. And with the UninvertingMergePolicy, adding DV (docValues) would be
one such specialization.
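The copy-then-repoint workflow described here can be sketched as an ordered list of v1 requests: copy the shared configset with the Configsets API (action=CREATE with baseConfigSet), point only the one collection at the copy with MODIFYCOLLECTION, then RELOAD the collection. This is a sketch, not a recipe from the thread. The RELOAD step is my assumption about what is needed for cores to pick up the new config, and you should check it against your Solr version. The collection and configset names are hypothetical, and the requests are only built, not sent.

```python
from urllib.parse import urlencode


def _url(base_url, path, params):
    return f"{base_url}{path}?{urlencode(params)}"


def specialize_configset_requests(base_url, collection, shared_conf, new_conf):
    """Return, in order, the v1 requests for copying a shared configset and
    pointing a single collection at the copy. Nothing is sent here."""
    return [
        # 1. Copy the shared configset under a new name (Configsets API).
        _url(base_url, "/admin/configs",
             {"action": "CREATE", "name": new_conf, "baseConfigSet": shared_conf}),
        # (Edit the copy here, e.g. add fields, before repointing.)
        # 2. Point only this collection at the copy.
        _url(base_url, "/admin/collections",
             {"action": "MODIFYCOLLECTION", "collection": collection,
              "collection.configName": new_conf}),
        # 3. Reload the collection so its cores load the new config.
        _url(base_url, "/admin/collections",
             {"action": "RELOAD", "name": collection}),
    ]
```

The other collections keep using the shared configset untouched. Only the repointed collection sees the specialized copy.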

I've also seen time-series collections (let's say 30 days) where you
_cannot_ reindex, but you want to modify your schema anyway. People
have:
1> defined a new field that's a variant of the old field
2> had their indexing program index to _both_ for 30 days
3> changed the app to use the new field
4> changed the indexing program to stop indexing to the old field

Sure, the metadata for the field is still carried along, but that's not
a problem for a few fields.
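The four steps above happen in the client's indexing program, not inside Solr. Here is a minimal sketch of that client-side logic. It assumes documents expire after the 30-day window, as in the example. The field names, the Phase enum and both helper functions are hypothetical illustrations.

```python
from enum import Enum
from datetime import timedelta


class Phase(Enum):
    """What the indexing program writes during the migration."""
    OLD_ONLY = "old_only"      # before dual writing starts
    DUAL_WRITE = "dual_write"  # step 2: index to both fields
    NEW_ONLY = "new_only"      # step 4: stop indexing to the old field


def to_solr_doc(record, phase, old_field, new_field, convert=lambda v: v):
    """Build the document sent to Solr for one source record.

    convert maps the old field's value to the new variant's value
    (identity by default).
    """
    doc = {k: v for k, v in record.items() if k != old_field}
    if old_field not in record:
        return doc
    value = record[old_field]
    if phase in (Phase.OLD_ONLY, Phase.DUAL_WRITE):
        doc[old_field] = value
    if phase in (Phase.DUAL_WRITE, Phase.NEW_ONLY):
        doc[new_field] = convert(value)
    return doc


def queries_can_use_new_field(dual_write_elapsed, retention=timedelta(days=30)):
    """Step 3 is safe once dual writing has run for a full retention window,
    so every document still in the collection carries the new field."""
    return dual_write_elapsed >= retention
```

For example, with old_field="price" and new_field="price_f" (both hypothetical), a record {"id": "1", "price": "12"} indexed in Phase.DUAL_WRITE with convert=float produces {"id": "1", "price": "12", "price_f": 12.0}.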

The point is, it's dangerous to go changing the configset for an existing
collection, sure. But I find the API a better option than having to
manually edit your ZK nodes.

FWIW

On Fri, Jun 15, 2018 at 7:18 AM, Varun Thacker <va...@vthacker.in> wrote:
> Hi Jan,
>
> I agree with how you're thinking of replicationFactor as basically being
> equivalent to nrtReplicas. Let's not change that.
>
> So is #7 the only real use for this API?

Re: Do we need the MODIFYCOLLECTION Api?

Posted by Varun Thacker <va...@vthacker.in>.
Hi Jan,

I agree with how you're thinking of replicationFactor as basically being
equivalent to nrtReplicas. Let's not change that.

So is #7 the only real use for this API?

On Fri, Jun 15, 2018 at 1:46 PM, Jan Høydahl <ja...@cominvent.com> wrote:

> Do we have a v2 API for CREATE and MODIFYCOLLECTION? E.g.
>
> POST http://localhost:8983/api/c
> { modify-collection: { replicationFactor: 3 } }
>
> Perhaps we should focus on a decent v2 API and deprecate the old confusing
> one?
>
> wrt. replicationFactor / nrtReplicas / pullReplicas / tlogReplicas, my
> wish is that replicationFactor keeps on living as today, only setting
> nrtReplicas, and is mutually exclusive with any of the three others. So if
> you have a collection with tlogReplicas defined, then modifying
> "replicationFactor" should throw an error. But if you only ever care about
> NRT replicas, then you can keep using replicationFactor as before?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com

Re: Do we need the MODIFYCOLLECTION Api?

Posted by Jan Høydahl <ja...@cominvent.com>.
Do we have a v2 API for CREATE and MODIFYCOLLECTION? E.g.

POST http://localhost:8983/api/c
{ modify-collection: { replicationFactor: 3 } }

Perhaps we should focus on a decent v2 API and deprecate the old confusing one?

wrt. replicationFactor / nrtReplicas / pullReplicas / tlogReplicas, my wish is that replicationFactor keeps on living as today, only setting nrtReplicas, and is mutually exclusive with any of the three others. So if you have a collection with tlogReplicas defined, then modifying "replicationFactor" should throw an error. But if you only ever care about NRT replicas, then you can keep using replicationFactor as before?
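Jan's proposed rule can be stated as a small check. The sketch below illustrates that proposal only. It is not Solr's actual validation logic. The function names and the shape of the "current" dict are made up for this example. The v2 body copies the { modify-collection: {...} } shape from the message above, and that command name has not been verified against Solr's v2 API.

```python
import json


def check_replication_factor_change(current, new_rf):
    """Hypothetical check implementing Jan's proposal.

    current maps replica-type keys (nrtReplicas, tlogReplicas, pullReplicas)
    to the counts the collection has defined. replicationFactor is treated as
    a synonym for nrtReplicas, so it is only accepted when the collection has
    no TLOG or PULL replicas defined.
    """
    if new_rf < 1:
        raise ValueError("replicationFactor must be >= 1")
    for key in ("tlogReplicas", "pullReplicas"):
        if current.get(key, 0) > 0:
            raise ValueError(
                f"collection defines {key}; change replica types explicitly "
                "instead of replicationFactor")
    # Only NRT replicas: replicationFactor simply means nrtReplicas.
    return {"nrtReplicas": new_rf}


def v2_modify_body(changes):
    # Body shape copied from the example above; not checked against Solr.
    return json.dumps({"modify-collection": changes})
```

A collection with only NRT replicas passes the check, and v2_modify_body({"replicationFactor": 3}) produces {"modify-collection": {"replicationFactor": 3}}. A collection with tlogReplicas defined raises ValueError, which is the error Jan suggests throwing.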

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com



Re: Do we need the MODIFYCOLLECTION Api?

Posted by Varun Thacker <va...@vthacker.in>.
Thanks everyone! I've created SOLR-12498 and linked it to this mailing list
thread.

On Tue, Jun 19, 2018 at 8:14 AM, Shalin Shekhar Mangar <
shalinmangar@gmail.com> wrote:

>
>
> On Fri, Jun 15, 2018 at 7:47 PM Varun Thacker <va...@vthacker.in> wrote:
>
>>
>>
>> On Fri, Jun 15, 2018 at 2:44 PM, David Smiley <da...@gmail.com>
>> wrote:
>>
>>> +1 to get rid of #1, #2, #3, #7.
>>>
>>> Maybe I'm mistaken, but I thought "policy" was a part of the auto scaling
>>> framework?
>>>
>>
>> Yeah. And
>> http://lucene.apache.org/solr/guide/solrcloud-autoscaling-api.html#create-and-modify-cluster-policies
>> seems like the way to modify it. So I wonder why MODIFYCOLLECTION should
>> support it. Maybe Noble, AB or Shalin could confirm?
>>
>
> The policy is indeed part of the auto scaling framework, but the support in
> MODIFYCOLLECTION exists so that you can switch the policy for a collection.
> For example, say you associated policy1 with collection xyz at creation
> time using the "usePolicy" parameter. If you now want the collection to use
> policy2 instead, then the MODIFYCOLLECTION API is the way to go. IMO, we
> need support for this API even though certain parameters are ready to be
> deprecated.
>
>
>>
>>
>>> Maybe the capability for autoAddReplicas should be considered an aspect
>>> of the auto scaling framework instead of a collection setting, and thus we
>>> could remove it here?
>>>
>>
>> Yeah, I'd love for that to happen. It's even tied to triggers etc., so it
>> seems like it should be enabled/disabled via the autoscaling API.
>>
>>>
>>> I think the ability to modify collection.configName seems useful, albeit
>>> rarely used in practice. Perhaps you want to try out a bunch of changes
>>> and be able to roll back easily. You could create a config with those
>>> modifications, try it out, and if you don't like the results, point the
>>> collection back at the original config. In practice, though, it may not
>>> always be possible to just switch configs, since a reindex may be
>>> required.
>>>
>>
>> Right, and then basically we are giving users a way to shoot
>> themselves in the foot :)
>>
>>>
>>> --
>>> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>>> http://www.solrenterprisesearchserver.com
>>>
>>
>>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>

Re: Do we need the MODIFYCOLLECTION Api?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Fri, Jun 15, 2018 at 7:47 PM Varun Thacker <va...@vthacker.in> wrote:

>
>
> On Fri, Jun 15, 2018 at 2:44 PM, David Smiley <da...@gmail.com>
> wrote:
>
>> +1 to get rid of #1, #2, #3, #7.
>>
>> Maybe I'm mistaken, but I thought "policy" was a part of the auto scaling
>> framework?
>>
>
> Yeah. And
> http://lucene.apache.org/solr/guide/solrcloud-autoscaling-api.html#create-and-modify-cluster-policies
> seems like the way to modify it. So I wonder why MODIFYCOLLECTION should
> support it.
> Maybe Noble, AB or Shalin could confirm?
>

The policy is indeed part of the auto scaling framework, but the support in
MODIFYCOLLECTION exists so that you can switch the policy for a collection.
For example, say you associated policy1 with collection xyz at creation
time using the "usePolicy" parameter. If you now want the collection to use
policy2 instead, then the MODIFYCOLLECTION API is the way to go. IMO, we
need support for this API even though certain parameters are ready to be
deprecated.
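The distinction drawn here is between defining a policy, which is done through the autoscaling API, and choosing which policy a collection uses, which is chosen at creation and can be switched later via MODIFYCOLLECTION's policy attribute. The toy in-memory model below sketches that split. It is not Solr code. The class, its method names and the placeholder rule strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Toy model (not Solr code) of policy definitions vs. policy choice."""
    # Named policies; in Solr these are defined via the autoscaling API.
    policies: dict = field(default_factory=dict)
    # Which named policy each collection uses; set at creation time and
    # switchable afterwards (MODIFYCOLLECTION's "policy" attribute).
    collection_policy: dict = field(default_factory=dict)

    def set_policy(self, name, rules):
        self.policies[name] = rules

    def create_collection(self, collection, policy):
        self._require(policy)
        self.collection_policy[collection] = policy

    def modify_collection_policy(self, collection, policy):
        if collection not in self.collection_policy:
            raise KeyError(f"no such collection: {collection}")
        self._require(policy)
        self.collection_policy[collection] = policy

    def _require(self, policy):
        if policy not in self.policies:
            raise ValueError(f"unknown policy: {policy}")
```

In this model, set_policy("policy2", ...) only changes what policy2 means. Collection xyz keeps using policy1 until modify_collection_policy("xyz", "policy2") is called, which mirrors the example above.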


>
>
>> Maybe the capability for autoAddReplicas should be considered an aspect
>> of the auto scaling framework instead of a collection setting, and thus we
>> could remove it here?
>>
>
> Yeah, I'd love for that to happen. It's even tied to triggers etc., so it
> seems like it should be enabled/disabled via the autoscaling API.
>
>>
>> I think the ability to modify collection.configName seems useful, albeit
>> rarely used in practice. Perhaps you want to try out a bunch of changes
>> and be able to roll back easily. You could create a config with those
>> modifications, try it out, and if you don't like the results, point the
>> collection back at the original config. In practice, though, it may not
>> always be possible to just switch configs, since a reindex may be required.
>>
>
> Right, and then basically we are giving users a way to shoot themselves
> in the foot :)
>
>>
>> --
>> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>> http://www.solrenterprisesearchserver.com
>>
>
>

-- 
Regards,
Shalin Shekhar Mangar.

Re: Do we need the MODIFYCOLLECTION Api?

Posted by Varun Thacker <va...@vthacker.in>.
On Fri, Jun 15, 2018 at 2:44 PM, David Smiley <da...@gmail.com>
wrote:

> +1 to get rid of #1, #2, #3, #7.
>
> Maybe I'm mistaken, but I thought "policy" was a part of the auto scaling
> framework?
>

Yeah. And
http://lucene.apache.org/solr/guide/solrcloud-autoscaling-api.html#create-and-modify-cluster-policies
seems like the way to modify it. So I wonder why MODIFYCOLLECTION should
support it.
Maybe Noble, AB or Shalin could confirm?


> Maybe the capability for autoAddReplicas should be considered an aspect of
> the auto scaling framework instead of a collection setting, and thus we
> could remove it here?
>

Yeah, I'd love for that to happen. It's even tied to triggers etc., so it
seems like it should be enabled/disabled via the autoscaling API.

>
> I think the ability to modify collection.configName seems useful, albeit
> rarely used in practice. Perhaps you want to try out a bunch of changes
> and be able to roll back easily. You could create a config with those
> modifications, try it out, and if you don't like the results, point the
> collection back at the original config. In practice, though, it may not
> always be possible to just switch configs, since a reindex may be required.
>

Right, and then basically we are giving users a way to shoot themselves
in the foot :)

>
> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
>

Re: Do we need the MODIFYCOLLECTION Api?

Posted by David Smiley <da...@gmail.com>.
+1 to get rid of #1, #2, #3, #7.

Maybe I'm mistaken, but I thought "policy" was a part of the auto scaling
framework?

Maybe the capability for autoAddReplicas should be considered an aspect of
the auto scaling framework instead of a collection setting, and thus we
could remove it here?

I think the ability to modify collection.configName seems useful, albeit
rarely used in practice. Perhaps you want to try out a bunch of changes
and be able to roll back easily. You could create a config with those
modifications, try it out, and if you don't like the results, point the
collection back at the original config. In practice, though, it may not
always be possible to just switch configs, since a reindex may be required.
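The try-it-then-roll-back idea can be sketched as a Python context manager. It points the collection at a trial config through MODIFYCOLLECTION (collection.configName) and restores the original config on exit unless told to keep the trial. The RELOAD after each switch is an assumption about what is needed for the change to take effect. The send callable is caller-supplied and does the actual HTTP, and the function names are hypothetical.

```python
from contextlib import contextmanager
from urllib.parse import urlencode


def switch_config_requests(base_url, collection, conf_name):
    """v1 requests that point `collection` at `conf_name`, then reload it."""
    api = f"{base_url}/admin/collections"
    return [
        api + "?" + urlencode({"action": "MODIFYCOLLECTION",
                               "collection": collection,
                               "collection.configName": conf_name}),
        api + "?" + urlencode({"action": "RELOAD", "name": collection}),
    ]


@contextmanager
def trial_config(send, base_url, collection, original_conf, trial_conf):
    """Switch to trial_conf for the duration of the with-block.

    On exit the collection is pointed back at original_conf unless the
    caller sets trial["keep"] = True. An exception inside the block also
    triggers the rollback and is then re-raised. `send` issues one request
    URL and is supplied by the caller (e.g. wrapping an HTTP client).
    """
    trial = {"keep": False}
    for url in switch_config_requests(base_url, collection, trial_conf):
        send(url)
    try:
        yield trial
    finally:
        if not trial["keep"]:
            for url in switch_config_requests(base_url, collection, original_conf):
                send(url)
```

Rolling the config back does not re-analyze documents indexed while the trial config was active. That is the reindex caveat above: switching back is only safe when the two configs index compatibly.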

-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com