Posted to dev@solr.apache.org by Will White <wi...@gmail.com> on 2023/05/23 05:52:11 UTC

Solr collection naming convention (/how to find historical decision making)

Hi all,

There's an inconsistency in how Solr validates collection names between
creating a collection through the Collection API
(org.apache.solr.handler.admin.api.CreateCollectionAPI.java#L405, via the
SolrIdentifierValidator) and through the Package Manager
(org.apache.solr.packagemanager.PackageUtils.java#L271 with an inline
regex).

I'd like to update the Package Manager to use the more expansive
identifierPattern (which allows for collections containing a '.' character,
as our collections use semantic versioning as a naming convention), but I
couldn't find any concrete details on what the supported naming convention
was for Solr 9 nor any relevant discussion in the last 24 months in the
email archive (lists.apache.org/list.html).
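Here is a minimal Java sketch of the mismatch described above. The first pattern is assumed to mirror SolrIdentifierValidator's identifierPattern: letters, digits, '.', '_' and '-', with no leading hyphen. The second is a hypothetical stand-in for the stricter inline regex in PackageUtils. It is not a copy of that regex and only shows a pattern that omits '.'. The class and method names are invented for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;

public class CollectionNameCheck {
  // Assumed to mirror SolrIdentifierValidator's identifierPattern:
  // letters, digits, '.', '_' and '-', with no leading hyphen.
  static final Pattern IDENTIFIER = Pattern.compile("^(?!\\-)[\\._A-Za-z0-9\\-]*$");

  // HYPOTHETICAL stand-in for the inline regex in PackageUtils; the real
  // pattern may differ. The only point is that it does not allow '.'.
  static final Pattern PACKAGE_MANAGER = Pattern.compile("^[A-Za-z0-9_\\-]+$");

  static boolean collectionApiAccepts(String name) {
    return IDENTIFIER.matcher(name).matches();
  }

  static boolean packageManagerAccepts(String name) {
    return PACKAGE_MANAGER.matcher(name).matches();
  }

  public static void main(String[] args) {
    // products_1.2.0 (a semantic-versioned name) is accepted by the first pattern only.
    for (String name : List.of("products", "products_v1", "products_1.2.0")) {
      System.out.printf("%-16s collectionApi=%b packageManager=%b%n",
          name, collectionApiAccepts(name), packageManagerAccepts(name));
    }
  }
}
```

Under these assumptions, a name like products_1.2.0 passes the Collection API check but fails the package-manager one. That is the inconsistency the thread is about.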

Question 1: Is there anywhere this information exists (so I can do proper
due diligence in future), and/or is a search through Slack/email archive
generally sufficient to open a JIRA ticket?

Question 2: Should a minor change like this be discussed in email or
against an open PR? I'm still getting used to the Apache ways of working,
apologies if I've missed the point on the email approach!

Cheers,

Will

Re: Solr collection naming convention (/how to find historical decision making)

Posted by Ishan Chattopadhyaya <ic...@gmail.com>.
+1, and thanks to Will for bringing this up. A single place
(SolrIdentifierValidator) is best; we should've used it in the package
manager. (I don't remember seeing something like that during the development
of the package manager.)

In reply to: Ishan Chattopadhyaya, Fri, 26 May 2023 at 23:10

Re: Solr collection naming convention (/how to find historical decision making)

Posted by Ishan Chattopadhyaya <ic...@gmail.com>.
Most likely an oversight, not intentional.
(Sorry for the terse replies; texting from a hospital amidst an emergency.)

In reply to: Jason Gerlowski, Fri, 26 May 2023 at 20:28

Re: Solr collection naming convention (/how to find historical decision making)

Posted by Jason Gerlowski <ge...@gmail.com>.
Hi Will,

I think the last thorough discussion around what names should be
allowed happened in SOLR-8110 here:
https://issues.apache.org/jira/browse/SOLR-8110.

Other relevant tickets are SOLR-8642, SOLR-8725, and SOLR-8677, but
these mostly lean on the regex arrived at in SOLR-8110 (which allows
'.').

I agree with Gus that we should tread carefully here, but IMO
"widening" changes like the one Will is suggesting should be doable.  As
long as there's not a particular reason that Package Manager code went
with a more restrictive regex, I think it makes sense to bring it into
line with the behavior in SolrIdentifierValidator.
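One way to bring the Package Manager in line is sketched below, under stated assumptions. The package-manager check delegates to one shared validator instead of carrying its own regex. SharedIdentifierValidator is a local, hypothetical stand-in for Solr's SolrIdentifierValidator. Its collection-name check is assumed to correspond to that class's validateCollectionName, and the stand-in lets the example compile without Solr on the classpath. checkCollectionName is also a hypothetical helper name.

```java
import java.util.regex.Pattern;

public class PackageManagerNameCheck {

  // Local stand-in for Solr's SolrIdentifierValidator so this compiles on its own.
  // The pattern is assumed to match identifierPattern in Solr 9; verify before use.
  static final class SharedIdentifierValidator {
    private static final Pattern IDENTIFIER =
        Pattern.compile("^(?!\\-)[\\._A-Za-z0-9\\-]*$");

    static boolean validateCollectionName(String name) {
      return name != null && IDENTIFIER.matcher(name).matches();
    }
  }

  // HYPOTHETICAL package-manager helper: instead of an inline regex, it delegates
  // to the shared validator, so both code paths accept and reject the same names.
  static void checkCollectionName(String collection) {
    if (!SharedIdentifierValidator.validateCollectionName(collection)) {
      throw new IllegalArgumentException(
          "Invalid collection name: '" + collection + "'");
    }
  }

  public static void main(String[] args) {
    checkCollectionName("products_1.2.0"); // accepted: '.' is allowed by the shared pattern
    try {
      checkCollectionName("-products"); // rejected: leading hyphen fails the assumed pattern
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With a single pattern, any future widening or narrowing of the naming rules happens in one place, and both code paths pick it up.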

@ishan, or @noble - I know you guys did most of the work on the
package manager.  Did you have a particular reason for disallowing '.'
in collection names there?  (Maybe something in the package manager
relies on '.' as a delimiter?) Or was it an oversight?

Best,

Jason


In reply to: Gus Heck, Tue, May 23, 2023 at 9:06 AM

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@solr.apache.org
For additional commands, e-mail: dev-help@solr.apache.org


Re: Solr collection naming convention (/how to find historical decision making)

Posted by Gus Heck <gu...@gmail.com>.
Collection naming discussions that I recall were probably more than 24
months ago. One good strategy is to run git blame on the relevant code. That
should lead you to a commit hash and a commit message. The message would
mention the Jira ticket. The Jira ticket would hopefully have discussion,
sometimes (rarely) with links to mail archives.
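The last step of that chain, from commit message to Jira ticket, can be partly automated. The following is a minimal Java sketch. It assumes the message mentions tickets in the usual PROJECT-NUMBER form. LUCENE- keys are also matched, on the assumption that older history comes from the former combined lucene-solr repository. The sample message is invented and is not a real commit.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JiraKeys {
  // Assumes tickets appear as PROJECT-NUMBER; LUCENE- keys are matched too,
  // since older Solr commits may reference Lucene tickets.
  private static final Pattern JIRA_KEY = Pattern.compile("\\b(?:SOLR|LUCENE)-\\d+\\b");

  // Returns the distinct ticket keys in order of first appearance.
  static Set<String> extract(String commitMessage) {
    Set<String> keys = new LinkedHashSet<>();
    Matcher m = JIRA_KEY.matcher(commitMessage);
    while (m.find()) {
      keys.add(m.group());
    }
    return keys;
  }

  public static void main(String[] args) {
    // Hypothetical text, standing in for the output of `git show -s <hash>`
    // for a hash found via `git blame` on the validation line.
    String message = "SOLR-1234: example change (related to SOLR-5678)";
    System.out.println(extract(message)); // prints [SOLR-1234, SOLR-5678]
  }
}
```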

Note, however, that changing anything about collection naming rules is not a
small change, and discussing it here (with pointers to past discussions/tickets)
is certainly good.

Once you have a patch and feel you have the background researched and a
list discussion supporting your change, opening a Jira ticket and creating
a PR are the way to go.

-Gus

In reply to: Will White, Tue, May 23, 2023 at 2:38 AM


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)