Posted to dev@lucene.apache.org by David Smiley <da...@gmail.com> on 2018/12/28 16:36:06 UTC

Feature: Solr implicitly defined field types?

While working on https://issues.apache.org/jira/browse/SOLR-12768 it
occurred to me that it would be nice if Solr had implicitly defined field
types.  This would allow you to define a field in your schema that refers
to a type that is *not* also in your schema -- at least not explicitly
(need not explicitly be put in your schema.xml if classic, or need not be
passed to schema manipulation API if you use that).  The idea would be that
these types would be Solr platform provided field types that need not be
defined by you.

There are multiple ways this loose idea might be turned into a concrete
proposal.

(A) The main idea I'm kicking around right now is that Solr would _not_
throw an error at the moment of reading your field definition that it
doesn't see your type... instead it would see it's a platform type (via
some built-in hard-coded registry) and then register that type on the fly.
So if you were to read the schema then you'd see it.  In this way, it's
kind of a shortcut.  Platform field types that you don't actually refer to
will never end up being put into your schema.
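
To make (A) concrete, here is a minimal sketch, in Python rather than
Solr's Java, of how a schema could fall back to a built-in registry when a
field refers to a type it doesn't define.  Everything in it is an
assumption for illustration: the PLATFORM_TYPES registry, the Schema class
and its method names are hypothetical, and "example.NestPathType" is a
placeholder class name, not what SOLR-12768 actually does.

```python
# Hypothetical hard-coded registry of platform field types.
PLATFORM_TYPES = {
    "_nest_path_": {"class": "example.NestPathType"},  # placeholder class
    "int": {"class": "solr.IntPointField"},
}


class Schema:
    def __init__(self):
        self.field_types = {}  # explicitly defined or lazily registered
        self.fields = {}       # field name -> type name

    def add_field(self, name, type_name):
        """Add a field, registering a platform type on the fly if needed."""
        if type_name not in self.field_types:
            platform_type = PLATFORM_TYPES.get(type_name)
            if platform_type is None:
                # Neither explicit nor a platform type: still an error.
                raise ValueError(f"unknown field type: {type_name}")
            self.field_types[type_name] = platform_type
        self.fields[name] = type_name


schema = Schema()
schema.add_field("_nest_path_", "_nest_path_")
print(sorted(schema.field_types))
```

Only the type that a field actually refers to ends up in the schema; "int"
stays out of it until something uses it.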

(B) A schema could pre-initialize with the platform/implicit types.  This
is the simplest idea but I don't like it because you may not even need some
of these types.  I'm not going to go down this path now but wanted to
mention it.

I'm exploring (A) right now... I'm hoping to do this for at least a
"_nest_path_"  field in support of nested documents in 8.0, but conceivably
the idea would be expanded to lots of things in our base schema right now
(int, str, etc.)
-- 
Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com

Re: Feature: Solr implicitly defined field types?

Posted by David Smiley <da...@gmail.com>.

I'm glad to hear at least *somebody* other than me likes the idea :-)

I started some manual experimentation with it.  After I got past one little
bug, sure enough it worked and the type showed up in the admin screen.  It
showed up because the /admin/luke handler interacts with IndexSchema,
*not*, as I said earlier, because of the HTTP schema API, which that screen
doesn't use.  Either way, that works.  And after some schema manipulation
(performed easily via Solr's admin screen, which has a form to add field
types), I saw the schema get persisted and, as I expected, the field type
definition was displayed there.

But then I got to wonder if that's actually a good thing, and I'm now
thinking probably not.  (We could have implicit types with or without this
behavior.)  Why not?
  (A) This field type was serialized incorrectly; there were no analyzers
when there should have been some.  This has little to do with implicit
field types; it's due to assumptions in our schema / field type
serialization, which simply gives up unless it sees a TokenizerChain
subclass of Analyzer, whereas in my code I chose to use Lucene's
CustomAnalyzer utility.  I could "fix" this by using TokenizerChain instead
or by changing the serialization code, but still, it ought to be tested
since it's a sneaky bug (it won't throw an error).  Alternatively, never
persist implicit field types, though _that_ would need to be tested too.
  (B) It can sometimes thwart future changes we may choose for a type's
definition.  Since it shows up, it's somewhat locked in at the time the
schema is manipulated with the schema API (with whatever the implementation
was for the Solr version / luceneMatchVersion in effect at that time).  After that point,
if the user were to keep the config, then delete all data, then update
luceneMatchVersion in solrconfig, then index again, it would still have the
same field type definition as it did prior because the field type is
explicitly defined at this point.  This isn't a huge deal since apps
deploy/publish their configuration in different ways, and most popular ways
would be immune to this (to be affected, the app must manipulate the schema
with the API).  Even apps that do manipulate the schema with the API might
do major revision upgrades in a from-scratch way instead of using the same
config in-place.  And it's a hypothetical scenario of a future point in
time where we eventually change our mind on what some particular implicit
field type ought to do.

To have implicit field types not persisted, the simplest implementation
would probably be to never save such implicit field types back into the
IndexSchema's registry of field types, so that they are never iterated over
when persisting.  Of course it'd need a test.
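
A rough sketch of that non-persisting variant, in Python and with
hypothetical names throughout (PLATFORM_TYPES, Schema, get_field_type,
persisted_field_types; "example.NestPathType" is a placeholder): lookups
fall through to the built-in types, but only explicitly defined types are
ever iterated for persistence.

```python
# Hypothetical built-in types; the class name is a placeholder.
PLATFORM_TYPES = {
    "_nest_path_": {"class": "example.NestPathType"},
}


class Schema:
    def __init__(self, explicit_types):
        # Only types the user defined live in the registry.
        self._registry = dict(explicit_types)

    def get_field_type(self, name):
        """Resolve a type, consulting built-ins without registering them."""
        if name in self._registry:
            return self._registry[name]
        if name in PLATFORM_TYPES:
            return PLATFORM_TYPES[name]
        raise KeyError(f"unknown field type: {name}")

    def persisted_field_types(self):
        """Names that would be written out when the schema is saved."""
        return sorted(self._registry)


schema = Schema({"string": {"class": "solr.StrField"}})
print(schema.get_field_type("_nest_path_"))
print(schema.persisted_field_types())
```

Because the implicit type never enters the registry, a later change to its
built-in definition would take effect without touching the saved schema.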

~ David
-- 
Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com

Re: Feature: Solr implicitly defined field types?

Posted by Jan Høydahl <ja...@cominvent.com>.

I'd really like to see these implicit types.
Whether they are defined in code or in an implicit-types.xml in the webapp is just an implementation detail. Also, a <primitiveFieldTypes> would only be necessary if there is ever a need to take more explicit control, but if the right defaults are established, I see only positive effects from shipping with implicit int, long, date, bool, float, double, and so on. Perhaps you can sum up your final suggestion, and if you don't get any vetoes then go ahead :)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 7 Jan 2019, at 14:40, David Smiley <da...@gmail.com> wrote:
> 
> Hmmm.  My opinion is neutral on a <primitiveFieldTypes>.  It would have more implementation & documentation complexity to it IMO than an implicit primitive type as I've been pushing.  But still; it's alright.
> 
> Since I can't seem to convince anyone on the merits of implicit field types, I will back out this part of SOLR-12768.  Instead I suppose I will add a new field type for that particular issue's need.
> 
> ~ David
> 
> On Sat, Jan 5, 2019 at 5:29 PM Jan Høydahl <jan.asf@cominvent.com> wrote:
> In some other thread or Jira that I cannot find now I proposed a new tag in schema to make this explicit. So instead of 50 tags defining all primitive types and dynamicFields, we could have one tag:
> 
> <primitiveFieldTypes enabled="true" dynamicMappings="true" lazy="true"/>
> 
> This is just a draft idea. This would give a way to disable these implicit primitive types if they are made default on. A lazy mode could delay adding to the schema until first use if that saves any resources.
> 
> Jan
> 
> On 5 Jan 2019, at 21:29, David Smiley <david.w.smiley@gmail.com> wrote:
> 
>> You would see these types in the HTTP schema API, and thus you would also end up seeing it on the admin schema screen (which uses that API).
>> It would not be saved back to the XML file unless you're further manipulating your schema via the HTTP schema API (managed schema).  I ought to verify all this manually.  As I'm sure you already know, comments / formatting do not survive that round-trip.
>> 
>> I'm a convention over configuration believer, and thus I prefer CoC over explicitness/verbosity.  I suppose all CoC arguments could be shot down with generic statements of perceived maintenance/understandability benefits.  Shrug; yet surely there's a case for CoC in some cases?  Let me ask you this: why is it okay for databases to not have definitions of what primitives types are yet in Solr you would rather it be explicit always?  That analogy is the crux of it.  I'm not arguing for "text_general" or other text analyzed types to be implicits; who knows where to draw the line there.  I thought primitives would be a slam dunk.
>> 
>> On Sat, Jan 5, 2019 at 3:07 PM Gus Heck <gus.heck@gmail.com> wrote:
> 
>> To my mind the only types (or fields) that should get built-in are the ones that would break solr if they were changed. Anything else should show up in the config file. Your _nest_path_ probably falls into the "it would break solr if it changed" category. 
>> 
>> I notice in your initial post you say "So if you were to read the schema then you'd see it." if that implies that there would be a way to fetch the final_efective_schema.xml file from the server via the admin ui that might make me feel better about this. Such a file should essentially be the schema.xml (or managed_schema.xml) with a "implicit generated types - do not edit" section. Comments etc should be preserved from the original, and possibly a provenance comment (which fields rely on the implicit addition so it's easy to spot an accidental usage of the implicit type) with each implicitly added type. 
>> 
>> Simplicity of code and code maintenance is of course excellent. Simplicity for the person trying to troubleshoot a system they've just been hired to fix/improve is also excellent. I'd prefer to SEE what's going on than have to remember what's going on modulo some version matrix in my head. Hard enough remembering which admin commands are available on version X...
>> 
>> 
>> On Fri, Jan 4, 2019 at 10:52 PM David Smiley <david.w.smiley@gmail.com> wrote:
>> On Fri, Jan 4, 2019 at 12:51 PM Shawn Heisey <apache@elyograg.org> wrote:
>> Looking at what came before, my preference would have been implicitly 
>> defined default types -- things like int, string, etc, defined in code.  
>> The only problem with that comes at Solr upgrade time ... what if we 
>> decide for a later version (even if it's limited to a major release) 
>> that IntPointField shouldn't be the implicit class for "int"?  Someone 
>> who upgrades an index using that implicit type to the new version will 
>> find that Solr will no longer work.  Which makes the idea unworkable.
>> 
>> I addressed this earlier -- search for "luceneMatchVersion" which is key.
>> 
>> RE a file based system schema (what Alexandre suggested)... that sounds workable but a more complex idea that would take more code & documentation -- at least relative to the very simple idea of some built-ins in the code (my proposal).  See SOLR-12768.patch <https://issues.apache.org/jira/secure/attachment/12953284/SOLR-12768.patch>  changes to IndexSchema. 

Re: Feature: Solr implicitly defined field types?

Posted by David Smiley <da...@gmail.com>.

Hmmm.  My opinion is neutral on a <primitiveFieldTypes>.  It would have
more implementation & documentation complexity to it IMO than an implicit
primitive type as I've been pushing.  But still; it's alright.

Since I can't seem to convince anyone on the merits of implicit field
types, I will back out this part of SOLR-12768.  Instead I suppose I will
add a new field type for that particular issue's need.

~ David

On Sat, Jan 5, 2019 at 5:29 PM Jan Høydahl <ja...@cominvent.com> wrote:

> In some other thread or Jira that I cannot find now I proposed a new tag
> in schema to make this explicit. So instead of 50 tags defining all
> primitive types and dynamicFields, we could have one tag:
>
> <primitiveFieldTypes enabled="true" dynamicMappings="true" lazy="true"/>
>
> This is just a draft idea. This would give a way to disable these implicit
> primitive types if they are made default on. A lazy mode could delay adding
> to the schema until first use if that saves any resources.
>
> Jan
>
> On 5 Jan 2019, at 21:29, David Smiley <da...@gmail.com> wrote:
>
> You would see these types in the HTTP schema API, and thus you would also
> end up seeing it on the admin schema screen (which uses that API).
> It would not be saved back to the XML file unless you're further
> manipulating your schema via the HTTP schema API (managed schema).  I ought
> to verify all this manually.  As I'm sure you already know, comments /
> formatting do not survive that round-trip.
>
> I'm a convention over configuration believer, and thus I prefer CoC over
> explicitness/verbosity.  I suppose all CoC arguments could be shot down
> with generic statements of perceived maintenance/understandability
> benefits.  Shrug; yet surely there's a case for CoC in some cases?  Let me
> ask you this: why is it okay for databases to not have definitions of what
> primitives types are yet in Solr you would rather it be explicit always?
> That analogy is the crux of it.  I'm not arguing for "text_general" or
> other text analyzed types to be implicits; who knows where to draw the line
> there.  I thought primitives would be a slam dunk.
>
> On Sat, Jan 5, 2019 at 3:07 PM Gus Heck <gus.heck@gmail.com> wrote:
>
> To my mind the only types (or fields) that should get built-in are the
>> ones that would break solr if they were changed. Anything else should show
>> up in the config file. Your _nest_path_ probably falls into the "it would
>> break solr if it changed" category.
>>
>> I notice in your initial post you say "So if you were to read the schema
>> then you'd see it." if that implies that there would be a way to fetch the
>> final_efective_schema.xml file from the server via the admin ui that might
>> make me feel better about this. Such a file should essentially be the
>> schema.xml (or managed_schema.xml) with a "implicit generated types - do
>> not edit" section. Comments etc should be preserved from the original, and
>> possibly a provenance comment (which fields rely on the implicit addition
>> so it's easy to spot an accidental usage of the implicit type) with each
>> implicitly added type.
>>
>> Simplicity of code and code maintenance is of course excellent.
>> Simplicity for the person trying to troubleshoot a system they've just been
>> hired to fix/improve is also excellent. I'd prefer to SEE what's going on
>> than have to remember what's going on modulo some version matrix in my
>> head. Hard enough remembering which admin commands are available on version
>> X...
>>
>>
>> On Fri, Jan 4, 2019 at 10:52 PM David Smiley <da...@gmail.com>
>> wrote:
>>
>>> On Fri, Jan 4, 2019 at 12:51 PM Shawn Heisey <ap...@elyograg.org>
>>> wrote:
>>>
>>>> Looking at what came before, my preference would have been implicitly
>>>> defined default types -- things like int, string, etc, defined in
>>>> code.
>>>> The only problem with that comes at Solr upgrade time ... what if we
>>>> decide for a later version (even if it's limited to a major release)
>>>> that IntPointField shouldn't be the implicit class for "int"?  Someone
>>>> who upgrades an index using that implicit type to the new version will
>>>> find that Solr will no longer work.  Which makes the idea unworkable.
>>>>
>>>
>>> I addressed this earlier -- search for "luceneMatchVersion" which is
>>> key.
>>>
>>> RE a file based system schema (what Alexandre suggested)... that sounds
>>> workable but a more complex idea that would take more code & documentation
>>> -- at least relative to the very simple idea of some built-ins in the code
>>> (my proposal).  See SOLR-12768.patch
>>> <https://issues.apache.org/jira/secure/attachment/12953284/SOLR-12768.patch>
>>> changes to IndexSchema.
>>> --
>>> Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
>>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>>> http://www.solrenterprisesearchserver.com
>>>
>>
>>
>> --
>> http://www.the111shift.com
>>
> --
> Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
>
> --
Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com

Re: Feature: Solr implicitly defined field types?

Posted by Jan Høydahl <ja...@cominvent.com>.

In some other thread or Jira that I cannot find now I proposed a new tag in schema to make this explicit. So instead of 50 tags defining all primitive types and dynamicFields, we could have one tag:

<primitiveFieldTypes enabled="true" dynamicMappings="true" lazy="true"/>

This is just a draft idea. This would give a way to disable these implicit primitive types if they are made default on. A lazy mode could delay adding to the schema until first use if that saves any resources.

Jan

> On 5 Jan 2019, at 21:29, David Smiley <da...@gmail.com> wrote:
> 
> You would see these types in the HTTP schema API, and thus you would also end up seeing it on the admin schema screen (which uses that API).
> It would not be saved back to the XML file unless you're further manipulating your schema via the HTTP schema API (managed schema).  I ought to verify all this manually.  As I'm sure you already know, comments / formatting do not survive that round-trip.
> 
> I'm a convention over configuration believer, and thus I prefer CoC over explicitness/verbosity.  I suppose all CoC arguments could be shot down with generic statements of perceived maintenance/understandability benefits.  Shrug; yet surely there's a case for CoC in some cases?  Let me ask you this: why is it okay for databases to not have definitions of what primitives types are yet in Solr you would rather it be explicit always?  That analogy is the crux of it.  I'm not arguing for "text_general" or other text analyzed types to be implicits; who knows where to draw the line there.  I thought primitives would be a slam dunk.
> 
>> On Sat, Jan 5, 2019 at 3:07 PM Gus Heck <gu...@gmail.com> wrote:
>> To my mind the only types (or fields) that should get built-in are the ones that would break solr if they were changed. Anything else should show up in the config file. Your _nest_path_ probably falls into the "it would break solr if it changed" category. 
>> 
>> I notice in your initial post you say "So if you were to read the schema then you'd see it." if that implies that there would be a way to fetch the final_efective_schema.xml file from the server via the admin ui that might make me feel better about this. Such a file should essentially be the schema.xml (or managed_schema.xml) with a "implicit generated types - do not edit" section. Comments etc should be preserved from the original, and possibly a provenance comment (which fields rely on the implicit addition so it's easy to spot an accidental usage of the implicit type) with each implicitly added type. 
>> 
>> Simplicity of code and code maintenance is of course excellent. Simplicity for the person trying to troubleshoot a system they've just been hired to fix/improve is also excellent. I'd prefer to SEE what's going on than have to remember what's going on modulo some version matrix in my head. Hard enough remembering which admin commands are available on version X...
>> 
>> 
>>> On Fri, Jan 4, 2019 at 10:52 PM David Smiley <da...@gmail.com> wrote:
>>>> On Fri, Jan 4, 2019 at 12:51 PM Shawn Heisey <ap...@elyograg.org> wrote:
>>>> Looking at what came before, my preference would have been implicitly 
>>>> defined default types -- things like int, string, etc, defined in code.  
>>>> The only problem with that comes at Solr upgrade time ... what if we 
>>>> decide for a later version (even if it's limited to a major release) 
>>>> that IntPointField shouldn't be the implicit class for "int"?  Someone 
>>>> who upgrades an index using that implicit type to the new version will 
>>>> find that Solr will no longer work.  Which makes the idea unworkable.
>>> 
>>> I addressed this earlier -- search for "luceneMatchVersion" which is key.
>>> 
>>> RE a file based system schema (what Alexandre suggested)... that sounds workable but a more complex idea that would take more code & documentation -- at least relative to the very simple idea of some built-ins in the code (my proposal).  See SOLR-12768.patch  changes to IndexSchema. 
>>> -- 
>>> Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
>>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
>> 
>> 
>> -- 
>> http://www.the111shift.com
> -- 
> Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com

Re: Feature: Solr implicitly defined field types?

Posted by David Smiley <da...@gmail.com>.

You would see these types in the HTTP schema API, and thus you would also
end up seeing it on the admin schema screen (which uses that API).
It would not be saved back to the XML file unless you're further
manipulating your schema via the HTTP schema API (managed schema).  I ought
to verify all this manually.  As I'm sure you already know, comments /
formatting do not survive that round-trip.

I'm a convention over configuration believer, and thus I prefer CoC over
explicitness/verbosity.  I suppose all CoC arguments could be shot down
with generic statements of perceived maintenance/understandability
benefits.  Shrug; yet surely there's a case for CoC in some cases?  Let me
ask you this: why is it okay for databases not to define what primitive
types are, yet in Solr you would rather it always be explicit?
That analogy is the crux of it.  I'm not arguing for "text_general" or
other text analyzed types to be implicits; who knows where to draw the line
there.  I thought primitives would be a slam dunk.

On Sat, Jan 5, 2019 at 3:07 PM Gus Heck <gu...@gmail.com> wrote:

> To my mind the only types (or fields) that should get built-in are the
> ones that would break solr if they were changed. Anything else should show
> up in the config file. Your _nest_path_ probably falls into the "it would
> break solr if it changed" category.
>
> I notice in your initial post you say "So if you were to read the schema
> then you'd see it." if that implies that there would be a way to fetch the
> final_efective_schema.xml file from the server via the admin ui that might
> make me feel better about this. Such a file should essentially be the
> schema.xml (or managed_schema.xml) with a "implicit generated types - do
> not edit" section. Comments etc should be preserved from the original, and
> possibly a provenance comment (which fields rely on the implicit addition
> so it's easy to spot an accidental usage of the implicit type) with each
> implicitly added type.
>
> Simplicity of code and code maintenance is of course excellent. Simplicity
> for the person trying to troubleshoot a system they've just been hired to
> fix/improve is also excellent. I'd prefer to SEE what's going on than have
> to remember what's going on modulo some version matrix in my head. Hard
> enough remembering which admin commands are available on version X...
>
>
> On Fri, Jan 4, 2019 at 10:52 PM David Smiley <da...@gmail.com>
> wrote:
>
>> On Fri, Jan 4, 2019 at 12:51 PM Shawn Heisey <ap...@elyograg.org> wrote:
>>
>>> Looking at what came before, my preference would have been implicitly
>>> defined default types -- things like int, string, etc, defined in code.
>>> The only problem with that comes at Solr upgrade time ... what if we
>>> decide for a later version (even if it's limited to a major release)
>>> that IntPointField shouldn't be the implicit class for "int"?  Someone
>>> who upgrades an index using that implicit type to the new version will
>>> find that Solr will no longer work.  Which makes the idea unworkable.
>>>
>>
>> I addressed this earlier -- search for "luceneMatchVersion" which is key.
>>
>> RE a file based system schema (what Alexandre suggested)... that sounds
>> workable but a more complex idea that would take more code & documentation
>> -- at least relative to the very simple idea of some built-ins in the code
>> (my proposal).  See SOLR-12768.patch
>> <https://issues.apache.org/jira/secure/attachment/12953284/SOLR-12768.patch>
>> changes to IndexSchema.
>> --
>> Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>> http://www.solrenterprisesearchserver.com
>>
>
>
> --
> http://www.the111shift.com
>
-- 
Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com

Re: Feature: Solr implicitly defined field types?

Posted by Gus Heck <gu...@gmail.com>.

To my mind the only types (or fields) that should get built-in are the ones
that would break solr if they were changed. Anything else should show up in
the config file. Your _nest_path_ probably falls into the "it would break
solr if it changed" category.

I notice in your initial post you say "So if you were to read the schema
then you'd see it."  If that implies that there would be a way to fetch the
final_effective_schema.xml file from the server via the admin UI, that might
make me feel better about this. Such a file should essentially be the
schema.xml (or managed_schema.xml) with an "implicit generated types - do
not edit" section. Comments etc should be preserved from the original, and
possibly a provenance comment (which fields rely on the implicit addition
so it's easy to spot an accidental usage of the implicit type) with each
implicitly added type.
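
As a rough illustration of what such an "effective schema" view might
contain, here is a small Python sketch.  The effective_schema function, its
arguments and the "example.NestPathType" class name are hypothetical, and
this toy version does not attempt to preserve comments from the original
file; it only shows the generated section and the provenance comments.

```python
def effective_schema(fields, explicit_types, platform_types):
    """Render explicit types plus the implicit types that fields rely on."""
    lines = ["<schema>"]
    for name, cls in sorted(explicit_types.items()):
        lines.append(f'  <fieldType name="{name}" class="{cls}"/>')

    # Map each implicit type in use to the fields that depend on it.
    used = {}
    for field_name, type_name in sorted(fields.items()):
        if type_name not in explicit_types and type_name in platform_types:
            used.setdefault(type_name, []).append(field_name)

    if used:
        lines.append("  <!-- implicit generated types - do not edit -->")
        for type_name in sorted(used):
            users = ", ".join(used[type_name])
            lines.append(f"  <!-- used by: {users} -->")
            cls = platform_types[type_name]
            lines.append(f'  <fieldType name="{type_name}" class="{cls}"/>')

    for field_name, type_name in sorted(fields.items()):
        lines.append(f'  <field name="{field_name}" type="{type_name}"/>')
    lines.append("</schema>")
    return "\n".join(lines)


text = effective_schema(
    fields={"id": "string", "_nest_path_": "_nest_path_"},
    explicit_types={"string": "solr.StrField"},
    platform_types={"_nest_path_": "example.NestPathType",
                    "int": "solr.IntPointField"},
)
print(text)
```

Unused platform types (here "int") are left out, so an accidental use of an
implicit type stands out in the generated section.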

Simplicity of code and code maintenance is of course excellent. Simplicity
for the person trying to troubleshoot a system they've just been hired to
fix/improve is also excellent. I'd prefer to SEE what's going on than have
to remember what's going on modulo some version matrix in my head. Hard
enough remembering which admin commands are available on version X...

On Fri, Jan 4, 2019 at 10:52 PM David Smiley <da...@gmail.com>
wrote:

> On Fri, Jan 4, 2019 at 12:51 PM Shawn Heisey <ap...@elyograg.org> wrote:
>
>> Looking at what came before, my preference would have been implicitly
>> defined default types -- things like int, string, etc, defined in code.
>> The only problem with that comes at Solr upgrade time ... what if we
>> decide for a later version (even if it's limited to a major release)
>> that IntPointField shouldn't be the implicit class for "int"?  Someone
>> who upgrades an index using that implicit type to the new version will
>> find that Solr will no longer work.  Which makes the idea unworkable.
>>
>
> I addressed this earlier -- search for "luceneMatchVersion" which is key.
>
> RE a file based system schema (what Alexandre suggested)... that sounds
> workable but a more complex idea that would take more code & documentation
> -- at least relative to the very simple idea of some built-ins in the code
> (my proposal).  See SOLR-12768.patch
> <https://issues.apache.org/jira/secure/attachment/12953284/SOLR-12768.patch>
> changes to IndexSchema.
> --
> Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
>

-- 
http://www.the111shift.com

Re: Feature: Solr implicitly defined field types?

Posted by David Smiley <da...@gmail.com>.

On Fri, Jan 4, 2019 at 12:51 PM Shawn Heisey <ap...@elyograg.org> wrote:

> Looking at what came before, my preference would have been implicitly
> defined default types -- things like int, string, etc, defined in code.
> The only problem with that comes at Solr upgrade time ... what if we
> decide for a later version (even if it's limited to a major release)
> that IntPointField shouldn't be the implicit class for "int"?  Someone
> who upgrades an index using that implicit type to the new version will
> find that Solr will no longer work.  Which makes the idea unworkable.
>

I addressed this earlier -- search for "luceneMatchVersion" which is key.
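
One way to read the luceneMatchVersion point: the class behind an implicit
type could be chosen from the configured luceneMatchVersion, so an index
built under an older setting keeps the older implementation until the user
raises the version.  A minimal Python sketch under that assumption; the
function name, the cut-over at major version 10 and
"example.FutureIntField" are all made up for illustration.

```python
def implicit_int_class(lucene_match_version):
    """Pick the class backing the implicit "int" type for a given version."""
    major = int(lucene_match_version.split(".")[0])
    # Hypothetical cut-over: pretend the default changes in major version 10.
    if major < 10:
        return "solr.IntPointField"
    return "example.FutureIntField"


for version in ("8.0.0", "10.1.0"):
    print(version, implicit_int_class(version))
```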

RE a file based system schema (what Alexandre suggested)... that sounds
workable but a more complex idea that would take more code & documentation
-- at least relative to the very simple idea of some built-ins in the code
(my proposal).  See SOLR-12768.patch
<https://issues.apache.org/jira/secure/attachment/12953284/SOLR-12768.patch>
changes to IndexSchema.
-- 
Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com

Re: Feature: Solr implicitly defined field types?

Posted by Shawn Heisey <ap...@elyograg.org>.

I'm jumping into this conversation a little bit late.  Sorry for any 
problems that causes.

On 1/4/2019 9:52 AM, Alexandre Rafalovitch wrote:
> What about if a system schema was loaded at a startup implicitly.
> Then, if a new schema is loaded and type definition is missing, it is
> copied - at that time - into the specific schema. So, on the first
> rewrite those - and only those used - types will be written out.

Looking at what came before, my preference would have been implicitly 
defined default types -- things like int, string, etc, defined in code.  
The only problem with that comes at Solr upgrade time ... what if we 
decide for a later version (even if it's limited to a major release) 
that IntPointField shouldn't be the implicit class for "int"?  Someone 
who upgrades an index using that implicit type to the new version will 
find that Solr will no longer work.  Which makes the idea unworkable.

A file-based system schema where implicit types are explicitly defined 
is an interesting idea that I think would get around the problem 
described above.  We would need to decide exactly what can be defined in 
the system schema -- my initial bias would be to only allow types, not 
fields or other schema config, to be defined there.  Probably a good 
location for the system schema file would be the ZK chroot or the solr 
home, depending on whether the system is in cloud mode.

Thanks,
Shawn

Re: Feature: Solr implicitly defined field types?

Posted by Alexandre Rafalovitch <ar...@gmail.com>.

What if a system schema were loaded implicitly at startup?
Then, if a new schema is loaded and a type definition is missing, it is
copied - at that time - into the specific schema. So, on the first
rewrite those types - and only those used - will be written out.

This allows us to version the system types the same way as we version
the normal schema. I agree with Gus that hidden configuration causes all
sorts of challenges.

And - for tooling purposes - there definitely needs to be a way to get
all definitions, explicit and implicit, used and merely available.
That also points towards something that already has a self-describing
mechanism (like the Schema API) available.
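
A small Python sketch of the copy-on-load idea, with hypothetical names and
data shapes (load_schema, plain dicts for the schemas,
"example.FutureIntField" as a made-up class): missing types that a field
refers to are copied from the system schema once, after which the specific
schema carries its own pinned definition.

```python
import copy


def load_schema(schema, system_types):
    """Return a copy of `schema` with missing, referenced types copied in."""
    loaded = copy.deepcopy(schema)
    for field_name, type_name in loaded["fields"].items():
        if type_name in loaded["types"]:
            continue
        if type_name not in system_types:
            raise ValueError(
                f"field {field_name!r} uses unknown type {type_name!r}")
        # Copy at load time, so the definition is pinned in this schema.
        loaded["types"][type_name] = copy.deepcopy(system_types[type_name])
    return loaded


# Hypothetical system schema, version 1.
system_v1 = {
    "int": {"class": "solr.IntPointField"},
    "string": {"class": "solr.StrField"},
}
schema = load_schema({"types": {}, "fields": {"count": "int"}}, system_v1)

# A later system schema changes "int"; the already-copied type is unaffected.
system_v2 = {"int": {"class": "example.FutureIntField"}}
reloaded = load_schema(schema, system_v2)
print(reloaded["types"])
```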

Regards,
   Alex.


On Fri, 4 Jan 2019 at 10:45, David Smiley <da...@gmail.com> wrote:
>
> I'm thinking this feature would be used conservatively -- and thus just primitive types that wouldn't have an interesting configuration to them, or for something you are really not expected to change (the nest path of nested docs).  So you wouldn't feel you had to go read the docs.  The schema might even have a comment to mention a list of implicit field types (a one-liner comma delimited list).
>
> On Fri, Jan 4, 2019 at 10:34 AM Gus Heck <gu...@gmail.com> wrote:
>>
>> I'm perhaps slightly conservative with respect to configuration, but I'm not fond of hidden configuration that I can't see. What I don't like is looking at a config file and not seeing the full story. That means I have to read the config and ALSO go read some part of the documentation that I've failed to memorize, and probably need to google to find, to be fully aware of what's going on....  (and no, I didn't like it when some standard stuff disappeared from solrconfig.xml a while back either). Small changes of course seem reasonable, but the further we drift into implicit things, especially if we get a collection of several implicit things described in various disparate parts of the manual, the more cryptic the system becomes. That's my opinion, YMMV.
>>
>> -Gus
>>
>> On Thu, Jan 3, 2019 at 2:57 PM David Smiley <da...@gmail.com> wrote:
>>>
>>> Broadly, you refer to "locale" issues.  Solr's way of dealing with this today is with optional & configurable use of URPs.  The schema-less / data-driven mode has some of these enabled; you can see it in the solrconfig.xml including many date formats.  You can look into that for further info if you like.  The primitive field types are not locale sensitive.
>>>
>>> Update: It's looking like 8.0 will only employ this implicit field type mechanism for _nest_path_ which probably won't be in the default schema.  Assuming it isn't, then it'll only be documented in the context of this particular feature.  It'd be nice to see the scope of fields expanded and at that juncture it could/should be more broadly documented.  That can wait to people have energy to do it.
>>>
>>> On Sun, Dec 30, 2018 at 4:54 AM Jörn Franke <jo...@gmail.com> wrote:
>>>>
>>>> Hi David,
>>>>
>>>> I now get the idea and yes this makes sense. It would require though some tutorial or best practices, eg overriding a platform data type may make not so much sense - it may confuse new developers in an existing project that know Solr, but then get a platform type that has not the default behavior.
>>>>
>>>> Could you deal with different languages in platform types? Eg for dates it does not seem a problem, because Solr expects only one specific type of date that needs to be somehow converted beforehand (maybe that conversion could be also part of a platform type), but decimals are different in some languages or Boolean values.
>>>>
>>>> Am 30.12.2018 um 07:01 schrieb David Smiley <da...@gmail.com>:
>>>>
>>>> Thanks for your thoughtful response Jörn!
>>>> ...
>>>> On Sat, Dec 29, 2018 at 4:14 AM Jörn Franke <jo...@gmail.com> wrote:
>>>>>
>>>>> I think it is a good idea, but I see some potential complexity for “deployment” of collections. For instance, in environments where Solr is used as a shared platform amongst several stakeholders, every time you deploy/modify a collection you need to take care that the platform types exist. If it exists in the Test environment then i need to make sure that it exists as well in acceptance/production. The problem is that the platform type could have been defined by somebody else who has not yet (eg due to project/sprint delays) not updated the other environments. Another issue is if I move to another Solr cluster in the same environment. Then, I have to make sure that all platform types move with me.
>>>>
>>>>
>>>> RE "the platform type could have been defined by somebody else":  I'm not imagining it'd be configurable, thus the "somebody else" is the Solr project/committers.
>>>>
>>>> Otherwise, I think I get your point, but perhaps I don't.  It's the same point for any use of some new feature of Solr.  If you use some new feature, you have to take care that all Solr instances you deploy your configuration to can handle that new feature.  That's a fairly generic point that would apply to just about anything in Solr.
>>>>
>>>>>
>>>>> A (minor) issue is that platform types may change (for whatever reasons) and that then potentially all collections have to be reindexed or we have different versions of the same platform type making things not easier.
>>>>
>>>>
>>>> Yes it's possible.  Though I think that point is apart from the feature I propose.  You're saying that you might want to use an "int" field and then one day realize you want some newer/better definition of what an "int" is (e.g. trie -> points).  Sure.  That's true wether the field type is explicit or implicit.  There's nothing stopping you from explicitly defining the field type if you want to; the names would not be reserved. If you want to stick with your current index running the new Solr version, then you would keep luceneMatchVersion what it was, which would effectively retain the interpretation of the implicit field types.
>>>>
>>>>>
>>>>> Currently we have all our Schema definitions in a version management system (we use the Schema API but the JSON requests are out there) so that projects can inspire from each other. Needless to say, that careful type engineering requires also some documentation on technical design and may be indeed very Collection specific.
>>>>>
>>>>> Another issue could be that a platform type may also imply a certain platform solrconfig.xml (eg lib directive etc).
>>>>
>>>>
>>>> I'm imagining platform types would be basic primitive types (int, boolean, etc. and some special situations like in the issue I referenced).  They would not depend on contrib libs... though I could imagine one day an evolution of this in which a contrib could somehow auto-add implicit field types.
>>>>
>>>>>
>>>>> I am not sure yet what are the exact benefits of referring to types of other collections in the Solr runtime itself instead of having a version system and letting projects decide if they want to adapt types of other collections, but maybe I am overlooking something here.
>>>>
>>>>
>>>> The notion of implicit field types is not a cross-config (cross-collection) thing.  Implicit field types are nothing more than built-in shortcuts.
>>>>
>>>> I recall one of my very early observations of Solr's schema was of surprise to see primitive types defined in the schema.  Consider in SQL DDL statements that refer to varchar and such.  Your DDL doesn't need to define what a varchar is!
>>>>
>>>> Happy New Year,
>>>> ~ David
>>>>
>>>>> Am 28.12.2018 um 17:36 schrieb David Smiley <da...@gmail.com>:
>>>>>
>>>>> While working on https://issues.apache.org/jira/browse/SOLR-12768 it occurred to me that it would be nice if Solr had implicitly defined field types.  This would allow you to define a field in your schema that refers to a type that is not also in your schema -- at least not explicitly (need not explicitly be put in your schema.xml if classic, or need not be passed to schema manipulation API if you use that).  The idea would be that these types would be Solr platform provided field types that need not be defined by you.
>>>>>
>>>>> There are multiple ways this loose idea might be conceived / imagined into a concrete proposal.
>>>>>
>>>>> (A) The main idea I'm kicking around right now is that Solr would _not_ throw an error at the moment of reading your field definition that it doesn't see your type... instead it would see it's a platform type (via some built-in hard-coded registry) and then register that type on the fly.  So if you were to read the schema then you'd see it.  In this way, it's kind of a shortcut.  Platform field types that you don't actually refer to will never end up being put into your schema.
>>>>>
>>>>> (B) A schema could pre-initialize with the platform/implicit types.  This is the simplest idea but I don't like it because you may not even need some of these types.  I'm not going to go down this path now but wanted to mention it.
>>>>>
>>>>> I'm exploring (A) right now... I'm hoping to do this for at least a "_nest_path_"  field in support of nested documents in 8.0, but conceivably the idea would be expanded to lots of things in our base schema right now (int, str, etc.)
>>>>> --
>>>>> Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
>>>>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
>>>>
>>>> --
>>>> Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
>>>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
>>>
>>> --
>>> Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
>>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
>>
>>
>>
>> --
>> http://www.the111shift.com
>
> --
> Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com

Re: Feature: Solr implicitly defined field types?

Posted by David Smiley <da...@gmail.com>.

I'm thinking this feature would be used conservatively -- and thus just for
primitive types that wouldn't have an interesting configuration to them, or
for something you are really not expected to change (the nest path of
nested docs).  So you wouldn't feel you had to go read the docs.  The
schema might even have a comment listing the implicit field types
(a one-line, comma-delimited list).

On Fri, Jan 4, 2019 at 10:34 AM Gus Heck <gu...@gmail.com> wrote:

> I'm perhaps slightly conservative with respect to configuration, but I'm
> not fond of hidden configuration that I can't see. What I don't like is
> looking at a config file and not seeing the full story. That means I have
> to read the config and ALSO go read some part of the documentation that
> I've failed to memorize, and probably need to google to find to be fully
> aware of what's going on....  (and no I didn't like it when some standard
> stuff disappeared from solrconfig.xml a while back either). Small changes
> of course seem reasonable, but the further we drift into implicit things,
> especially if we get a collection of several implicit things described in
> various disparate parts of the manual the more cryptic the system becomes.
> That's my opinion, YMMV.
>
> -Gus
>
-- 
Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com

Re: Feature: Solr implicitly defined field types?

Posted by Gus Heck <gu...@gmail.com>.

I'm perhaps slightly conservative with respect to configuration, but I'm
not fond of hidden configuration that I can't see. What I don't like is
looking at a config file and not seeing the full story. That means I have
to read the config and ALSO go read some part of the documentation that
I've failed to memorize, and probably need to google to find to be fully
aware of what's going on....  (and no I didn't like it when some standard
stuff disappeared from solrconfig.xml a while back either). Small changes
of course seem reasonable, but the further we drift into implicit things,
especially if we get a collection of several implicit things described in
various disparate parts of the manual the more cryptic the system becomes.
That's my opinion, YMMV.

-Gus

On Thu, Jan 3, 2019 at 2:57 PM David Smiley <da...@gmail.com>
wrote:

> Broadly, you refer to "locale" issues.  Solr's way of dealing with this
> today is with optional & configurable use of URPs.  The schema-less /
> data-driven mode has some of these enabled; you can see it in the
> solrconfig.xml including many date formats.  You can look into that for
> further info if you like.  The primitive field types are not locale
> sensitive.
>
> Update: It's looking like 8.0 will only employ this implicit field type
> mechanism for _nest_path_ which probably won't be in the default schema.
> Assuming it isn't, then it'll only be documented in the context of this
> particular feature.  It'd be nice to see the scope of fields expanded and
> at that juncture it could/should be more broadly documented.  That can wait
> until people have the energy to do it.


-- 
http://www.the111shift.com

Re: Feature: Solr implicitly defined field types?

Posted by David Smiley <da...@gmail.com>.

Broadly, you refer to "locale" issues.  Solr's way of dealing with this
today is with optional & configurable use of update request processors
(URPs).  The schema-less / data-driven mode has some of these enabled; you
can see them in its solrconfig.xml, including many date formats.  You can
look into that for further info if you like.  The primitive field types are
not locale sensitive.
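
As a rough illustration of what "not locale sensitive" means in practice, the
Java sketch below normalizes a locale-formatted decimal before it would reach
a primitive field.  It is not an URP and uses only JDK classes; the class and
method names (LocaleNormalizeSketch, normalizeDecimal) are made up for this
example, and the German input is just an assumed case.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical sketch: stands in for the kind of conversion done before indexing.
class LocaleNormalizeSketch {

    // Parse a decimal written in the given locale's conventions into a plain double.
    // Rejects input that is not fully consumed by the locale's number format.
    static double normalizeDecimal(String raw, Locale locale) throws ParseException {
        NumberFormat format = NumberFormat.getNumberInstance(locale);
        ParsePosition position = new ParsePosition(0);
        Number parsed = format.parse(raw, position);
        if (parsed == null || position.getIndex() != raw.length()) {
            throw new ParseException("Not a number for " + locale + ": " + raw,
                    position.getIndex());
        }
        return parsed.doubleValue();
    }

    public static void main(String[] args) throws ParseException {
        // German notation: '.' groups thousands, ',' is the decimal separator.
        System.out.println(normalizeDecimal("1.234,56", Locale.GERMANY));
        // US notation for the same value.
        System.out.println(normalizeDecimal("1,234.56", Locale.US));
    }
}
```

The point of the sketch is only that the locale knowledge lives in a step in
front of the field type, so the type itself can stay locale neutral.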

Update: It's looking like 8.0 will only employ this implicit field type
mechanism for _nest_path_ which probably won't be in the default schema.
Assuming it isn't, then it'll only be documented in the context of this
particular feature.  It'd be nice to see the scope of fields expanded and
at that juncture it could/should be more broadly documented.  That can wait
until people have the energy to do it.

On Sun, Dec 30, 2018 at 4:54 AM Jörn Franke <jo...@gmail.com> wrote:

> Hi David,
>
> I now get the idea and yes, this makes sense. It would, though, require some
> tutorial or best practices, e.g. overriding a platform data type may not make
> much sense - it may confuse new developers in an existing project who know
> Solr but then get a platform type that does not have the default behavior.
>
> Could you deal with different languages in platform types? E.g. for dates it
> does not seem to be a problem, because Solr expects only one specific date
> format that values need to be converted to beforehand (maybe that conversion
> could also be part of a platform type), but decimals and Boolean values are
> written differently in some languages.
>
> On 30.12.2018 at 07:01, David Smiley <da...@gmail.com> wrote:
>
> Thanks for your thoughtful response Jörn!
> ...
> On Sat, Dec 29, 2018 at 4:14 AM Jörn Franke <jo...@gmail.com> wrote:
>
>> I think it is a good idea, but I see some potential complexity for
>> “deployment” of collections. For instance, in environments where Solr is
>> used as a shared platform amongst several stakeholders, every time you
>> deploy/modify a collection you need to take care that the platform types
>> exist. If it exists in the Test environment then I need to make sure that
>> it exists as well in acceptance/production. The problem is that the
>> platform type could have been defined by somebody else who has not yet (e.g.
>> due to project/sprint delays) updated the other environments. Another
>> issue is if I move to another Solr cluster in the same environment. Then, I
>> have to make sure that all platform types move with me.
>>
>
> RE "the platform type could have been defined by somebody else":  I'm not
> imagining it'd be configurable, thus the "somebody else" is the Solr
> project/committers.
>
> Otherwise, I think I get your point, but perhaps I don't.  It's the same
> point for *any* use of some new feature of Solr.  If you use some new
> feature, you have to take care that all Solr instances you deploy your
> configuration to can handle that new feature.  That's a fairly generic
> point that would apply to just about anything in Solr.
>
>
>> A (minor) issue is that platform types may change (for whatever reasons)
>> and that then potentially all collections have to be reindexed, or we have
>> different versions of the same platform type, which does not make things
>> easier.
>>
>
> Yes it's possible.  Though I think that point is apart from the feature I
> propose.  You're saying that you might want to use an "int" field and then
> one day realize you want some newer/better definition of what an "int" is
> (e.g. trie -> points).  Sure.  That's true whether the field type is
> explicit or implicit.  There's nothing stopping you from explicitly
> defining the field type if you want to; the names would not be reserved. If
> you want to stick with your current index running the new Solr version,
> then you would keep luceneMatchVersion what it was, which would effectively
> retain the interpretation of the implicit field types.
>
>
>> Currently we have all our Schema definitions in a version management
>> system (we use the Schema API but the JSON requests are out there) so that
>> projects can take inspiration from each other. Needless to say, careful type
>> engineering also requires some documentation on technical design and may
>> indeed be very Collection specific.
>>
>> Another issue could be that a platform type may also imply a certain
>> platform solrconfig.xml (eg lib directive etc).
>>
>
> I'm imagining platform types would be basic primitive types (int, boolean,
> etc. and some special situations like in the issue I referenced).  They
> would not depend on contrib libs... though I could imagine one day an
> evolution of this in which a contrib could somehow auto-add implicit field
> types.
>
>
>> I am not sure yet what are the exact benefits of referring to types of
>> other collections in the Solr runtime itself instead of having a version
>> system and letting projects decide if they want to adapt types of other
>> collections, but maybe I am overlooking something here.
>>
>
> The notion of implicit field types is not a cross-config
> (cross-collection) thing.  Implicit field types are nothing more than
> built-in shortcuts.
>
> I recall one of my very early observations of Solr's schema was of
> surprise to see primitive types defined in the schema.  Consider in SQL DDL
> statements that refer to varchar and such.  Your DDL doesn't need to define
> what a varchar is!
>
> Happy New Year,
> ~ David
>
> On 28.12.2018 at 17:36, David Smiley <da...@gmail.com> wrote:
>>
>> While working on https://issues.apache.org/jira/browse/SOLR-12768 it
>> occurred to me that it would be nice if Solr had implicitly defined field
>> types.  This would allow you to define a field in your schema that refers
>> to a type that is *not* also in your schema -- at least not explicitly
>> (need not explicitly be put in your schema.xml if classic, or need not be
>> passed to schema manipulation API if you use that).  The idea would be that
>> these types would be Solr platform provided field types that need not be
>> defined by you.
>>
>> There are multiple ways this loose idea might be conceived / imagined
>> into a concrete proposal.
>>
>> (A) The main idea I'm kicking around right now is that Solr would _not_
>> throw an error at the moment of reading your field definition that it
>> doesn't see your type... instead it would see it's a platform type (via
>> some built-in hard-coded registry) and then register that type on the fly.
>> So if you were to read the schema then you'd see it.  In this way, it's
>> kind of a shortcut.  Platform field types that you don't actually refer to
>> will never end up being put into your schema.
>>
>> (B) A schema could pre-initialize with the platform/implicit types.  This
>> is the simplest idea but I don't like it because you may not even need some
>> of these types.  I'm not going to go down this path now but wanted to
>> mention it.
>>
>> I'm exploring (A) right now... I'm hoping to do this for at least a
>> "_nest_path_"  field in support of nested documents in 8.0, but conceivably
>> the idea would be expanded to lots of things in our base schema right now
>> (int, str, etc.)
>> --
>> Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>> http://www.solrenterprisesearchserver.com
>>
>> --
> Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
>
> --
Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com

Re: Feature: Solr implicitly defined field types?

Posted by Jörn Franke <jo...@gmail.com>.

Hi David,

I now get the idea, and yes, this makes sense. It would, though, require some tutorial or best practices. For example, overriding a platform data type may not make much sense: it may confuse new developers who join an existing project and know Solr, but then encounter a platform type that does not have the default behavior.

Could you deal with different languages in platform types? For dates, for example, it does not seem to be a problem, because Solr expects only one specific date format, and values need to be converted to it beforehand (maybe that conversion could also be part of a platform type). Decimals and Boolean values, however, are written differently in some languages.
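
To make the question concrete, here is a minimal sketch of the kind of conversion a language-aware type would have to perform before a value reaches the index. Everything in it is an assumption for illustration: the `SEPARATORS`, `TRUE_WORDS` and `FALSE_WORDS` tables and the `parse_decimal` / `parse_boolean` helpers are hypothetical and are not part of Solr or of the proposal.

```python
from decimal import Decimal

# Hypothetical per-language conventions: (decimal separator, grouping separator).
SEPARATORS = {"en": (".", ","), "de": (",", ".")}

# Hypothetical per-language spellings of Boolean values.
TRUE_WORDS = {"en": {"true", "yes"}, "de": {"wahr", "ja"}}
FALSE_WORDS = {"en": {"false", "no"}, "de": {"falsch", "nein"}}


def parse_decimal(text, lang):
    """Convert a language-formatted decimal string to a canonical Decimal."""
    decimal_sep, group_sep = SEPARATORS[lang]
    # Drop grouping separators first, then normalize the decimal separator.
    cleaned = text.strip().replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)


def parse_boolean(text, lang):
    """Convert a language-specific Boolean word to True or False."""
    word = text.strip().lower()
    if word in TRUE_WORDS[lang]:
        return True
    if word in FALSE_WORDS[lang]:
        return False
    raise ValueError(f"not a Boolean value in {lang!r}: {text!r}")
```

The same digits mean different numbers depending on the language: `parse_decimal("1.234,5", "de")` and `parse_decimal("1,234.5", "en")` both yield `Decimal("1234.5")`, so the language has to be known wherever the conversion happens.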

> On 30.12.2018 at 07:01, David Smiley <da...@gmail.com> wrote:
> 
> Thanks for your thoughtful response Jörn!
> ...
>> On Sat, Dec 29, 2018 at 4:14 AM Jörn Franke <jo...@gmail.com> wrote:
>> I think it is a good idea, but I see some potential complexity for “deployment” of collections. For instance, in environments where Solr is used as a shared platform amongst several stakeholders, every time you deploy/modify a collection you need to take care that the platform types exist. If a type exists in the Test environment, then I need to make sure that it exists in acceptance/production as well. The problem is that the platform type could have been defined by somebody else who has not yet (e.g. due to project/sprint delays) updated the other environments. Another issue arises if I move to another Solr cluster in the same environment: then I have to make sure that all platform types move with me.
> 
> RE "the platform type could have been defined by somebody else":  I'm not imagining it'd be configurable, thus the "somebody else" is the Solr project/committers.
> 
> Otherwise, I think I get your point, but perhaps I don't.  It's the same point for any use of some new feature of Solr.  If you use some new feature, you have to take care that all Solr instances you deploy your configuration to can handle that new feature.  That's a fairly generic point that would apply to just about anything in Solr.
>  
>> A (minor) issue is that platform types may change (for whatever reasons) and that then potentially all collections have to be reindexed or we have different versions of the same platform type, which does not make things easier.
> 
> Yes it's possible.  Though I think that point is apart from the feature I propose.  You're saying that you might want to use an "int" field and then one day realize you want some newer/better definition of what an "int" is (e.g. trie -> points).  Sure.  That's true whether the field type is explicit or implicit.  There's nothing stopping you from explicitly defining the field type if you want to; the names would not be reserved. If you want to stick with your current index running the new Solr version, then you would keep luceneMatchVersion what it was, which would effectively retain the interpretation of the implicit field types.
>  
>> Currently we have all our Schema definitions in a version management system (we use the Schema API but the JSON requests are out there) so that projects can take inspiration from each other. Needless to say, careful type engineering also requires some documentation on technical design and may indeed be very Collection specific.
>> 
>> Another issue could be that a platform type may also imply a certain platform solrconfig.xml (eg lib directive etc). 
> 
> I'm imagining platform types would be basic primitive types (int, boolean, etc. and some special situations like in the issue I referenced).  They would not depend on contrib libs... though I could imagine one day an evolution of this in which a contrib could somehow auto-add implicit field types.
>  
>> I am not yet sure what the exact benefits are of referring to types of other collections in the Solr runtime itself instead of having a version system and letting projects decide if they want to adapt types of other collections, but maybe I am overlooking something here.
> 
> The notion of implicit field types is not a cross-config (cross-collection) thing.  Implicit field types are nothing more than built-in shortcuts.
>  
> I recall one of my very early observations of Solr's schema was of surprise to see primitive types defined in the schema.  Consider in SQL DDL statements that refer to varchar and such.  Your DDL doesn't need to define what a varchar is!
> 
> Happy New Year,
> ~ David
> 

Re: Feature: Solr implicitly defined field types?

Posted by David Smiley <da...@gmail.com>.

Thanks for your thoughtful response Jörn!
...
On Sat, Dec 29, 2018 at 4:14 AM Jörn Franke <jo...@gmail.com> wrote:

> I think it is a good idea, but I see some potential complexity for
> “deployment” of collections. For instance, in environments where Solr is
> used as a shared platform amongst several stakeholders, every time you
> deploy/modify a collection you need to take care that the platform types
> exist. If a type exists in the Test environment, then I need to make sure that
> it exists in acceptance/production as well. The problem is that the
> platform type could have been defined by somebody else who has not yet (e.g.
> due to project/sprint delays) updated the other environments. Another
> issue is if I move to another Solr cluster in the same environment. Then, I
> have to make sure that all platform types move with me.
>

RE "the platform type could have been defined by somebody else":  I'm not
imagining it'd be configurable, thus the "somebody else" is the Solr
project/committers.

Otherwise, I think I get your point, but perhaps I don't.  It's the same
point for *any* use of some new feature of Solr.  If you use some new
feature, you have to take care that all Solr instances you deploy your
configuration to can handle that new feature.  That's a fairly generic
point that would apply to just about anything in Solr.

> A (minor) issue is that platform types may change (for whatever reasons)
> and that then potentially all collections have to be reindexed or we have
> different versions of the same platform type, which does not make things
> easier.
>

Yes it's possible.  Though I think that point is apart from the feature I
propose.  You're saying that you might want to use an "int" field and then
one day realize you want some newer/better definition of what an "int" is
(e.g. trie -> points).  Sure.  That's true whether the field type is
explicit or implicit.  There's nothing stopping you from explicitly
defining the field type if you want to; the names would not be reserved. If
you want to stick with your current index running the new Solr version,
then you would keep luceneMatchVersion what it was, which would effectively
retain the interpretation of the implicit field types.
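
The resolution rules described in this reply can be sketched as follows. This is only an illustration under stated assumptions, not Solr code: the `IMPLICIT_TYPES` registry, the `Schema` class, `resolve_type` and the version cutoff `(7, 0)` are all hypothetical; only the class names `solr.TrieIntField`, `solr.IntPointField` and `solr.BoolField` refer to real Solr field type classes.

```python
from dataclasses import dataclass, field

# Hypothetical built-in registry: implicit type name -> list of
# (minimum luceneMatchVersion, implementation class), newest first.
# The version cutoffs are illustrative assumptions, not Solr's real ones.
IMPLICIT_TYPES = {
    "int": [((7, 0), "solr.IntPointField"), ((0, 0), "solr.TrieIntField")],
    "boolean": [((0, 0), "solr.BoolField")],
}


class UnknownFieldTypeError(Exception):
    """Raised when a field names a type that is neither explicit nor implicit."""


@dataclass
class Schema:
    lucene_match_version: tuple
    field_types: dict = field(default_factory=dict)  # type name -> class name

    def resolve_type(self, type_name):
        # An explicit definition always wins; implicit names are not reserved.
        if type_name in self.field_types:
            return self.field_types[type_name]
        candidates = IMPLICIT_TYPES.get(type_name)
        if candidates is None:
            raise UnknownFieldTypeError(type_name)
        # Pick the newest definition allowed by luceneMatchVersion.
        for min_version, impl in candidates:
            if self.lucene_match_version >= min_version:
                # Register on the fly so reading the schema back shows it.
                self.field_types[type_name] = impl
                return impl
        raise UnknownFieldTypeError(type_name)
```

With these assumptions, `Schema((6, 6)).resolve_type("int")` keeps the trie-based interpretation, `Schema((8, 0)).resolve_type("int")` gets the points-based one, and a schema that explicitly defines its own "int" is never touched by the registry. Types that are never referenced (here "boolean") never appear in `field_types`.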

> Currently we have all our Schema definitions in a version management
> system (we use the Schema API but the JSON requests are out there) so that
> projects can take inspiration from each other. Needless to say, careful type
> engineering also requires some documentation on technical design and may
> indeed be very Collection specific.
>
> Another issue could be that a platform type may also imply a certain
> platform solrconfig.xml (eg lib directive etc).
>

I'm imagining platform types would be basic primitive types (int, boolean,
etc. and some special situations like in the issue I referenced).  They
would not depend on contrib libs... though I could imagine one day an
evolution of this in which a contrib could somehow auto-add implicit field
types.

> I am not yet sure what the exact benefits are of referring to types of
> other collections in the Solr runtime itself instead of having a version
> system and letting projects decide if they want to adapt types of other
> collections, but maybe I am overlooking something here.
>

The notion of implicit field types is not a cross-config (cross-collection)
thing.  Implicit field types are nothing more than built-in shortcuts.

I recall one of my very early observations of Solr's schema was of surprise
to see primitive types defined in the schema.  Consider in SQL DDL
statements that refer to varchar and such.  Your DDL doesn't need to define
what a varchar is!
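
The SQL comparison can be tried directly. The snippet below is a small illustration using Python's bundled `sqlite3` module; the `product` table and its columns are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The DDL refers to VARCHAR and INTEGER without defining either type.
conn.execute("CREATE TABLE product (name VARCHAR(40), quantity INTEGER)")
conn.execute("INSERT INTO product VALUES (?, ?)", ("widget", 3))

# Column names and declared types, as recorded by the database.
declared = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(product)")]
rows = conn.execute("SELECT name, quantity FROM product").fetchall()
```

The database accepts the column types by name because they are provided by the platform, which is the behavior the proposal wants for Solr's primitive field types.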

Happy New Year,
~ David

Re: Feature: Solr implicitly defined field types?

Posted by Jörn Franke <jo...@gmail.com>.

I think it is a good idea, but I see some potential complexity for “deployment” of collections. For instance, in environments where Solr is used as a shared platform amongst several stakeholders, every time you deploy/modify a collection you need to take care that the platform types exist. If a type exists in the Test environment, then I need to make sure that it exists in acceptance/production as well. The problem is that the platform type could have been defined by somebody else who has not yet (e.g. due to project/sprint delays) updated the other environments. Another issue arises if I move to another Solr cluster in the same environment: then I have to make sure that all platform types move with me.

A (minor) issue is that platform types may change (for whatever reasons) and that then potentially all collections have to be reindexed or we have different versions of the same platform type, which does not make things easier.

Currently we have all our Schema definitions in a version management system (we use the Schema API but the JSON requests are out there) so that projects can take inspiration from each other. Needless to say, careful type engineering also requires some documentation on technical design and may indeed be very Collection specific.

Another issue could be that a platform type may also imply a certain platform solrconfig.xml (e.g. lib directive etc.).

I am not yet sure what the exact benefits are of referring to types of other collections in the Solr runtime itself instead of having a version system and letting projects decide if they want to adapt types of other collections, but maybe I am overlooking something here.
