Posted to dev@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2014/08/07 18:49:05 UTC

Re: UseCompoundFile in SolrIndexConfig

: I understand that this might seem like a simplification to users, where they
: set this value once and it controls both places, but I think it's bad.
: First, because if you set <useCompoundFile>, you basically *always* end up
: w/ CFS, even if you intend that to apply to only newly flushed segments. In
: order to use default settings for merged segments, you have to explicitly
: include the default settings in the <mergePolicy> element. This is trappy I
: think and looks odd.

I'm pretty sure this was intentional because it kept things consistent 
from a backcompat standpoint, and (for Solr users with a high level 
understanding, not folks like you who are intimately familiar with the 
underlying code) it's very easy to understand: if you just set 
<useCompoundFile/> -- w/o customizing a <mergePolicy/> -- all of your 
files are compound.  Novice Solr users will never be confused about why 
there are some files that aren't compound.

Having said that: times change.  

If you think it's trappy behavior for the common case, i won't argue with 
you (somebody else might though).  There's nothing to stop us from 
changing it, as long as we have a note in the Upgrading section making it 
clear what folks need to add to their solrconfig.xml file to maintain 
existing behavior if they so choose.  (and as you said: beefing up the ref 
guide with details about why there are multiple CFS related settings, and 
how they impact perf in different scenarios, etc...)

: Beyond that, SolrIndexConfig in trunk contains deprecated code around this
: parameter and somewhat hacks around older schemas that defined useCFS
: inside the MP element -- are we still required to support that back-compat
: in trunk as well?

that's more of a judgement call.

usually what we've done in situations like this is make the backcompat 
support log a WARN that the syntax they are using is deprecated and should 
be changed, but then we also tend to leave the support in for as long as 
feasible -- following the principle of "don't break shit for existing 
users unless absolutely necessary".  but "feasible" and "absolutely 
necessary" can vary by situation: if the backcompat hoops we have to jump 
through are making the code impossible to maintain, or impossible to add 
some new feature, or causing performance problems for the common case (but 
that's rare with config-style backcompat like we're talking about here) - 
then go ahead and rip it out in trunk; but when doing that we also usually 
update the "active" (ie: 4x) branch to switch those WARN logs to hard 
fails so they don't get overlooked by obtuse users who keep upgrading.

So, for example: some user has a config file they've been upgrading since 
Solr 1.2 that contains a <foo/> tag. in 3.6 the syntax changed, <bar 
foo=""/> is the new right way to do things and we added a backcompat kludge 
for the old syntax, with a WARN log advising them to change -- but the 
user never notices it.  Around 4.5 we decide the backcompat logic is getting 
to be a bitch to maintain, so on trunk we rip it out, but on 4x we add a 
special check that throws a hard startup error if the <foo/> tag is found 
in the config -- so as long as the guy upgrades to 4.6 at some point, 
he'll know beyond a doubt that he needs to change his config.  but if he 
manages to upgrade from 4.5 directly to 5.x, then his antique syntax will 
just be silently ignored.
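
To make the <foo/> example concrete, here is a minimal sketch of the two stages on the 4x branch: first the backcompat kludge that logs a WARN and translates the old syntax, later the hard startup failure. Everything in it (the DeprecatedTagCheck class, the Mode enum, readFoo) is hypothetical and written against the plain JDK XML parser; it is not Solr's actual config-loading code.

```java
import java.io.StringReader;
import java.util.logging.Logger;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch of the <foo/> example; not Solr's actual config code.
final class DeprecatedTagCheck {
    private static final Logger LOG = Logger.getLogger("DeprecatedTagCheck");

    /** WARN_AND_TRANSLATE models the backcompat kludge, HARD_FAIL the later 4x check. */
    enum Mode { WARN_AND_TRANSLATE, HARD_FAIL }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("unparseable config", e);
        }
    }

    /** Returns the value from <bar foo="..."/>, or from a legacy <foo> element. */
    static String readFoo(String xml, Mode mode) {
        Document doc = parse(xml);
        NodeList legacy = doc.getElementsByTagName("foo");
        if (legacy.getLength() > 0) {
            if (mode == Mode.HARD_FAIL) {
                // Later 4x releases: refuse to start so the problem cannot be missed.
                throw new IllegalStateException(
                        "<foo/> is no longer supported, use <bar foo=\"...\"/>");
            }
            // Earlier releases: keep working, but tell the user to migrate.
            LOG.warning("<foo/> is deprecated, use <bar foo=\"...\"/>");
            return legacy.item(0).getTextContent();
        }
        NodeList current = doc.getElementsByTagName("bar");
        return current.getLength() > 0
                ? ((Element) current.item(0)).getAttribute("foo")
                : null;
    }
}
```

A release that contains neither branch (the 5.x case above) would simply never look for <foo>, which is how the old setting ends up silently ignored.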




-Hoss
http://www.lucidworks.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: UseCompoundFile in SolrIndexConfig

Posted by Chris Hostetter <ho...@fucit.org>.
: Of course, if we change the semantics of <useCompoundFile>, we can remove
: the deprecated code in 5.0 because the behavior changed and it's fine if
: users don't pay attention since nothing breaks in their apps. I will handle

exactly.



-Hoss
http://www.lucidworks.com/



Re: UseCompoundFile in SolrIndexConfig

Posted by Shai Erera <se...@gmail.com>.
OK, I agree I didn't have XML APIs in mind when I wrote that. So if we
change that API such that users *must* migrate (because e.g. we add a new
mandatory parameter), then it's fine not to keep old code. But if users'
apps silently use other settings just because they didn't upgrade their XML
files (and old tags are ignored), that's not good.

Of course, if we change the semantics of <useCompoundFile>, we can remove
the deprecated code in 5.0 because the behavior changed and it's fine if
users don't pay attention since nothing breaks in their apps. I will handle
that under the same issue.

Shai


On Mon, Aug 11, 2014 at 7:55 PM, Chris Hostetter <ho...@fucit.org>
wrote:


Re: UseCompoundFile in SolrIndexConfig

Posted by Chris Hostetter <ho...@fucit.org>.
: Maybe we can have a new parameter <alwaysUseCompoundFile> to affect both
: newly flushed and newly merged segments (this will be translated into
: respective IWC and MP settings), and we make <useCompoundFile> affect IWC
: only? I know this is a slight change to back-compat, but it's not a serious
: change as in user indexes will still work as they did, only huge merged
: segments won't be packed in a CFS. So if anyone asks, we just tell them to
: migrate to the new API.

I think that's fine ... but i also think it's fine to make a clean break 
of it, and document in CHANGES.txt's upgrade section "<useCompoundFile> no 
longer affects the MergePolicy, it only impacts IndexWriterConfig, because 
that makes the most sense by default and most users on upgrade should be 
better off in this situation -- if you really want to *always* use 
compound files, add the following settings to your <mergePolicy/> 
config..."
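
The difference between the behaviour described in this thread and the proposed "clean break" can be sketched as a small model. This is only an illustration under stated assumptions: CfsSemantics, Resolved, current() and proposed() are hypothetical names, the 0.1 default ratio is an assumed stand-in for whatever the configured merge policy defaults to, and the handling of useCompoundFile=false in current() is my assumption rather than something stated in the thread. As far as I understand Lucene 4.x, the first value would end up in IndexWriterConfig.setUseCompoundFile and the second in the merge policy's setNoCFSRatio.

```java
// Hypothetical model of the two semantics discussed; not Solr's SolrIndexConfig.
final class CfsSemantics {
    /** Assumed stand-in for the merge policy's own default ratio. */
    static final double ASSUMED_DEFAULT_NO_CFS_RATIO = 0.1;

    static final class Resolved {
        final boolean flushUsesCfs;    // destined for IndexWriterConfig (newly flushed segments)
        final double mergeNoCfsRatio;  // destined for the MergePolicy (merged segments)

        Resolved(boolean flushUsesCfs, double mergeNoCfsRatio) {
            this.flushUsesCfs = flushUsesCfs;
            this.mergeNoCfsRatio = mergeNoCfsRatio;
        }
    }

    /**
     * Behaviour as described in the thread: <useCompoundFile> drives both places
     * unless <mergePolicy> sets a ratio explicitly (null means "not set").
     * Mapping false to 0.0 is an assumption of this sketch.
     */
    static Resolved current(boolean useCompoundFile, Double mergePolicyRatio) {
        double ratio = mergePolicyRatio != null
                ? mergePolicyRatio
                : (useCompoundFile ? 1.0 : 0.0);
        return new Resolved(useCompoundFile, ratio);
    }

    /** Proposed clean break: the flag only affects flushes, merges keep their own setting. */
    static Resolved proposed(boolean useCompoundFile, Double mergePolicyRatio) {
        double ratio = mergePolicyRatio != null
                ? mergePolicyRatio
                : ASSUMED_DEFAULT_NO_CFS_RATIO;
        return new Resolved(useCompoundFile, ratio);
    }
}
```

Under this model, a user who wants to keep "always compound" after the change would have to pass an explicit ratio of 1.0, which is what the upgrade note would need to spell out.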

Happy to let you make that call -- no strong opinion about it.

: As for keeping code in for as long as we can ... I have a problem with
: that. It's like we try to "educate" users about the best use of Solr,
: through WARN messages, but then don't make them do the actual cut ever,
: unless it gets in the developers' way. I'd prefer that we treat the XML
: files like any other API -- we deprecate and remove in the next major
: release. Users have all the time through 4.x to get used to the new API.
: When 5.0 is out, they have to make the cut, and since we also document that
: in the migration guide, it should be enough, no?

There's a subtle but important distinction though between "Java APIs" and 
"XML APIs" -- which is why i tend to err on the side of leaving in support 
for things as long as possible when dealing with XML parsing.

in both cases, you can add a deprecation message (from the compiler, 
or a warning message logged by the config file parsing code) but the 
question is what happens if they ignore or don't notice those warnings and 
keep upgrading -- or do a leap frog upgrade (ie: go straight from 4.5 
to 5.0, and you deprecated something in 4.6)

if you rip out that Java API in 5.0, a user who blindly upgrades will get 
a hard failure right up front at compilation and knows immediately that 
there is a problem and they have to consult the upgrade/migration docs.

on the xml side though ... a user who does a leap frog upgrade straight 
from 4.5 to 5.0 (or didn't notice the warnings logged by 4.6) will have no 
idea that their config is now being ignored -- unless of course there is 
special logic in the 5.0 code to check for it, but if you're leaving that 
in there, then why not leave in code to try and be backcompat as well?
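
The XML half of that distinction is easy to reproduce: a parser that only looks up the elements it knows about never notices the ones it doesn't. The sketch below shows the kind of "special logic" mentioned above, reduced to reporting unrecognised top-level elements. UnknownElementCheck and unknownElements are hypothetical names, and the set of known tags is supplied by the caller; nothing here is taken from Solr's code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch: report config elements the current code no longer reads.
final class UnknownElementCheck {
    /** Returns the names of direct children of the root that are not in {@code known}. */
    static List<String> unknownElements(String xml, Set<String> known) {
        Node root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("unparseable config", e);
        }
        List<String> unknown = new ArrayList<>();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            // Text and comment nodes are skipped; only elements count as settings.
            if (n.getNodeType() == Node.ELEMENT_NODE && !known.contains(n.getNodeName())) {
                unknown.add(n.getNodeName());
            }
        }
        return unknown;
    }
}
```

Whether the caller logs the result or refuses to start is exactly the judgement call discussed here; without a check of this kind the leftover tag has no effect and produces no message at all.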


but like i said .. it's a fine line, and a judgement call -- there 
hasn't really been an explicit "this is what we always do".  in some 
cases the impact on users is minor but the burden on developers is heavy, 
so it's easy to make one choice ... in other cases the impact on users 
can be significant, so a little extra effort is put into backcompat, or at 
least keeping an explicit check & hard failure message in the code beyond 
the next X.0 release.


-Hoss
http://www.lucidworks.com/



Re: UseCompoundFile in SolrIndexConfig

Posted by Shai Erera <se...@gmail.com>.
Thanks Hoss for the detailed reply!

About <useCompoundFile>, I understand the simplification this brings to
regular users, but I also think we should protect such users from making
silly mistakes, ONLY because they don't have a deep understanding of the
underlying stuff. Packing 20GB segments in a compound file will most likely
buy you nothing at search time, and is a lot of wasted I/O during indexing.
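
For readers who have not looked at how the merge side decides this, here is a rough sketch of the per-segment rule as I understand Lucene's merge policy to apply it: a merged segment is written as a compound file only if it is below a size cap and no larger than a configured fraction of the whole index. The class and method are stand-ins that take plain byte counts, not the real MergePolicy API, and the details should be checked against the Lucene version in use.

```java
// Hypothetical stand-in for the merge policy's compound-file decision.
final class CfsDecision {
    /**
     * @param mergedBytes        size of the newly merged segment
     * @param totalIndexBytes    size of all segments in the index
     * @param noCfsRatio         0.0 = never CFS, 1.0 = always CFS (subject to the cap)
     * @param maxCfsSegmentBytes segments larger than this are never packed
     */
    static boolean useCompoundFile(long mergedBytes, long totalIndexBytes,
                                   double noCfsRatio, long maxCfsSegmentBytes) {
        if (noCfsRatio == 0.0) {
            return false;
        }
        if (mergedBytes > maxCfsSegmentBytes) {
            return false;
        }
        if (noCfsRatio >= 1.0) {
            return true;
        }
        // Only segments that are small relative to the index get packed.
        return mergedBytes <= noCfsRatio * totalIndexBytes;
    }
}
```

With a ratio of 1.0 and no cap, which is where setting <useCompoundFile> leads today according to this thread, a 20GB merged segment would be packed; with a small ratio such as 0.1 it would not, while small segments still would.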

Maybe we can have a new parameter <alwaysUseCompoundFile> to affect both
newly flushed and newly merged segments (this will be translated into
respective IWC and MP settings), and we make <useCompoundFile> affect IWC
only? I know this is a slight change to back-compat, but it's not a serious
change as in user indexes will still work as they did, only huge merged
segments won't be packed in a CFS. So if anyone asks, we just tell them to
migrate to the new API.

As for keeping code in for as long as we can ... I have a problem with
that. It's like we try to "educate" users about the best use of Solr,
through WARN messages, but then don't make them do the actual cut ever,
unless it gets in the developers' way. I'd prefer that we treat the XML
files like any other API -- we deprecate and remove in the next major
release. Users have all the time through 4.x to get used to the new API.
When 5.0 is out, they have to make the cut, and since we also document that
in the migration guide, it should be enough, no?

Shai

