You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by Shai Erera <se...@gmail.com> on 2013/08/30 13:42:31 UTC

Making Lucene45Codec extensions have their own name

Hi

While debugging a test failure (unrelated to the subject of this email), I
noticed something strange. I set the Codec to Lucene45Codec, yet when the
segment was read, it used Facet45Codec. Digging further I found this:

   - All Lucene45Codec extensions share the same "Lucene45" name, there's
   no way to override that.
   - facet/META-INF was loaded before core/META-INF, so NamedSPILoader set
   "Lucene45=Facet45Codec" and when later it discovered Lucene45Codec, it
   didn't override the mapping (as it defines).

While in this case there's no real harm done since all Facet45Codec does is
apply special logic for the "facet" fields, and also during search time the
DVFormat and PostingsFormat are loaded by PerField based on attributes it
recorded in FieldInfo, I think it's better if Codecs can have unique names.
It's definitely healthier (and safer) to know that the same Codec type that
was used to write this segment is used to read it.

I think this is very easy to solve, if we allow Lucene45Codec extensions
"own" their name if they choose to. It's quite easy, by having two ctors on
Lucene45Codec:

   1. Default ctor (no args) which defaults the name to "Lucene45"
   2. Protected ctor which takes a name as argument and can be called by
   extensions' ctors. That way, Facet45Codec could do "super("Facet45")" and
   own its name.

What do you think? Do you see any problem this may cause?

Shai

Re: Making Lucene45Codec extensions have their own name

Posted by Shai Erera <se...@gmail.com>.
I think one reason to keep it there is Codec.availableCodecs(). If a test
relies on it to pick a Codec at random, he won't be able to choose RWCodec.

Sorry if I go back and forth, I'm just trying to understand these
dependencies.

This boolean seems like an easy solution, I don't mind adding it in
LUCENE-5189.

Shai


On Sat, Aug 31, 2013 at 6:34 AM, Robert Muir <rc...@gmail.com> wrote:

> We do not need to do anything complicated here.
>
> we just need to add back the boolean (impersonation is active) like we
> did for 3.x back compat.
>
> the test that rely upon this (currently there are zero of them) can
> unset it appropriately.
>
>
> On Fri, Aug 30, 2013 at 11:24 PM, Shai Erera <se...@gmail.com> wrote:
> > I tried that and of course it didn't work, but because of lines like
> these
> > in TestRuleSetupAndRestoreClass:
> >
> >       codec = Codec.forName("Lucene40");
> >       assert codec instanceof Lucene40RWCodec : "fix your classpath to
> have
> > tests-framework.jar before lucene-core.jar";
> >       assert (PostingsFormat.forName("Lucene40") instanceof
> > Lucene40RWPostingsFormat) : "fix your classpath to have
> tests-framework.jar
> > before lucene-core.jar";
> >
> > I understand the third line, but why do we need to do Codec.forName here?
> > I searched for Codec.forName and there are 14 matches, some of which are
> in
> > tests which should just do e.g. "new SimpleTextCodec()" or "new
> > Lucene40RWCodec()". What do you think?
> >
> > Shai
> >
> >
> > On Sat, Aug 31, 2013 at 6:13 AM, Shai Erera <se...@gmail.com> wrote:
> >>
> >> On LUCENE-5189 I ran into the following problem: I wrote a test which
> >> creates an index with "old" unsupported Codecs (40, 41, 42) and then
> tries
> >> to update them, expecting to fail since those Codecs don't support
> >> numeric-dv-updates, as well as their Consumer isn't even shipped and
> they
> >> throw UOE in fieldsConsumer(). But, the test doesn't hit that exception,
> >> because the RWCodecs are listed under test-framework/META-INF/services
> and
> >> are used instead of the real 40/41/42Codec.
> >>
> >> In light of what was discussed here, why do we need those Codecs listed
> in
> >> services?
> >>
> >> Shai
> >>
> >>
> >> On Fri, Aug 30, 2013 at 5:28 PM, Shai Erera <se...@gmail.com> wrote:
> >>>
> >>> I removed it from META-INF and all tests pass. So I guess it's safe to
> >>> remove it completely.
> >>>
> >>>
> >>>> Actually Facet45Codec is not needed at all. I dont understand why we
> >>>> have a codec class here at all.
> >>>
> >>>
> >>> Because it takes FacetIndexingParams and returns FacetDVF for all
> fields
> >>> that appear in the indexing params. Can you do that without a Codec
> class?
> >>> It's also easier to use than users figuring they should extend
> >>> Lucene45Codec to plug-in FacetDVF.
> >>>
> >>> I'll just remove it from META-INF?
> >>>
> >>> Shai
> >>>
> >>>
> >>> On Fri, Aug 30, 2013 at 5:16 PM, Robert Muir <rc...@gmail.com> wrote:
> >>>>
> >>>> On Fri, Aug 30, 2013 at 10:01 AM, Shai Erera <se...@gmail.com>
> wrote:
> >>>> > I don't think we should go that far. If you extend Lucene45Codec you
> >>>> > basically agree to the entire index format, but are given a chance
> to
> >>>> > control per-field postings and doc values. Otherwise, make your own
> >>>> > Codec,
> >>>> > and then you'll need to register it in META-INF/services.
> >>>> >
> >>>> > The assert I proposed to make in the ctor is only for "education
> >>>> > purposes"
> >>>> > -- apps need not register their Lucene45CodecExtension in services.
> We
> >>>> > can
> >>>> > document it, and assertions would help verify it.
> >>>> >
> >>>>
> >>>> I don't think we need anything here really: Facet45Codec shouldnt be
> >>>> in META-INF.
> >>>>
> >>>> Actually Facet45Codec is not needed at all. I dont understand why we
> >>>> have a codec class here at all.
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> >>>> For additional commands, e-mail: dev-help@lucene.apache.org
> >>>>
> >>>
> >>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

Re: Making Lucene45Codec extensions have their own name

Posted by Robert Muir <rc...@gmail.com>.
We do not need to do anything complicated here.

we just need to add back the boolean (impersonation is active) like we
did for 3.x back compat.

the test that rely upon this (currently there are zero of them) can
unset it appropriately.


On Fri, Aug 30, 2013 at 11:24 PM, Shai Erera <se...@gmail.com> wrote:
> I tried that and of course it didn't work, but because of lines like these
> in TestRuleSetupAndRestoreClass:
>
>       codec = Codec.forName("Lucene40");
>       assert codec instanceof Lucene40RWCodec : "fix your classpath to have
> tests-framework.jar before lucene-core.jar";
>       assert (PostingsFormat.forName("Lucene40") instanceof
> Lucene40RWPostingsFormat) : "fix your classpath to have tests-framework.jar
> before lucene-core.jar";
>
> I understand the third line, but why do we need to do Codec.forName here?
> I searched for Codec.forName and there are 14 matches, some of which are in
> tests which should just do e.g. "new SimpleTextCodec()" or "new
> Lucene40RWCodec()". What do you think?
>
> Shai
>
>
> On Sat, Aug 31, 2013 at 6:13 AM, Shai Erera <se...@gmail.com> wrote:
>>
>> On LUCENE-5189 I ran into the following problem: I wrote a test which
>> creates an index with "old" unsupported Codecs (40, 41, 42) and then tries
>> to update them, expecting to fail since those Codecs don't support
>> numeric-dv-updates, as well as their Consumer isn't even shipped and they
>> throw UOE in fieldsConsumer(). But, the test doesn't hit that exception,
>> because the RWCodecs are listed under test-framework/META-INF/services and
>> are used instead of the real 40/41/42Codec.
>>
>> In light of what was discussed here, why do we need those Codecs listed in
>> services?
>>
>> Shai
>>
>>
>> On Fri, Aug 30, 2013 at 5:28 PM, Shai Erera <se...@gmail.com> wrote:
>>>
>>> I removed it from META-INF and all tests pass. So I guess it's safe to
>>> remove it completely.
>>>
>>>
>>>> Actually Facet45Codec is not needed at all. I dont understand why we
>>>> have a codec class here at all.
>>>
>>>
>>> Because it takes FacetIndexingParams and returns FacetDVF for all fields
>>> that appear in the indexing params. Can you do that without a Codec class?
>>> It's also easier to use than users figuring they should extend
>>> Lucene45Codec to plug-in FacetDVF.
>>>
>>> I'll just remove it from META-INF?
>>>
>>> Shai
>>>
>>>
>>> On Fri, Aug 30, 2013 at 5:16 PM, Robert Muir <rc...@gmail.com> wrote:
>>>>
>>>> On Fri, Aug 30, 2013 at 10:01 AM, Shai Erera <se...@gmail.com> wrote:
>>>> > I don't think we should go that far. If you extend Lucene45Codec you
>>>> > basically agree to the entire index format, but are given a chance to
>>>> > control per-field postings and doc values. Otherwise, make your own
>>>> > Codec,
>>>> > and then you'll need to register it in META-INF/services.
>>>> >
>>>> > The assert I proposed to make in the ctor is only for "education
>>>> > purposes"
>>>> > -- apps need not register their Lucene45CodecExtension in services. We
>>>> > can
>>>> > document it, and assertions would help verify it.
>>>> >
>>>>
>>>> I don't think we need anything here really: Facet45Codec shouldnt be
>>>> in META-INF.
>>>>
>>>> Actually Facet45Codec is not needed at all. I dont understand why we
>>>> have a codec class here at all.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>
>>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Making Lucene45Codec extensions have their own name

Posted by Shai Erera <se...@gmail.com>.
I tried that and of course it didn't work, but because of lines like these
in TestRuleSetupAndRestoreClass:

      codec = Codec.forName("Lucene40");
      assert codec instanceof Lucene40RWCodec : "fix your classpath to have
tests-framework.jar before lucene-core.jar";
      assert (PostingsFormat.forName("Lucene40") instanceof
Lucene40RWPostingsFormat) : "fix your classpath to have tests-framework.jar
before lucene-core.jar";

I understand the third line, but why do we need to do Codec.forName here?
I searched for Codec.forName and there are 14 matches, some of which are in
tests which should just do e.g. "new SimpleTextCodec()" or "new
Lucene40RWCodec()". What do you think?

Shai


On Sat, Aug 31, 2013 at 6:13 AM, Shai Erera <se...@gmail.com> wrote:

> On LUCENE-5189 I ran into the following problem: I wrote a test which
> creates an index with "old" unsupported Codecs (40, 41, 42) and then tries
> to update them, expecting to fail since those Codecs don't support
> numeric-dv-updates, as well as their Consumer isn't even shipped and they
> throw UOE in fieldsConsumer(). But, the test doesn't hit that exception,
> because the RWCodecs are listed under test-framework/META-INF/services and
> are used instead of the real 40/41/42Codec.
>
> In light of what was discussed here, why do we need those Codecs listed in
> services?
>
> Shai
>
>
> On Fri, Aug 30, 2013 at 5:28 PM, Shai Erera <se...@gmail.com> wrote:
>
>> I removed it from META-INF and all tests pass. So I guess it's safe to
>> remove it completely.
>>
>>
>> Actually Facet45Codec is not needed at all. I dont understand why we
>>> have a codec class here at all.
>>>
>>
>> Because it takes FacetIndexingParams and returns FacetDVF for all fields
>> that appear in the indexing params. Can you do that without a Codec class?
>> It's also easier to use than users figuring they should extend
>> Lucene45Codec to plug-in FacetDVF.
>>
>> I'll just remove it from META-INF?
>>
>> Shai
>>
>>
>> On Fri, Aug 30, 2013 at 5:16 PM, Robert Muir <rc...@gmail.com> wrote:
>>
>>> On Fri, Aug 30, 2013 at 10:01 AM, Shai Erera <se...@gmail.com> wrote:
>>> > I don't think we should go that far. If you extend Lucene45Codec you
>>> > basically agree to the entire index format, but are given a chance to
>>> > control per-field postings and doc values. Otherwise, make your own
>>> Codec,
>>> > and then you'll need to register it in META-INF/services.
>>> >
>>> > The assert I proposed to make in the ctor is only for "education
>>> purposes"
>>> > -- apps need not register their Lucene45CodecExtension in services. We
>>> can
>>> > document it, and assertions would help verify it.
>>> >
>>>
>>> I don't think we need anything here really: Facet45Codec shouldnt be
>>> in META-INF.
>>>
>>> Actually Facet45Codec is not needed at all. I dont understand why we
>>> have a codec class here at all.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>
>>>
>>
>

Re: Making Lucene45Codec extensions have their own name

Posted by Shai Erera <se...@gmail.com>.
On LUCENE-5189 I ran into the following problem: I wrote a test which
creates an index with "old" unsupported Codecs (40, 41, 42) and then tries
to update them, expecting to fail since those Codecs don't support
numeric-dv-updates, as well as their Consumer isn't even shipped and they
throw UOE in fieldsConsumer(). But, the test doesn't hit that exception,
because the RWCodecs are listed under test-framework/META-INF/services and
are used instead of the real 40/41/42Codec.

In light of what was discussed here, why do we need those Codecs listed in
services?

Shai


On Fri, Aug 30, 2013 at 5:28 PM, Shai Erera <se...@gmail.com> wrote:

> I removed it from META-INF and all tests pass. So I guess it's safe to
> remove it completely.
>
>
> Actually Facet45Codec is not needed at all. I dont understand why we
>> have a codec class here at all.
>>
>
> Because it takes FacetIndexingParams and returns FacetDVF for all fields
> that appear in the indexing params. Can you do that without a Codec class?
> It's also easier to use than users figuring they should extend
> Lucene45Codec to plug-in FacetDVF.
>
> I'll just remove it from META-INF?
>
> Shai
>
>
> On Fri, Aug 30, 2013 at 5:16 PM, Robert Muir <rc...@gmail.com> wrote:
>
>> On Fri, Aug 30, 2013 at 10:01 AM, Shai Erera <se...@gmail.com> wrote:
>> > I don't think we should go that far. If you extend Lucene45Codec you
>> > basically agree to the entire index format, but are given a chance to
>> > control per-field postings and doc values. Otherwise, make your own
>> Codec,
>> > and then you'll need to register it in META-INF/services.
>> >
>> > The assert I proposed to make in the ctor is only for "education
>> purposes"
>> > -- apps need not register their Lucene45CodecExtension in services. We
>> can
>> > document it, and assertions would help verify it.
>> >
>>
>> I don't think we need anything here really: Facet45Codec shouldnt be
>> in META-INF.
>>
>> Actually Facet45Codec is not needed at all. I dont understand why we
>> have a codec class here at all.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>

Re: Making Lucene45Codec extensions have their own name

Posted by Shai Erera <se...@gmail.com>.
I removed it from META-INF and all tests pass. So I guess it's safe to
remove it completely.

Actually Facet45Codec is not needed at all. I dont understand why we
> have a codec class here at all.
>

Because it takes FacetIndexingParams and returns FacetDVF for all fields
that appear in the indexing params. Can you do that without a Codec class?
It's also easier to use than users figuring they should extend
Lucene45Codec to plug-in FacetDVF.

I'll just remove it from META-INF?

Shai


On Fri, Aug 30, 2013 at 5:16 PM, Robert Muir <rc...@gmail.com> wrote:

> On Fri, Aug 30, 2013 at 10:01 AM, Shai Erera <se...@gmail.com> wrote:
> > I don't think we should go that far. If you extend Lucene45Codec you
> > basically agree to the entire index format, but are given a chance to
> > control per-field postings and doc values. Otherwise, make your own
> Codec,
> > and then you'll need to register it in META-INF/services.
> >
> > The assert I proposed to make in the ctor is only for "education
> purposes"
> > -- apps need not register their Lucene45CodecExtension in services. We
> can
> > document it, and assertions would help verify it.
> >
>
> I don't think we need anything here really: Facet45Codec shouldnt be
> in META-INF.
>
> Actually Facet45Codec is not needed at all. I dont understand why we
> have a codec class here at all.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

Re: Making Lucene45Codec extensions have their own name

Posted by Robert Muir <rc...@gmail.com>.
On Fri, Aug 30, 2013 at 10:01 AM, Shai Erera <se...@gmail.com> wrote:
> I don't think we should go that far. If you extend Lucene45Codec you
> basically agree to the entire index format, but are given a chance to
> control per-field postings and doc values. Otherwise, make your own Codec,
> and then you'll need to register it in META-INF/services.
>
> The assert I proposed to make in the ctor is only for "education purposes"
> -- apps need not register their Lucene45CodecExtension in services. We can
> document it, and assertions would help verify it.
>

I don't think we need anything here really: Facet45Codec shouldnt be
in META-INF.

Actually Facet45Codec is not needed at all. I dont understand why we
have a codec class here at all.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Making Lucene45Codec extensions have their own name

Posted by Shai Erera <se...@gmail.com>.
I don't think we should go that far. If you extend Lucene45Codec you
basically agree to the entire index format, but are given a chance to
control per-field postings and doc values. Otherwise, make your own Codec,
and then you'll need to register it in META-INF/services.

The assert I proposed to make in the ctor is only for "education purposes"
-- apps need not register their Lucene45CodecExtension in services. We can
document it, and assertions would help verify it.

Shai


On Fri, Aug 30, 2013 at 4:49 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

> Hi,
>
> > On Fri, Aug 30, 2013 at 3:00 PM, Shai Erera <se...@gmail.com> wrote:
> > > The Codec itself may not be needed to be specified in
> > > META-INF/services, but the DVFormat it uses is.
> >
> > Correct, because otherwise Lucene couldn't read segments written with
> this
> > DV format.
> >
> > > So it's not like you can define a Codec today which does not list
> > > anything in 'services', unless your Codec just reuses one of the
> > > predefined DVF/PF listed under core/codecs. Is that right?
> >
> > This is correct, and I do that quite often (defining anonymous
> sub-classes of
> > Lucene45 without registering anything in META6INF/services).
> >
> > > It's confusing that these per-field things are used differently while
> > > indexing and reading. At indexing, the Codec decides what to return
> > > per-field, at search the Codec is more or less not used, cause the
> > > per-field formats are read from FI.attributes and initialized directly.
> >
> > I don't find it confusing. At reading time, it just does the right
> thing, ie. using
> > the format which has been used for writing on a per-segment basis. So you
> > could change your mind and decide all your doc values formats should be
> > disk-based although some of them were memory-based and it will just work.
> > Old segments will still use MemoryDocValuesFormat while the new ones
> > written with DiskDocValuesFormat will be disk-based. The index will be
> > progressively migrated as segments are merged.
>
> I think the most confusing part here is inconsistency between several
> parts: If everything in the Lucene Default Codec would use META-INF to
> lookup stuff by name, we would never ever need to have other instances of
> Lucene's main codec in META-INF (it would always be enough to have
> anonymous classes to change postings formats, DV formats,... on writing,
> IndexReader would always know how to read). But unfortunately not all parts
> are done by META-INF. If you want to plug in another stored fields format,
> you have to define a new codec, because StoredFieldsFormats have no name
> and are not written to the index file.
>
> I think we should change this and make all parts a codec offers to the
> outside implement NamedSPI, so it can be read from the index (not only the
> per-field stuff). Replacing a new Lucene main codec would then only be
> possible if you change the basic file format completely (like addig a
> replacement for compound files, new segment infos formats,...).
>
> Uwe
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

RE: Making Lucene45Codec extensions have their own name

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

> On Fri, Aug 30, 2013 at 3:00 PM, Shai Erera <se...@gmail.com> wrote:
> > The Codec itself may not be needed to be specified in
> > META-INF/services, but the DVFormat it uses is.
> 
> Correct, because otherwise Lucene couldn't read segments written with this
> DV format.
> 
> > So it's not like you can define a Codec today which does not list
> > anything in 'services', unless your Codec just reuses one of the
> > predefined DVF/PF listed under core/codecs. Is that right?
> 
> This is correct, and I do that quite often (defining anonymous sub-classes of
> Lucene45 without registering anything in META6INF/services).
> 
> > It's confusing that these per-field things are used differently while
> > indexing and reading. At indexing, the Codec decides what to return
> > per-field, at search the Codec is more or less not used, cause the
> > per-field formats are read from FI.attributes and initialized directly.
> 
> I don't find it confusing. At reading time, it just does the right thing, ie. using
> the format which has been used for writing on a per-segment basis. So you
> could change your mind and decide all your doc values formats should be
> disk-based although some of them were memory-based and it will just work.
> Old segments will still use MemoryDocValuesFormat while the new ones
> written with DiskDocValuesFormat will be disk-based. The index will be
> progressively migrated as segments are merged.

I think the most confusing part here is inconsistency between several parts: If everything in the Lucene Default Codec would use META-INF to lookup stuff by name, we would never ever need to have other instances of Lucene's main codec in META-INF (it would always be enough to have anonymous classes to change postings formats, DV formats,... on writing, IndexReader would always know how to read). But unfortunately not all parts are done by META-INF. If you want to plug in another stored fields format, you have to define a new codec, because StoredFieldsFormats have no name and are not written to the index file.

I think we should change this and make all parts a codec offers to the outside implement NamedSPI, so it can be read from the index (not only the per-field stuff). Replacing a new Lucene main codec would then only be possible if you change the basic file format completely (like addig a replacement for compound files, new segment infos formats,...).

Uwe


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Making Lucene45Codec extensions have their own name

Posted by Adrien Grand <jp...@gmail.com>.
Hi,

On Fri, Aug 30, 2013 at 3:00 PM, Shai Erera <se...@gmail.com> wrote:
> The Codec itself may not be needed to be specified in META-INF/services, but
> the DVFormat it uses is.

Correct, because otherwise Lucene couldn't read segments written with
this DV format.

> So it's not like you can define a Codec today which
> does not list anything in 'services', unless your Codec just reuses one of
> the predefined DVF/PF listed under core/codecs. Is that right?

This is correct, and I do that quite often (defining anonymous
sub-classes of Lucene45 without registering anything in
META6INF/services).

> It's confusing that these per-field things are used differently while
> indexing and reading. At indexing, the Codec decides what to return
> per-field, at search the Codec is more or less not used, cause the per-field
> formats are read from FI.attributes and initialized directly.

I don't find it confusing. At reading time, it just does the right
thing, ie. using the format which has been used for writing on a
per-segment basis. So you could change your mind and decide all your
doc values formats should be disk-based although some of them were
memory-based and it will just work. Old segments will still use
MemoryDocValuesFormat while the new ones written with
DiskDocValuesFormat will be disk-based. The index will be
progressively migrated as segments are merged.

-- 
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Making Lucene45Codec extensions have their own name

Posted by Shai Erera <se...@gmail.com>.
I had a brief chat with Adrien about it and I think that it doesn't make
sense at all to have a Lucene45CodecExtension registered in services. It's
completely useless because during search that extension isn't ever used.
Correct?

Maybe we want to go as far as completely preventing it by detecting in the
ctor that if NamedSPILoader is in the calling stack, it appears immediately
before the ctor, that is that there's no other Codec in between?

Shai


On Fri, Aug 30, 2013 at 4:00 PM, Shai Erera <se...@gmail.com> wrote:

> The Codec itself may not be needed to be specified in META-INF/services,
> but the DVFormat it uses is. So it's not like you can define a Codec today
> which does not list anything in 'services', unless your Codec just reuses
> one of the predefined DVF/PF listed under core/codecs. Is that right?
>
> Perhaps the Codec indeed does not need to be listed under services - I
> admit if it was the case, I may not have hit this weird situation and
> reported ... I'm still learning my way through Codecs, so what you say
> could be the solution and we don't need Lucene45Codecs to own their name.
>
> It's confusing that these per-field things are used differently while
> indexing and reading. At indexing, the Codec decides what to return
> per-field, at search the Codec is more or less not used, cause the
> per-field formats are read from FI.attributes and initialized directly.
>
> I can try to remove it from the Codecs service and see if anything
> breaks...
>
> Shai
>
>
> On Fri, Aug 30, 2013 at 3:50 PM, Adrien Grand <jp...@gmail.com> wrote:
>
>> Hi Shai,
>>
>> I think the issue is that Facet45Codec shouldn't be declared in
>> META-INF/services. This codec is exactly the same one as
>> Lucene45Codec, it only sets different defaults at writing time but
>> reading segments written with Lucene45 or Facet45 is exactly the same
>> for Lucene. We could set up different names but I prefer the way it is
>> today, since it allows for customizing the per-field formats for
>> postings lists and doc values without the burden of having to declare
>> a new codec in META-INF/services, it is even possible to use anonymous
>> classes.
>>
>> --
>> Adrien
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>

Re: Making Lucene45Codec extensions have their own name

Posted by Shai Erera <se...@gmail.com>.
The Codec itself may not be needed to be specified in META-INF/services,
but the DVFormat it uses is. So it's not like you can define a Codec today
which does not list anything in 'services', unless your Codec just reuses
one of the predefined DVF/PF listed under core/codecs. Is that right?

Perhaps the Codec indeed does not need to be listed under services - I
admit if it was the case, I may not have hit this weird situation and
reported ... I'm still learning my way through Codecs, so what you say
could be the solution and we don't need Lucene45Codecs to own their name.

It's confusing that these per-field things are used differently while
indexing and reading. At indexing, the Codec decides what to return
per-field, at search the Codec is more or less not used, cause the
per-field formats are read from FI.attributes and initialized directly.

I can try to remove it from the Codecs service and see if anything breaks...

Shai


On Fri, Aug 30, 2013 at 3:50 PM, Adrien Grand <jp...@gmail.com> wrote:

> Hi Shai,
>
> I think the issue is that Facet45Codec shouldn't be declared in
> META-INF/services. This codec is exactly the same one as
> Lucene45Codec, it only sets different defaults at writing time but
> reading segments written with Lucene45 or Facet45 is exactly the same
> for Lucene. We could set up different names but I prefer the way it is
> today, since it allows for customizing the per-field formats for
> postings lists and doc values without the burden of having to declare
> a new codec in META-INF/services, it is even possible to use anonymous
> classes.
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

Re: Making Lucene45Codec extensions have their own name

Posted by Adrien Grand <jp...@gmail.com>.
Hi Thomas,

On Fri, Aug 30, 2013 at 2:57 PM, Thomas Matthijs <li...@selckin.be> wrote:
> Needs to be in services or you can't read indexes written with it afaik

This is correct in general, but in this particular case, Facet45Codec
is a sub-class of Lucene45Codec which just overrides the default doc
values formats to use in a compatible way with Lucene45Codec, so
Lucene45Codec will be able to read those segments although they have
been written with Facet45Codec.

-- 
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Making Lucene45Codec extensions have their own name

Posted by Thomas Matthijs <li...@selckin.be>.
On Fri, Aug 30, 2013 at 2:50 PM, Adrien Grand <jp...@gmail.com> wrote:

> Hi Shai,
>
> I think the issue is that Facet45Codec shouldn't be declared in
> META-INF/services. This codec is exactly the same one as
> Lucene45Codec, it only sets different defaults at writing time but
> reading segments written with Lucene45 or Facet45 is exactly the same
> for Lucene. We could set up different names but I prefer the way it is
> today, since it allows for customizing the per-field formats for
> postings lists and doc values without the burden of having to declare
> a new codec in META-INF/services, it is even possible to use anonymous
> classes.
>


Needs to be in services or you can't read indexes written with it afaik

Re: Making Lucene45Codec extensions have their own name

Posted by Adrien Grand <jp...@gmail.com>.
Hi Shai,

I think the issue is that Facet45Codec shouldn't be declared in
META-INF/services. This codec is exactly the same one as
Lucene45Codec, it only sets different defaults at writing time but
reading segments written with Lucene45 or Facet45 is exactly the same
for Lucene. We could set up different names but I prefer the way it is
today, since it allows for customizing the per-field formats for
postings lists and doc values without the burden of having to declare
a new codec in META-INF/services, it is even possible to use anonymous
classes.

-- 
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Making Lucene45Codec extensions have their own name

Posted by Shai Erera <se...@gmail.com>.
Another way to achieve that is make a Lucene45CodecBase (abstract) with a
ctor which takes 'name' argument, and the Lucene45Codec extends it for
setting the name, and Facet45Codec will extend it for setting its own name
and override getPerField(). We can even make Lucene45Codec final.

Shai


On Fri, Aug 30, 2013 at 2:42 PM, Shai Erera <se...@gmail.com> wrote:

> Hi
>
> While debugging a test failure (unrelated to the subject of this email), I
> noticed something strange. I set the Codec to Lucene45Codec, yet when the
> segment was read, it used Facet45Codec. Digging further I found this:
>
>    - All Lucene45Codec extensions share the same "Lucene45" name, there's
>    no way to override that.
>    - facet/META-INF was loaded before core/META-INF, so NamedSPILoader
>    set "Lucene45=Facet45Codec" and when later it discovered Lucene45Codec, it
>    didn't override the mapping (as it defines).
>
> While in this case there's no real harm done since all Facet45Codec does
> is apply special logic for the "facet" fields, and also during search time
> the DVFormat and PostingsFormat are loaded by PerField based on attributes
> it recorded in FieldInfo, I think it's better if Codecs can have unique
> names. It's definitely healthier (and safer) to know that the same Codec
> type that was used to write this segment is used to read it.
>
> I think this is very easy to solve, if we allow Lucene45Codec extensions
> "own" their name if they choose to. It's quite easy, by having two ctors on
> Lucene45Codec:
>
>    1. Default ctor (no args) which defaults the name to "Lucene45"
>    2. Protected ctor which takes a name as argument and can be called by
>    extensions' ctors. That way, Facet45Codec could do "super("Facet45")" and
>    own its name.
>
> What do you think? Do you see any problem this may cause?
>
> Shai
>