Posted to dev@lucene.apache.org by Benson Margulies <bi...@gmail.com> on 2015/02/10 23:23:04 UTC

Is the Solr CodecFactory doc a bit off-kilter?

http://wiki.apache.org/solr/SimpleTextCodecExample

Why does it have:

<codecFactory name="CodecFactory" class="solr.SchemaCodecFactory" />

and then:

postingsFormat="SimpleText"

Shouldn't the postingsFormat match the codec factory name? For that
matter, how much of this is obsolete? Is there better doc elsewhere or
does this need help?

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Is the Solr CodecFactory doc a bit off-kilter?

Posted by Shawn Heisey <ap...@elyograg.org>.
On 2/10/2015 7:23 PM, Jack Krupansky wrote:
> The Solr codec factory simply uses the Solr schema to fetch the desired
> postings format name, because Lucene knows nothing about the Solr
> schema. That postings format name is the name of a Lucene-level codec.
> 
> The name attribute of the <codecFactory> element is simply ignored. In
> fact, the example solrconfig <codecFactory> element simply describes the
> default codec factory for Solr and is unneeded - it's simply there for
> documentation, and in case some advanced expert user wanted to override it.

From what I saw in the branch_5x code (this excerpt is from
SolrCore.java), if you don't explicitly choose a factory in
solrconfig.xml, then you'll get a very simple anonymous factory class
that just provides the default codec:

    } else {
      factory = new CodecFactory() {
        @Override
        public Codec getCodec() {
          return Codec.getDefault();
        }
      };
    }

This means that if you want postingsFormat and docValuesFormat to
actually work, you must explicitly define the codecFactory in
solrconfig.xml to use solr.SchemaCodecFactory, or possibly a custom class
that extends SchemaCodecFactory.

The code *immediately* after what I quoted above checks whether the
factory implements SolrCoreAware.  If the factory doesn't implement
SolrCoreAware, which would include the anonymous class in the code
above, any postingsFormat or docValuesFormat attributes in the schema
will result in an exception.
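The selection logic described above can be modeled in a few lines. This is a minimal sketch, not the real SolrCore code: `CodecFactory`, `CoreAware`, `FieldType` and `SchemaAwareFactory` are hypothetical stand-ins (for Solr's CodecFactory, SolrCoreAware, schema field types and SchemaCodecFactory), codecs are reduced to strings, and the real code's `init(initArgs)` call and SolrException are omitted.

```java
import java.util.List;

// Hypothetical, simplified model of the codec-selection logic described above.
// None of these types are the real Solr or Lucene classes.
public class CodecInitSketch {

    /** Stand-in for Solr's CodecFactory: produces the codec to use. */
    interface CodecFactory {
        String getCodec();
    }

    /** Stand-in for Solr's SolrCoreAware callback interface. */
    interface CoreAware {
        void inform();
    }

    /** Stand-in for a schema field type with optional per-type format names. */
    static final class FieldType {
        final String name;
        final String postingsFormat;   // null when the attribute is absent
        final String docValuesFormat;  // null when the attribute is absent

        FieldType(String name, String postingsFormat, String docValuesFormat) {
            this.name = name;
            this.postingsFormat = postingsFormat;
            this.docValuesFormat = docValuesFormat;
        }
    }

    /** Stand-in for SchemaCodecFactory: core-aware, so it can honor the schema. */
    static final class SchemaAwareFactory implements CodecFactory, CoreAware {
        private boolean informed = false;

        @Override
        public void inform() {
            informed = true;
        }

        @Override
        public String getCodec() {
            return informed ? "schema-aware codec" : "not yet informed";
        }
    }

    /** Uses the configured factory if any, else an anonymous default; rejects
     *  per-type formats unless the factory is core-aware. */
    static String initCodec(CodecFactory configured, List<FieldType> fieldTypes) {
        CodecFactory factory;
        if (configured != null) {
            factory = configured;
        } else {
            factory = () -> "default codec"; // mirrors the anonymous fallback factory
        }

        if (factory instanceof CoreAware) {
            ((CoreAware) factory).inform();
        } else {
            for (FieldType ft : fieldTypes) {
                if (ft.postingsFormat != null || ft.docValuesFormat != null) {
                    throw new IllegalStateException("FieldType '" + ft.name
                            + "' sets a per-field format, but the codec factory does not support it");
                }
            }
        }
        return factory.getCodec();
    }

    public static void main(String[] args) {
        List<FieldType> plain = List.of(new FieldType("string", null, null));
        List<FieldType> simpleText = List.of(new FieldType("text_st", "SimpleText", null));

        System.out.println(initCodec(null, plain));                          // default codec
        System.out.println(initCodec(new SchemaAwareFactory(), simpleText)); // schema-aware codec
        try {
            initCodec(null, simpleText); // no core-aware factory configured
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The third call shows the failure mode from the thread: a schema that sets `postingsFormat` without a core-aware factory is rejected rather than silently accepted.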

Completely unrelated, but interesting to me:  A custom CodecFactory
descended from SchemaCodecFactory could perhaps be used to turn off stored
field compression for users in unusual situations.  Any hints from
Lucene folks on how to modify the codec to disable compression would be
appreciated.

Thanks,
Shawn




Re: Is the Solr CodecFactory doc a bit off-kilter?

Posted by Jack Krupansky <ja...@gmail.com>.
The Solr codec factory simply uses the Solr schema to fetch the desired
postings format name, because Lucene knows nothing about the Solr schema.
That postings format name is the name of a Lucene-level codec.

The name attribute of the <codecFactory> element is simply ignored. In
fact, the example solrconfig <codecFactory> element simply describes the
default codec factory for Solr and is unneeded - it's simply there for
documentation, and in case some advanced expert user wanted to override it.
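The lookup described here (schema field, then its field type, then that type's postingsFormat name, else the codec default) can be sketched as follows. The two maps are hypothetical stand-ins for schema.xml, and `"(codec default)"` is a placeholder rather than a real format name. In real Solr, SchemaCodecFactory resolves the returned name through Lucene's `PostingsFormat.forName`, which is why the name must match a registered postings format such as `SimpleText` rather than anything on the `<codecFactory>` element.

```java
import java.util.Map;

// Hypothetical sketch of schema-driven, per-field postings-format lookup.
// The maps stand in for schema.xml; they are not Solr's real data structures.
public class PerFieldPostingsSketch {

    // field name -> field type name (like <field name="..." type="..."/>)
    static final Map<String, String> FIELD_TYPES = Map.of(
            "id", "string",
            "body", "text_simple");

    // field type name -> postingsFormat attribute; types without one are absent
    static final Map<String, String> POSTINGS_FORMATS = Map.of(
            "text_simple", "SimpleText");

    // Placeholder for "whatever the codec would pick by default".
    static final String CODEC_DEFAULT = "(codec default)";

    /** Returns the postings format name the schema requests for a field, or the codec default. */
    static String postingsFormatNameForField(String field) {
        String typeName = FIELD_TYPES.get(field);
        if (typeName != null) {
            String formatName = POSTINGS_FORMATS.get(typeName);
            if (formatName != null) {
                return formatName;
            }
        }
        return CODEC_DEFAULT;
    }

    public static void main(String[] args) {
        // Nothing here consults the <codecFactory name="..."> attribute.
        for (String field : new String[] {"id", "body", "unknown"}) {
            System.out.println(field + " -> " + postingsFormatNameForField(field));
        }
    }
}
```

Running it prints `id -> (codec default)`, `body -> SimpleText` and `unknown -> (codec default)`: only fields whose type sets `postingsFormat` deviate from the default.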

-- Jack Krupansky

On Tue, Feb 10, 2015 at 8:54 PM, Shawn Heisey <ap...@elyograg.org> wrote:

> On 2/10/2015 3:23 PM, Benson Margulies wrote:
> > http://wiki.apache.org/solr/SimpleTextCodecExample
> >
> > Why does it have:
> >
> > <codecFactory name="CodecFactory" class="solr.SchemaCodecFactory" />
> >
> > and then:
> >
> > postingsFormat="SimpleText"
> >
> > Shouldn't the postingsFormat match the codec factory name? For that
> > matter, how much of this is obsolete? Is there better doc elsewhere or
> > does this need help?
>
> If I am wrong about this, then I can plead ignorance, but from what I
> can see, I don't think that the codecFactory and postingsFormat should
> match.  A codecFactory seems to be designed to select and manipulate the
> Lucene codec that ultimately gets used for the index, while
> postingsFormat looks like it's only one small part of a Lucene codec
> definition -- and one of the things that can be manipulated by the
> codecFactory.
>
> The code in SchemaCodecFactory is very short and deceptively simple,
> leveraging a great deal of functionality from other classes.  If I'm
> reading it right, that class is required in order for the postingsFormat
> and docValuesFormat options in the schema to actually take effect.  If
> that codecFactory is not explicitly specified in solrconfig.xml, those
> configs would probably be ignored.
>
> Thanks,
> Shawn

Re: Is the Solr CodecFactory doc a bit off-kilter?

Posted by Shawn Heisey <ap...@elyograg.org>.
On 2/10/2015 3:23 PM, Benson Margulies wrote:
> http://wiki.apache.org/solr/SimpleTextCodecExample
>
> Why does it have:
>
> <codecFactory name="CodecFactory" class="solr.SchemaCodecFactory" />
>
> and then:
>
> postingsFormat="SimpleText"
>
> Shouldn't the postingsFormat match the codec factory name? For that
> matter, how much of this is obsolete? Is there better doc elsewhere or
> does this need help?

If I am wrong about this, then I can plead ignorance, but from what I
can see, I don't think that the codecFactory and postingsFormat should
match.  A codecFactory seems to be designed to select and manipulate the
Lucene codec that ultimately gets used for the index, while
postingsFormat looks like it's only one small part of a Lucene codec
definition -- and one of the things that can be manipulated by the
codecFactory.
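To illustrate the distinction, here is a minimal sketch, assuming a heavily simplified model in which a codec is just a bundle of format names. `Codec`, `CodecFactory` and the `default-*` names are hypothetical, not Lucene's real classes, and real Lucene can pick postings formats per field rather than once per codec.

```java
import java.util.Objects;

// Hypothetical, heavily simplified model: NOT Lucene's real Codec or
// PostingsFormat classes. A "codec" here is just a bundle of format names.
public class CodecBundleSketch {

    /** A codec bundles several per-concern formats; postings is only one of them. */
    static final class Codec {
        final String postingsFormat;
        final String docValuesFormat;
        final String storedFieldsFormat;

        Codec(String postingsFormat, String docValuesFormat, String storedFieldsFormat) {
            this.postingsFormat = Objects.requireNonNull(postingsFormat);
            this.docValuesFormat = Objects.requireNonNull(docValuesFormat);
            this.storedFieldsFormat = Objects.requireNonNull(storedFieldsFormat);
        }

        /** Returns a copy that differs only in its postings format. */
        Codec withPostingsFormat(String name) {
            return new Codec(name, docValuesFormat, storedFieldsFormat);
        }

        @Override
        public String toString() {
            return "Codec[postings=" + postingsFormat
                    + ", docValues=" + docValuesFormat
                    + ", storedFields=" + storedFieldsFormat + "]";
        }
    }

    /** A codec factory decides which codec the index ends up using. */
    interface CodecFactory {
        Codec getCodec();
    }

    public static void main(String[] args) {
        Codec defaults = new Codec("default-postings", "default-docvalues", "default-stored");

        // In solrconfig.xml a factory is chosen by its class attribute;
        // "SimpleText" names only the postings format it swaps in,
        // so the two names have no reason to match.
        CodecFactory factory = () -> defaults.withPostingsFormat("SimpleText");

        System.out.println(factory.getCodec());
        // prints Codec[postings=SimpleText, docValues=default-docvalues, storedFields=default-stored]
    }
}
```

The factory changes one component and leaves the rest of the codec untouched, which is the relationship between codecFactory and postingsFormat described above.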

The code in SchemaCodecFactory is very short and deceptively simple,
leveraging a great deal of functionality from other classes.  If I'm
reading it right, that class is required in order for the postingsFormat
and docValuesFormat options in the schema to actually take effect.  If
that codecFactory is not explicitly specified in solrconfig.xml, those
configs would probably be ignored.

Thanks,
Shawn

