Posted to solr-user@lucene.apache.org by Ken Krugler <kk...@transpac.com> on 2014/04/01 01:31:02 UTC

Re: Enabling other SimpleText formats besides postings

Hi Erik (& Shawn),

On Mar 31, 2014, at 1:48pm, Shawn Heisey <so...@elyograg.org> wrote:

> On 3/31/2014 2:36 PM, Erik Hatcher wrote:
>> Not currently possible.  Solr’s SchemaCodecFactory only has a hook for postings format (and doc values format).

OK, thanks for confirming.

> Would it be a reasonable thing to develop a config structure (probably in schema.xml) that starts with something like <codec name="foo"> and has ways to specify the class and related configuration for each of the components in the codec? Then you could specify codec="foo" on an individual field definition.  The codec definition could allow one of them to have default="true".
> 
> I will admit that my understanding of these Lucene-level details is low, so I could be thinking about this wrong.

The easiest approach would be to support a new init value for codecFactory, which SchemaCodecFactory would use to select a different base codec class (instead of always using Lucene<version>Codec). That would switch every format over to the chosen codec.
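
As a rough sketch of how that might look in solrconfig.xml: the "baseCodec" parameter below is hypothetical (SchemaCodecFactory does not accept it today), and I'm assuming its value would be a codec name that the factory resolves via Codec.forName().

```xml
<!-- Hypothetical: "baseCodec" is NOT an existing SchemaCodecFactory option. -->
<!-- It only illustrates the proposed init value described above. -->
<codecFactory class="solr.SchemaCodecFactory">
  <!-- Assumed to be a codec name resolvable through Codec.forName() -->
  <str name="baseCodec">SimpleText</str>
</codecFactory>
```

The appeal is that it's a single setting in one place, with no schema.xml changes needed.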

Or you could extend SchemaCodecFactory to support additional per-field settings (stored fields format, etc.) beyond what's currently available.

For my quick & dirty hack I've specified a different codecFactory in solrconfig.xml, and have my own factory that hard-codes the SimpleTextCodec.
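
The solrconfig.xml side of that is something along these lines. The class name here is just a placeholder for a custom factory; the factory itself extends Solr's CodecFactory and returns a SimpleTextCodec from getCodec().

```xml
<!-- Takes the place of any existing <codecFactory> entry in solrconfig.xml. -->
<!-- com.example.SimpleTextCodecFactory is a placeholder name, not a class that ships with Solr. -->
<codecFactory class="com.example.SimpleTextCodecFactory"/>
```

Since the codec is chosen in the factory, it applies to the whole index rather than to individual fields.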

This works - all files are in the SimpleTextXXX format, other than the segments.gen and segments_XX files; what, those aren't pluggable?!?! :)

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr