Posted to dev@lucene.apache.org by John Wang <jo...@gmail.com> on 2013/06/10 20:17:19 UTC

lucene indexing and field configuration or schema

Hi folks:

    Solr has schemas that define per-field configuration for the entire
corpus, whereas Lucene determines that information from each individual
document. At that level, the two are inconsistent.

    We found that having the schema information up front gives us flexibility
in designing our posting lists and also makes the indexing chain logic much
simpler.

    Just wanted to toss out this idea. It may have been discussed before,
but I was unable to google it.

Thanks

-John

Re: lucene indexing and field configuration or schema

Posted by John Wang <jo...@gmail.com>.
Hey Adrien:

Sorry about the late reply. I somehow missed your email.

We are customizing the Lucene indexing pipeline. Here are some cases we
ran into where an external configuration file helped us:

1) default payload size: we define this in our config file to avoid storing
a length per posting.
2) docvalue types: we have built updatable docvalue support for fixed-length
types, e.g. int, long, etc. We store the type in the configuration file,
whereas with Lucene a long would be used, which could be wasteful for us.
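
To illustrate the two points above, here is a minimal sketch of what an
up-front per-field configuration might look like. It is not the actual
customization described in this thread, and it uses no Lucene or Solr APIs:
FieldSchemaSketch, FieldConfig, ValueType, and the field names "price" and
"timestamp" are all hypothetical. The idea it tries to show is that once the
payload size and value width are fixed per field, offsets can be computed
by multiplication, so no per-posting length needs to be stored and a
fixed-width value can be overwritten in place.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical up-front field schema; none of these names are Lucene or Solr APIs. */
public class FieldSchemaSketch {

    /** Fixed-width value types, each with a known byte width. */
    enum ValueType {
        INT(Integer.BYTES), LONG(Long.BYTES);

        final int width;

        ValueType(int width) {
            this.width = width;
        }
    }

    /** Per-field settings that would be loaded from an external configuration file. */
    static final class FieldConfig {
        final int payloadSize;     // bytes per posting payload, fixed for the whole field
        final ValueType valueType; // fixed-width doc value type for the field

        FieldConfig(int payloadSize, ValueType valueType) {
            this.payloadSize = payloadSize;
            this.valueType = valueType;
        }
    }

    private final Map<String, FieldConfig> fields = new LinkedHashMap<>();

    /** Registers a field before any document is indexed. */
    FieldSchemaSketch define(String name, int payloadSize, ValueType type) {
        fields.put(name, new FieldConfig(payloadSize, type));
        return this;
    }

    FieldConfig get(String name) {
        FieldConfig config = fields.get(name);
        if (config == null) {
            throw new IllegalArgumentException("field not in schema: " + name);
        }
        return config;
    }

    /** Byte offset of the i-th posting's payload; no stored length is needed. */
    long payloadOffset(String field, int postingIndex) {
        return (long) postingIndex * get(field).payloadSize;
    }

    /** Byte offset of a document's value in a flat fixed-width column. */
    long docValueOffset(String field, int docId) {
        return (long) docId * get(field).valueType.width;
    }

    public static void main(String[] args) {
        FieldSchemaSketch schema = new FieldSchemaSketch()
                .define("price", 4, ValueType.INT)
                .define("timestamp", 8, ValueType.LONG);

        System.out.println(schema.payloadOffset("price", 10));        // prints 40
        System.out.println(schema.docValueOffset("price", 1000));     // prints 4000
        System.out.println(schema.docValueOffset("timestamp", 1000)); // prints 8000
    }
}
```

In this sketch an int field takes half the space per document that a long
would, which is the kind of saving point 2 refers to.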

Thanks

-John


On Tue, Jun 11, 2013 at 8:50 AM, Adrien Grand <jp...@gmail.com> wrote:

> Hi John,
>
> On Mon, Jun 10, 2013 at 8:17 PM, John Wang <jo...@gmail.com> wrote:
> >     We found having the schema information up front allows us
> flexibilities
> > in designing our posting list, also makes the indexingchain logic much
> > simpler.
>
> Can you give examples of the kind of decisions that you are able to
> make by having the schema up-front?
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

Re: lucene indexing and field configuration or schema

Posted by Adrien Grand <jp...@gmail.com>.
Hi John,

On Mon, Jun 10, 2013 at 8:17 PM, John Wang <jo...@gmail.com> wrote:
>     We found having the schema information up front allows us flexibilities
> in designing our posting list, also makes the indexingchain logic much
> simpler.

Can you give examples of the kind of decisions that you are able to
make by having the schema up-front?

--
Adrien
