Posted to solr-user@lucene.apache.org by xavi jmlucjav <jm...@gmail.com> on 2015/05/30 15:18:24 UTC

any changes about limitations on huge number of fields lately?

Hi guys,

someone I work with has been advised that currently Solr can support
'infinite' number of fields.

I thought there was a practical limitation of say thousands of fields (for
sure less than a million), or things can start to break (I think I
remember seeing memory issues reported on the mailing list by several
people).


Was there any change I missed lately that makes having say 1M fields in
Solr practical??

thanks

Re: any changes about limitations on huge number of fields lately?

Posted by xavi jmlucjav <jm...@gmail.com>.
On Sat, May 30, 2015 at 11:15 PM, Toke Eskildsen <te...@statsbiblioteket.dk>
wrote:

> xavi jmlucjav <jm...@gmail.com> wrote:
> > I think the plan is to facet only on class_u1, class_u2 for queries from
> > user1, etc. So faceting would not happen on all fields on a single query.
>
> I understand that, but most of the created structures stay in memory
> between calls (DocValues helps here). Your heap will slowly fill up as more
> and more users perform faceted queries on their content.
>
got it...priceless info, thanks!


>
> - Toke Eskildsen
>

Re: any changes about limitations on huge number of fields lately?

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
xavi jmlucjav <jm...@gmail.com> wrote:
> I think the plan is to facet only on class_u1, class_u2 for queries from
> user1, etc. So faceting would not happen on all fields on a single query.

I understand that, but most of the created structures stay in memory between calls (DocValues helps here). Your heap will slowly fill up as more and more users perform faceted queries on their content.

- Toke Eskildsen

Re: any changes about limitations on huge number of fields lately?

Posted by xavi jmlucjav <jm...@gmail.com>.
Thanks Toke for the input.

I think the plan is to facet only on class_u1, class_u2 for queries from
user1, etc. So faceting would not happen on all fields on a single query.
But still.

I did not design the schema; I just found out about the number of fields and
advised against it when they asked for a second opinion. We did not get to
discuss a different schema, but if we get to that point I will take that
plan into consideration for sure.

xavi

On Sat, May 30, 2015 at 10:17 PM, Toke Eskildsen <te...@statsbiblioteket.dk>
wrote:

> xavi jmlucjav <jm...@gmail.com> wrote:
> > The reason for such a large number of fields:
> > - users dynamically create 'classes' of documents, say one user creates 10
> > classes on average
> > - for each 'class', the fields are created like this: "unique_id_"+fieldname
> > - there are potentially hundreds of thousands of users.
>
> Switch to a scheme where you control the names of fields outside of Solr,
> but share the fields internally:
>
> User 1 has 10 custom classes: u1_a, u1_b, u1_c, ... u1_j
> Internally they are mapped to class1, class2, class3, ... class10
>
> User 2 uses 2 classes: u2_horses, u2_elephants
> Internally they are mapped to class1, class2
>
> When User 2 queries field u2_horses, you rewrite the query to use class1
> instead.
>
> > There is faceting in each user's fields.
> > So this will result in >1M fields, very sparsely populated.
>
> If you are faceting on all of them and if you are not using DocValues,
> this will explode your memory requirements with vanilla Solr: UnInverted
> faceting maintains a separate map from all documentIDs to field values
> (ordinals for Strings) for _all_ the facet fields. Even if you only had 10
> million documents and even if your 1 million facet fields all had just 1
> value, represented by 1 bit, it would still require 10M * 1M * 1 bits in
> memory, which is 10 terabits, or about 1.25 terabytes, of RAM.
>
> - Toke Eskildsen
>

Re: any changes about limitations on huge number of fields lately?

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
xavi jmlucjav <jm...@gmail.com> wrote:
> The reason for such a large number of fields:
> - users dynamically create 'classes' of documents, say one user creates 10
> classes on average
> - for each 'class', the fields are created like this: "unique_id_"+fieldname
> - there are potentially hundreds of thousands of users.

Switch to a scheme where you control the names of fields outside of Solr, but share the fields internally:

User 1 has 10 custom classes: u1_a, u1_b, u1_c, ... u1_j
Internally they are mapped to class1, class2, class3, ... class10

User 2 uses 2 classes: u2_horses, u2_elephants
Internally they are mapped to class1, class2

When User 2 queries field u2_horses, you rewrite the query to use class1 instead.
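
A minimal sketch of the mapping described above, in Python. The FieldMapper class, its method names, and the "class<n>" naming are hypothetical and only illustrate the idea of mapping per-user names onto a small shared pool of Solr fields. The query rewrite uses a naive regex on "name:" prefixes and is not a full Solr query parser. In a real setup the mapping would have to be persisted outside Solr, and queries would presumably also need a filter on the owning user so that shared fields do not mix data between users.

```python
import re
from collections import defaultdict


class FieldMapper:
    """Maps per-user field names onto a shared pool of internal field names."""

    def __init__(self, prefix="class"):
        self.prefix = prefix
        # user id -> {external field name -> internal field name}
        self._maps = defaultdict(dict)

    def internal_name(self, user_id, external_name):
        """Return the shared field for this user's field, allocating if new."""
        fields = self._maps[user_id]
        if external_name not in fields:
            fields[external_name] = f"{self.prefix}{len(fields) + 1}"
        return fields[external_name]

    def rewrite_query(self, user_id, query):
        """Replace known 'external_name:' prefixes with internal names."""
        fields = self._maps[user_id]

        def swap(match):
            name = match.group(1)
            # Unknown field names are left untouched.
            return fields.get(name, name) + ":"

        return re.sub(r"\b(\w+):", swap, query)


mapper = FieldMapper()
for name in ("u1_a", "u1_b", "u1_c"):
    mapper.internal_name("user1", name)
for name in ("u2_horses", "u2_elephants"):
    mapper.internal_name("user2", name)

print(mapper.rewrite_query("user2", "u2_horses:arabian"))  # class1:arabian
```

With this layout the number of distinct Solr fields is bounded by the largest number of classes any single user has, not by the number of users.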

> There is faceting in each user's fields.
> So this will result in >1M fields, very sparsely populated.

If you are faceting on all of them and if you are not using DocValues, this will explode your memory requirements with vanilla Solr: UnInverted faceting maintains a separate map from all documentIDs to field values (ordinals for Strings) for _all_ the facet fields. Even if you only had 10 million documents and even if your 1 million facet fields all had just 1 value, represented by 1 bit, it would still require 10M * 1M * 1 bits in memory, which is 10 terabits, or about 1.25 terabytes, of RAM.
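
The lower bound can be worked through in a few lines of Python. The variable names are illustrative, and the 1 bit per document per field is the best-case assumption from the message, not a measurement of what Solr actually allocates.

```python
docs = 10_000_000          # 10 million documents
facet_fields = 1_000_000   # 1 million facet fields
bits_per_value = 1         # best case: 1 bit per document per field

total_bits = docs * facet_fields * bits_per_value
total_bytes = total_bits // 8

print(f"{total_bits / 1e12:.2f} terabits")    # 10.00 terabits
print(f"{total_bytes / 1e12:.2f} terabytes")  # 1.25 terabytes
```

Real ordinals need more than 1 bit per document, so actual requirements would be higher than this floor.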

- Toke Eskildsen

Re: any changes about limitations on huge number of fields lately?

Posted by xavi jmlucjav <jm...@gmail.com>.
The reason for such a large number of fields:
- users dynamically create 'classes' of documents, say one user creates 10
classes on average
- for each 'class', the fields are created like this: "unique_id_"+fieldname
- there are potentially hundreds of thousands of users.

There is faceting in each user's fields.

So this will result in >1M fields, very sparsely populated. I warned them
this did not sound like a good design to me, but apparently someone very
knowledgeable in Solr said this will work out fine. That is why I wanted to
double check...

On Sat, May 30, 2015 at 9:22 PM, Jack Krupansky <ja...@gmail.com>
wrote:

> Anything more than a few hundred seems very suspicious.
>
> Anything more than a few dozen or 50 or 75 seems suspicious as well.
>
> The point should not be how crazy can you get with Solr, but that craziness
> should be avoided altogether!
>
> Solr's design is optimal for a large number of relatively small documents,
> not large documents.
>
>
> -- Jack Krupansky
>
> On Sat, May 30, 2015 at 3:05 PM, Erick Erickson <er...@gmail.com>
> wrote:
>
> > Nothing's really changed in that area lately. Your co-worker is
> > perhaps confusing the statement that "Solr has no a-priori limit on
> > the number of distinct fields that can be in a corpus" with supporting
> > an infinite number of fields. Not having a built-in limit is much
> > different than supporting....
> >
> > Whether Solr breaks with thousands and thousands of fields is pretty
> > dependent on what you _do_ with those fields. Simply doing keyword
> > searches isn't going to put the same memory pressure on as, say,
> > faceting on them all (even if in different queries).
> >
> > I'd really ask why so many fields are necessary though.
> >
> > Best,
> > Erick
> >
> > On Sat, May 30, 2015 at 6:18 AM, xavi jmlucjav <jm...@gmail.com>
> wrote:
> > > Hi guys,
> > >
> > > someone I work with has been advised that currently Solr can support
> > > 'infinite' number of fields.
> > >
> > > I thought there was a practical limitation of say thousands of fields (for
> > > sure less than a million), or things can start to break (I think I
> > > remember seeing memory issues reported on the mailing list by several
> > > people).
> > >
> > >
> > > Was there any change I missed lately that makes having say 1M fields in
> > > Solr practical??
> > >
> > > thanks
> >
>

Re: any changes about limitations on huge number of fields lately?

Posted by Jack Krupansky <ja...@gmail.com>.
Anything more than a few hundred seems very suspicious.

Anything more than a few dozen or 50 or 75 seems suspicious as well.

The point should not be how crazy can you get with Solr, but that craziness
should be avoided altogether!

Solr's design is optimal for a large number of relatively small documents,
not large documents.


-- Jack Krupansky

On Sat, May 30, 2015 at 3:05 PM, Erick Erickson <er...@gmail.com>
wrote:

> Nothing's really changed in that area lately. Your co-worker is
> perhaps confusing the statement that "Solr has no a-priori limit on
> the number of distinct fields that can be in a corpus" with supporting
> an infinite number of fields. Not having a built-in limit is much
> different than supporting....
>
> Whether Solr breaks with thousands and thousands of fields is pretty
> dependent on what you _do_ with those fields. Simply doing keyword
> searches isn't going to put the same memory pressure on as, say,
> faceting on them all (even if in different queries).
>
> I'd really ask why so many fields are necessary though.
>
> Best,
> Erick
>
> On Sat, May 30, 2015 at 6:18 AM, xavi jmlucjav <jm...@gmail.com> wrote:
> > Hi guys,
> >
> > someone I work with has been advised that currently Solr can support
> > 'infinite' number of fields.
> >
> > I thought there was a practical limitation of say thousands of fields (for
> > sure less than a million), or things can start to break (I think I
> > remember seeing memory issues reported on the mailing list by several
> > people).
> >
> >
> > Was there any change I missed lately that makes having say 1M fields in
> > Solr practical??
> >
> > thanks
>

Re: any changes about limitations on huge number of fields lately?

Posted by Erick Erickson <er...@gmail.com>.
Nothing's really changed in that area lately. Your co-worker is
perhaps confusing the statement that "Solr has no a-priori limit on
the number of distinct fields that can be in a corpus" with supporting
an infinite number of fields. Not having a built-in limit is much
different than supporting....

Whether Solr breaks with thousands and thousands of fields is pretty
dependent on what you _do_ with those fields. Simply doing keyword
searches isn't going to put the same memory pressure on as, say,
faceting on them all (even if in different queries).

I'd really ask why so many fields are necessary though.

Best,
Erick

On Sat, May 30, 2015 at 6:18 AM, xavi jmlucjav <jm...@gmail.com> wrote:
> Hi guys,
>
> someone I work with has been advised that currently Solr can support
> 'infinite' number of fields.
>
> I thought there was a practical limitation of say thousands of fields (for
> sure less than a million), or things can start to break (I think I
> remember seeing memory issues reported on the mailing list by several
> people).
>
>
> Was there any change I missed lately that makes having say 1M fields in
> Solr practical??
>
> thanks