Posted to java-user@lucene.apache.org by Sheng <sh...@gmail.com> on 2016/07/06 14:31:27 UTC

dv field is too large

Hi,

I am getting an IllegalArgumentException (IAE) indicating that one of my
SortedDocValuesField values is too large (more than 32k).

I googled a bit, and it seems that LUCENE-4583 addressed this issue in
4.5 and 6.0, yet I am currently using Lucene 6.1. Am I missing or
misunderstanding something?

Thanks,

Re: dv field is too large

Posted by Michael McCandless <lu...@mikemccandless.com>.
Yes, or you could get the utf8 bytes yourself client side and check that
length.
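
A minimal sketch of that client-side check, assuming the limit reported by
the exception is 32766 bytes (please verify against your Lucene version).
The class and method names are made up for illustration and are not part
of Lucene:

```java
import java.nio.charset.StandardCharsets;

final class DocValuesLengthCheck {
    // Assumption: 32766 bytes is the per-value limit; confirm it against
    // the message of the IllegalArgumentException you actually see.
    static final int ASSUMED_MAX_BYTES = 32766;

    /** Returns the number of bytes the value occupies once encoded as UTF-8. */
    static int utf8Length(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    /** True when the encoded value fits within the assumed limit. */
    static boolean fits(String value) {
        return utf8Length(value) <= ASSUMED_MAX_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("abc"));     // ASCII: 1 byte per char
        System.out.println(utf8Length("\u20AC"));  // euro sign: 3 bytes
        System.out.println(fits("short value"));
    }
}
```

Checking the encoded length rather than String.length() matters because
non-ASCII characters take more than one byte each.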

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jul 6, 2016 at 6:16 PM, Sheng <sh...@gmail.com> wrote:

> Is 32k / MAX_UTF8_BYTES_PER_CHAR an accurate limit for the number of
> characters a payload string can carry?

Re: dv field is too large

Posted by Sheng <sh...@gmail.com>.
Is 32k / MAX_UTF8_BYTES_PER_CHAR an accurate limit for the number of
characters a payload string can carry?
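
To make the arithmetic concrete, here is a rough sketch. It assumes a limit
of 32766 bytes and at most 3 UTF-8 bytes per Java char (a UTF-16 code unit;
a surrogate pair is 2 chars and encodes to 4 bytes). Both constants are my
assumptions and the class name is made up:

```java
import java.nio.charset.StandardCharsets;

final class SafeCharBound {
    // Assumptions, not values read from Lucene at runtime.
    static final int ASSUMED_MAX_BYTES = 32766;
    static final int ASSUMED_MAX_UTF8_BYTES_PER_CHAR = 3;

    /** Number of chars that is guaranteed to fit, whatever the content. */
    static int safeCharCount() {
        return ASSUMED_MAX_BYTES / ASSUMED_MAX_UTF8_BYTES_PER_CHAR;
    }

    public static void main(String[] args) {
        int n = safeCharCount();
        StringBuilder worstCase = new StringBuilder();
        for (int i = 0; i < n; i++) {
            worstCase.append('\u20AC'); // euro sign, 3 bytes in UTF-8
        }
        int bytes = worstCase.toString().getBytes(StandardCharsets.UTF_8).length;
        System.out.println(n + " chars -> " + bytes + " bytes");
    }
}
```

Under those assumptions the division gives a safe bound rather than an exact
limit: a pure ASCII value could be three times as long and still fit.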

On Wednesday, July 6, 2016, Michael McCandless <lu...@mikemccandless.com>
wrote:

> Maybe you could simply truncate the user-supplied values at 32 KB?

Re: dv field is too large

Posted by Michael McCandless <lu...@mikemccandless.com>.
Maybe you could simply truncate the user-supplied values at 32 KB?
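
Something along these lines might do, as a sketch only. It cuts the UTF-8
bytes at a character boundary so no multi-byte sequence is split. The
helper name is hypothetical, and the caller is assumed to pass a
non-negative maxBytes such as 32766:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8Truncator {
    /**
     * Encodes the value as UTF-8 and returns at most maxBytes bytes,
     * never ending in the middle of a multi-byte character.
     */
    static byte[] truncateUtf8(String value, int maxBytes) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length <= maxBytes) {
            return bytes;
        }
        int end = maxBytes;
        // Continuation bytes look like 10xxxxxx. Back up until bytes[end]
        // starts a character, so the cut falls on a character boundary.
        while (end > 0 && (bytes[end] & 0xC0) == 0x80) {
            end--;
        }
        return Arrays.copyOf(bytes, end);
    }
}
```

The resulting byte[] could then be wrapped in a BytesRef and handed to the
sorted doc values field, while the full text still goes to the tokenized
field for searching.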

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jul 6, 2016 at 5:55 PM, Sheng <sh...@gmail.com> wrote:

> Hi Eric,
>
> I am refactoring a legacy system. One of the most annoying things is I have
> to keep the old feature even though it makes little sense. In this case, we
> have to index a particular data structure which has bunch of fields and
> each of them is promised to be searchable and search-sortable to the user.
> Turns out one field is notoriously large. I think the old implementation
> uses some quite clumsy way to make it happen. But since we decide to
> refactor the system with all the goodies from Lucene, we want to do the
> sorting right, and here we are at this issue... :-(
>
> On Wednesday, July 6, 2016, Erick Erickson <er...@gmail.com> wrote:
>
> > Is this an "XY" problem? Meaning, why do you need DV fields larger than
> > 32K?
> >
> > You can't search it as text as it's not tokenized. Faceting and sorting
> > by a 32K field doesn't seem very useful. You may have a perfectly valid
> > reason, but it's not obvious what use-case you're serving from this
> > thread so far....
> >
> > Nobody has yet put forth a compelling use-case for such large fields,
> > perhaps this would be one.
> >
> > Best,
> > Erick
> >
> > On Wed, Jul 6, 2016 at 2:24 PM, Sheng <shengcer@gmail.com> wrote:
> > > Mike - Thanks for the prompt response. Is there a way to bypass this
> > > constraint for SortedDocValueField ? Or we have to live with it,
> > > meaning no fix even in future release?
> > >
> > > On Wednesday, July 6, 2016, Michael McCandless
> > > <lucene@mikemccandless.com> wrote:
> > >
> > >> I believe only binary DVs can be larger than 32K bytes.
> > >>
> > >> Mike McCandless
> > >>
> > >> http://blog.mikemccandless.com

Re: dv field is too large

Posted by Michael McCandless <lu...@mikemccandless.com>.
I agree, I'll improve the docs about this limit.  Thanks Sheng.

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jul 6, 2016 at 10:59 PM, Sheng <sh...@gmail.com> wrote:

> I agree. That said, wouldn't it also make sense to clearly point it out by
> adding the comments to the corresponding classes. This is not the first
> time I am running into this "magic number" pitfall when using Lucene
> (e.g., 1024
> limit for the token length in early version of Lucene). Generally speaking,
> the documentation is pretty good and helpful. But without documenting
> subtle issues like this, they may only manifest themselves in production
> when the real data come in and they are "big".

Re: dv field is too large

Posted by Sheng <sh...@gmail.com>.
I agree. That said, wouldn't it also make sense to point this out clearly by
adding comments to the corresponding classes? This is not the first time I
have run into this kind of "magic number" pitfall when using Lucene (e.g.,
the 1024 limit on token length in early versions of Lucene). Generally
speaking, the documentation is pretty good and helpful. But unless subtle
issues like this are documented, they may only manifest themselves in
production, when the real data comes in and it is "big".

On Wednesday, July 6, 2016, Erick Erickson <er...@gmail.com> wrote:

> Well, if you must sort on a 32K single value (although I think this is
> extremely silly, _nobody_ will notice that two docs are out of order
> because they were identical up until the 30,000th character but the
> 30,001st character isn't sorted correctly), do as Mike suggests and
> chop it off before sending it to Lucene.
>
> Best,
> Erick

Re: dv field is too large

Posted by Erick Erickson <er...@gmail.com>.
Well, if you must sort on a 32K single value (although I think this is
extremely silly, _nobody_ will notice that two docs are out of order
because they were identical up until the 30,000th character but the
30,001st character isn't sorted correctly), do as Mike suggests and
chop it off before sending it to Lucene.
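
A rough illustration of why the truncation is harmless for sorting, using
plain Java strings rather than Lucene doc values. The PrefixSort helper is
made up for this example. Sorting on a prefix gives the same order as
sorting on the full value unless two values share the entire prefix:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class PrefixSort {
    /** Sort key: the first maxChars chars of the value (may split a surrogate pair). */
    static String sortKey(String value, int maxChars) {
        return value.length() <= maxChars ? value : value.substring(0, maxChars);
    }

    /** Returns a sorted copy, ordered only by the truncated key. */
    static List<String> sortByPrefix(List<String> values, int maxChars) {
        List<String> sorted = new ArrayList<>(values);
        // List.sort is stable, so values with equal keys keep their input order.
        sorted.sort(Comparator.comparing((String v) -> sortKey(v, maxChars)));
        return sorted;
    }
}
```

With a 2-char key, "apricot" and "apple" compare as equal and simply stay
in input order. With a key of roughly 32K bytes, two values would have to
agree on all of that before anyone could see a difference.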

Best,
Erick

On Wed, Jul 6, 2016 at 3:53 PM, Sheng <sh...@gmail.com> wrote:
> You misunderstand. I have many fields, and unfortunately a few of them are
> quite big, i.e. exceeding the 32k limit. In order to make these "big"
> fields sortable, they have to be stored as SortedDocValueField. Or that is
> wrong, one can actually sort the search result by a "big" field without
> indexing it to a SortedDocValueField. Suggestion ?

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: dv field is too large

Posted by Sheng <sh...@gmail.com>.
You misunderstand. I have many fields, and unfortunately a few of them are
quite big, i.e. they exceed the 32k limit. As I understand it, to make these
"big" fields sortable they have to be indexed as SortedDocValuesField. Or is
that wrong, and one can actually sort the search results by a "big" field
without indexing it as a SortedDocValuesField? Any suggestions?

On Wednesday, July 6, 2016, Erick Erickson <er...@gmail.com> wrote:

> bq: In this case, we
> have to index a particular data structure which has bunch of fields and
> each of them is promised to be searchable and search-sortable to the user
>
> If I'm reading this right, you have some structure. You say
> "each of them is promised to be searchable and search-sortable"
>
> It _sounds_ like what you want to do is break these fields out
> into separate fields each of which is searchable and sortable
> independently. But from what you've described, putting the entire
> thing into a single DV field isn't useful.
>
> Best,
> Erick
>
>
>
> On Wed, Jul 6, 2016 at 3:10 PM, Sheng <shengcer@gmail.com <javascript:;>>
> wrote:
> > To be clear, the "field" is indeed tokenized, which is accompanied with a
> > SortedDocValueField so that it is sortable too. Am I making the wrong
> > assumption here ?
> >
> > On Wednesday, July 6, 2016, Sheng <shengcer@gmail.com <javascript:;>>
> wrote:
> >
> >> Hi Eric,
> >>
> >> I am refactoring a legacy system. One of the most annoying things is I
> >> have to keep the old feature even though it makes little sense. In this
> >> case, we have to index a particular data structure which has bunch of
> >> fields and each of them is promised to be searchable and
> search-sortable to
> >> the user. Turns out one field is notoriously large. I think the old
> >> implementation uses some quite clumsy way to make it happen. But since
> we
> >> decide to refactor the system with all the goodies from Lucene, we want
> to
> >> do the sorting right, and here we are at this issue... :-(
> >>
> >> On Wednesday, July 6, 2016, Erick Erickson <erickerickson@gmail.com
> <javascript:;>
> >> <javascript:_e(%7B%7D,'cvml','erickerickson@gmail.com <javascript:;>');>>
> wrote:
> >>
> >>> Is this an "XY" problem? Meaning, why do you need DV fields larger than
> >>> 32K?
> >>>
> >>> You can't search it as text as it's not tokenized. Faceting and sorting
> >>> by a 32K
> >>> field doesn't seem very useful. You may have a perfectly valid reason,
> >>> but it's
> >>> not obvious what use-case you're serving from this thread so far....
> >>>
> >>> Nobody has yet put forth a compelling use-case for such large fields,
> >>> perhaps
> >>> this would be one.
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Wed, Jul 6, 2016 at 2:24 PM, Sheng <shengcer@gmail.com
> <javascript:;>> wrote:
> >>> > Mike - Thanks for the prompt response. Is there a way to bypass this
> >>> > constraint for SortedDocValueField ? Or we have to live with it,
> >>> meaning no
> >>> > fix even in future release?
> >>> >
> >>> > On Wednesday, July 6, 2016, Michael McCandless <
> >>> lucene@mikemccandless.com <javascript:;>>
> >>> > wrote:
> >>> >
> >>> >> I believe only binary DVs can be larger than 32K bytes.
> >>> >>
> >>> >> Mike McCandless
> >>> >>
> >>> >> http://blog.mikemccandless.com
> >>> >>
> >>> >> On Wed, Jul 6, 2016 at 10:31 AM, Sheng <shengcer@gmail.com> wrote:
> >>> >>
> >>> >> > Hi,
> >>> >> >
> >>> >> > I am getting an IAE indicating one of the SortedDocValueField is too
> >>> >> > large, > 32k
> >>> >> >
> >>> >> > I googled a bit, and it seems like #Lucene-4583 has addressed this
> >>> >> > issue in 4.5 and 6.0, while I am currently using Lucene 6.1. Do I miss or
> >>> >> > misunderstand anything ?
> >>> >> >
> >>> >> > Thanks,
> >>> >> >
> >>> >>
> >>>

Re: dv field is too large

Posted by Erick Erickson <er...@gmail.com>.
bq: In this case, we
have to index a particular data structure which has bunch of fields and
each of them is promised to be searchable and search-sortable to the user

If I'm reading this right, you have some structure. You say
"each of them is promised to be searchable and search-sortable"

It _sounds_ like what you want to do is break these fields out
into separate fields each of which is searchable and sortable
independently. But from what you've described, putting the entire
thing into a single DV field isn't useful.
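
For illustration only, here is a minimal sketch of deriving one sort key per sub-field, with an oversized value cut down to a byte budget (truncation was suggested elsewhere in this thread). It uses plain Java with no Lucene dependency. The class name SortKeys, the method names, and the 32766-byte budget are assumptions made for this example, not something confirmed in the thread. Each resulting key could then presumably be handed to its own SortedDocValuesField, alongside a tokenized field for searching.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds one truncated sort key per sub-field of a record.
final class SortKeys {
    // Assumed byte budget for a sorted doc value (not confirmed in this thread).
    static final int MAX_SORT_KEY_BYTES = 32766;

    /**
     * Returns the longest prefix of s whose UTF-8 encoding is at most maxBytes,
     * without ever cutting a code point in half.
     */
    static String truncateUtf8(String s, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            // UTF-8 length of this code point: 1 to 4 bytes.
            int len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + len > maxBytes) {
                break;
            }
            bytes += len;
            i += Character.charCount(cp);
        }
        return s.substring(0, i);
    }

    /** Maps each sub-field name to a sort key that fits within maxBytes. */
    static Map<String, String> sortKeys(Map<String, String> record, int maxBytes) {
        Map<String, String> keys = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : record.entrySet()) {
            keys.put(e.getKey(), truncateUtf8(e.getValue(), maxBytes));
        }
        return keys;
    }

    public static void main(String[] args) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("title", "short title");
        record.put("body", "a very large value would go here");
        Map<String, String> keys = sortKeys(record, MAX_SORT_KEY_BYTES);
        for (Map.Entry<String, String> e : keys.entrySet()) {
            int n = e.getValue().getBytes(StandardCharsets.UTF_8).length;
            System.out.println(e.getKey() + " -> " + n + " bytes");
        }
    }
}
```

Sorting on a truncated key only distinguishes values that differ within the kept prefix, which may or may not be acceptable for the use case.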

Best,
Erick



On Wed, Jul 6, 2016 at 3:10 PM, Sheng <sh...@gmail.com> wrote:
> To be clear, the "field" is indeed tokenized, which is accompanied with a
> SortedDocValueField so that it is sortable too. Am I making the wrong
> assumption here ?
>
> On Wednesday, July 6, 2016, Sheng <sh...@gmail.com> wrote:
>
>> Hi Eric,
>>
>> I am refactoring a legacy system. One of the most annoying things is I
>> have to keep the old feature even though it makes little sense. In this
>> case, we have to index a particular data structure which has bunch of
>> fields and each of them is promised to be searchable and search-sortable to
>> the user. Turns out one field is notoriously large. I think the old
>> implementation uses some quite clumsy way to make it happen. But since we
>> decide to refactor the system with all the goodies from Lucene, we want to
>> do the sorting right, and here we are at this issue... :-(
>>
>> On Wednesday, July 6, 2016, Erick Erickson <erickerickson@gmail.com> wrote:
>>
>>> Is this an "XY" problem? Meaning, why do you need DV fields larger than
>>> 32K?
>>>
>>> You can't search it as text as it's not tokenized. Faceting and sorting
>>> by a 32K
>>> field doesn't seem very useful. You may have a perfectly valid reason,
>>> but it's
>>> not obvious what use-case you're serving from this thread so far....
>>>
>>> Nobody has yet put forth a compelling use-case for such large fields,
>>> perhaps
>>> this would be one.
>>>
>>> Best,
>>> Erick
>>>
>>> On Wed, Jul 6, 2016 at 2:24 PM, Sheng <sh...@gmail.com> wrote:
>>> > Mike - Thanks for the prompt response. Is there a way to bypass this
>>> > constraint for SortedDocValueField ? Or we have to live with it,
>>> meaning no
>>> > fix even in future release?
>>> >
>>> > On Wednesday, July 6, 2016, Michael McCandless <
>>> lucene@mikemccandless.com>
>>> > wrote:
>>> >
>>> >> I believe only binary DVs can be larger than 32K bytes.
>>> >>
>>> >> Mike McCandless
>>> >>
>>> >> http://blog.mikemccandless.com
>>> >>
>>> >> On Wed, Jul 6, 2016 at 10:31 AM, Sheng <shengcer@gmail.com> wrote:
>>> >>
>>> >> > Hi,
>>> >> >
>>> >> > I am getting an IAE indicating one of the SortedDocValueField is too
>>> >> large,
>>> >> > > 32k
>>> >> >
>>> >> > I googled a bit, and it seems like #Lucene-4583 has addressed this
>>> issue
>>> >> in
>>> >> > 4.5 and 6.0, while I am currently using Lucene 6.1. Do I miss or
>>> >> > misunderstand anything ?
>>> >> >
>>> >> > Thanks,
>>> >> >
>>> >>
>>>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: dv field is too large

Posted by Sheng <sh...@gmail.com>.
To be clear, the "field" is indeed tokenized, and it is accompanied by a
SortedDocValuesField so that it is sortable too. Am I making the wrong
assumption here?

On Wednesday, July 6, 2016, Sheng <sh...@gmail.com> wrote:

> Hi Eric,
>
> I am refactoring a legacy system. One of the most annoying things is I
> have to keep the old feature even though it makes little sense. In this
> case, we have to index a particular data structure which has bunch of
> fields and each of them is promised to be searchable and search-sortable to
> the user. Turns out one field is notoriously large. I think the old
> implementation uses some quite clumsy way to make it happen. But since we
> decide to refactor the system with all the goodies from Lucene, we want to
> do the sorting right, and here we are at this issue... :-(
>
> On Wednesday, July 6, 2016, Erick Erickson <erickerickson@gmail.com> wrote:
>
>> Is this an "XY" problem? Meaning, why do you need DV fields larger than
>> 32K?
>>
>> You can't search it as text as it's not tokenized. Faceting and sorting
>> by a 32K
>> field doesn't seem very useful. You may have a perfectly valid reason,
>> but it's
>> not obvious what use-case you're serving from this thread so far....
>>
>> Nobody has yet put forth a compelling use-case for such large fields,
>> perhaps
>> this would be one.
>>
>> Best,
>> Erick
>>
>> On Wed, Jul 6, 2016 at 2:24 PM, Sheng <sh...@gmail.com> wrote:
>> > Mike - Thanks for the prompt response. Is there a way to bypass this
>> > constraint for SortedDocValueField ? Or we have to live with it,
>> meaning no
>> > fix even in future release?
>> >
>> > On Wednesday, July 6, 2016, Michael McCandless <
>> lucene@mikemccandless.com>
>> > wrote:
>> >
>> >> I believe only binary DVs can be larger than 32K bytes.
>> >>
>> >> Mike McCandless
>> >>
>> >> http://blog.mikemccandless.com
>> >>
>> >> On Wed, Jul 6, 2016 at 10:31 AM, Sheng <shengcer@gmail.com> wrote:
>> >>
>> >> > Hi,
>> >> >
>> >> > I am getting an IAE indicating one of the SortedDocValueField is too
>> >> large,
>> >> > > 32k
>> >> >
>> >> > I googled a bit, and it seems like #Lucene-4583 has addressed this
>> issue
>> >> in
>> >> > 4.5 and 6.0, while I am currently using Lucene 6.1. Do I miss or
>> >> > misunderstand anything ?
>> >> >
>> >> > Thanks,
>> >> >
>> >>
>>

Re: dv field is too large

Posted by Sheng <sh...@gmail.com>.
Hi Erick,

I am refactoring a legacy system. One of the most annoying things is that I
have to keep the old features even when they make little sense. In this case,
we have to index a particular data structure which has a bunch of fields, and
each of them is promised to be searchable and sortable in search results for
the user. It turns out one field is notoriously large. I think the old
implementation uses some quite clumsy way to make it happen. But since we
decided to refactor the system with all the goodies from Lucene, we want to do
the sorting right, and here we are at this issue... :-(

On Wednesday, July 6, 2016, Erick Erickson <er...@gmail.com> wrote:

> Is this an "XY" problem? Meaning, why do you need DV fields larger than
> 32K?
>
> You can't search it as text as it's not tokenized. Faceting and sorting by
> a 32K
> field doesn't seem very useful. You may have a perfectly valid reason, but
> it's
> not obvious what use-case you're serving from this thread so far....
>
> Nobody has yet put forth a compelling use-case for such large fields,
> perhaps
> this would be one.
>
> Best,
> Erick
>
> On Wed, Jul 6, 2016 at 2:24 PM, Sheng <shengcer@gmail.com> wrote:
> > Mike - Thanks for the prompt response. Is there a way to bypass this
> > constraint for SortedDocValueField ? Or we have to live with it, meaning
> no
> > fix even in future release?
> >
> > On Wednesday, July 6, 2016, Michael McCandless <lucene@mikemccandless.com>
> > wrote:
> >
> >> I believe only binary DVs can be larger than 32K bytes.
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >> On Wed, Jul 6, 2016 at 10:31 AM, Sheng <shengcer@gmail.com> wrote:
> >>
> >> > Hi,
> >> >
> >> > I am getting an IAE indicating one of the SortedDocValueField is too
> >> large,
> >> > > 32k
> >> >
> >> > I googled a bit, and it seems like #Lucene-4583 has addressed this
> issue
> >> in
> >> > 4.5 and 6.0, while I am currently using Lucene 6.1. Do I miss or
> >> > misunderstand anything ?
> >> >
> >> > Thanks,
> >> >
> >>
>

Re: dv field is too large

Posted by Erick Erickson <er...@gmail.com>.
Is this an "XY" problem? Meaning, why do you need DV fields larger than 32K?

You can't search it as text as it's not tokenized. Faceting and sorting by a 32K
field doesn't seem very useful. You may have a perfectly valid reason, but it's
not obvious what use-case you're serving from this thread so far....

Nobody has yet put forth a compelling use-case for such large fields, perhaps
this would be one.

Best,
Erick

On Wed, Jul 6, 2016 at 2:24 PM, Sheng <sh...@gmail.com> wrote:
> Mike - Thanks for the prompt response. Is there a way to bypass this
> constraint for SortedDocValueField ? Or we have to live with it, meaning no
> fix even in future release?
>
> On Wednesday, July 6, 2016, Michael McCandless <lu...@mikemccandless.com>
> wrote:
>
>> I believe only binary DVs can be larger than 32K bytes.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Wed, Jul 6, 2016 at 10:31 AM, Sheng <shengcer@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > I am getting an IAE indicating one of the SortedDocValueField is too
>> large,
>> > > 32k
>> >
>> > I googled a bit, and it seems like #Lucene-4583 has addressed this issue
>> in
>> > 4.5 and 6.0, while I am currently using Lucene 6.1. Do I miss or
>> > misunderstand anything ?
>> >
>> > Thanks,
>> >
>>


Re: dv field is too large

Posted by Sheng <sh...@gmail.com>.
Mike - Thanks for the prompt response. Is there a way to bypass this
constraint for SortedDocValuesField? Or do we have to live with it, meaning
no fix even in a future release?

On Wednesday, July 6, 2016, Michael McCandless <lu...@mikemccandless.com>
wrote:

> I believe only binary DVs can be larger than 32K bytes.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed, Jul 6, 2016 at 10:31 AM, Sheng <shengcer@gmail.com> wrote:
>
> > Hi,
> >
> > I am getting an IAE indicating one of the SortedDocValueField is too
> large,
> > > 32k
> >
> > I googled a bit, and it seems like #Lucene-4583 has addressed this issue
> in
> > 4.5 and 6.0, while I am currently using Lucene 6.1. Do I miss or
> > misunderstand anything ?
> >
> > Thanks,
> >
>

Re: dv field is too large

Posted by Michael McCandless <lu...@mikemccandless.com>.
I believe only binary DVs can be larger than 32K bytes.
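
As a rough sketch of how one might detect the problem before indexing, the following plain-Java check measures the UTF-8 byte length of a value and compares it to the limit. The class and method names are hypothetical, and the constant 32766 is an assumption here (it corresponds to IndexWriter.MAX_TERM_LENGTH as far as I know); the thread itself only says "32K".

```java
import java.nio.charset.StandardCharsets;

// Hypothetical pre-indexing check; not part of Lucene.
final class DocValueSizeCheck {
    // Assumed limit in bytes for a sorted doc value.
    static final int MAX_SORTED_DV_BYTES = 32766;

    /** Number of bytes the value occupies once encoded as UTF-8. */
    static int utf8Length(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    /** True if the UTF-8 encoding of the value fits within the assumed limit. */
    static boolean fitsInSortedDocValue(String value) {
        return utf8Length(value) <= MAX_SORTED_DV_BYTES;
    }

    public static void main(String[] args) {
        String value = args.length > 0 ? args[0] : "example value";
        System.out.println(utf8Length(value) + " bytes, fits: "
                + fitsInSortedDocValue(value));
    }
}
```

Note that the limit applies to encoded bytes, not characters, so a string with many non-ASCII characters can exceed it with far fewer than 32K characters.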

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jul 6, 2016 at 10:31 AM, Sheng <sh...@gmail.com> wrote:

> Hi,
>
> I am getting an IAE indicating one of the SortedDocValueField is too large,
> > 32k
>
> I googled a bit, and it seems like #Lucene-4583 has addressed this issue in
> 4.5 and 6.0, while I am currently using Lucene 6.1. Do I miss or
> misunderstand anything ?
>
> Thanks,
>