Posted to solr-user@lucene.apache.org by geeky2 <ge...@hotmail.com> on 2012/11/02 20:15:38 UTC
large text blobs in string field
hello
environment - solr 3.5
i would like to know if anyone is using the technique of placing large text
blobs into a "non-indexed" string field and if so - are there any good/bad
aspects to consider?
we are thinking of doing this to represent a 1:M relationship with the
"Many" being represented as a string in the schema (probably comprised
either of xml or json objects).
we are looking at the classic part : model scenario, where the client would
look up a part and the document would contain a string field with
potentially 200+ model numbers. edge cases for this could be 400+ model
numbers.
thx
--
View this message in context: http://lucene.472066.n3.nabble.com/large-text-blobs-in-string-field-tp4017882.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: large text blobs in string field
Posted by geeky2 <ge...@hotmail.com>.
Erick,
thanks for the insight.
FWIW and to add to the context of this discussion,
if we do decide to add the previously mentioned content as a multivalued
field, we would likely use a DIH hooked to our database schema (this is
currently how we add ALL content to our core) and within the DIH, use a
sub-entity to pull the "many" rows for each parent row.
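As a rough illustration of the parent/sub-entity pattern described above, here is a minimal Python sketch. It is not DIH itself: it uses an in-memory SQLite database, and the table and field names (part, part_model, modelNumber) are hypothetical. It only shows the shape of the result, one document per parent row with the "many" rows collected into a list-valued field.

```python
import sqlite3

# Hypothetical tables standing in for the real database schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE part_model (part_id INTEGER, model_number TEXT);
    INSERT INTO part VALUES (1, 'belt'), (2, 'filter');
    INSERT INTO part_model VALUES (1, 'M100'), (1, 'M200'), (2, 'M300');
""")


def build_documents(conn):
    """Build one document per parent row; child rows become a list-valued field."""
    docs = []
    for part_id, name in conn.execute("SELECT id, name FROM part ORDER BY id"):
        # Plays the role of the sub-entity query: run once per parent row.
        rows = conn.execute(
            "SELECT model_number FROM part_model "
            "WHERE part_id = ? ORDER BY model_number",
            (part_id,),
        )
        docs.append({
            "id": part_id,
            "name": name,
            "modelNumber": [row[0] for row in rows],  # hypothetical field name
        })
    return docs


docs = build_documents(conn)
```

Note that the child query runs once per parent row, so with many parents the number of database round trips grows accordingly.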
thx
mark
Re: large text blobs in string field
Posted by Erick Erickson <er...@gmail.com>.
The only thing "special" about a multiValued field is that it can have
non-consecutive positions due to the positionIncrementGap. So, if you set
positionIncrementGap=1, adding 10,000,000 words in one go is the same as adding 1
word at a time 10,000,000 times to a multiValued field.
I think the only practical limit is that you're _probably_ going to have problems
if (total tokens added) + (positionIncrementGap * number of entries) > 2B or so...
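To put numbers on that rule of thumb, here is a small Python sketch of the arithmetic. It only restates the estimate above and does not model Lucene internals. The ceiling of 2**31 - 1 is an assumption (the post only says "2B or so"), and the gap of 100 and one token per model number are illustrative values.

```python
INT_MAX = 2**31 - 1  # assumed ceiling; the post only says "2B or so"


def estimated_positions(tokens_per_value, gap):
    """Rule of thumb from the post: total tokens + gap * number of entries."""
    return sum(tokens_per_value) + gap * len(tokens_per_value)


def within_limit(tokens_per_value, gap):
    """True if the estimate stays at or below the assumed ceiling."""
    return estimated_positions(tokens_per_value, gap) <= INT_MAX


# 400 model numbers, one token each, with an illustrative gap of 100.
models = [1] * 400
print(estimated_positions(models, 100))  # 400 + 100 * 400 = 40400
print(within_limit(models, 100))
```

Under these assumptions, a field with 400 values is several orders of magnitude below the ceiling.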
FWIW
Erick
On Mon, Nov 5, 2012 at 1:40 PM, geeky2 <ge...@hotmail.com> wrote:
> is there any documented limit (or practical limit) on how many values in a
> multi-valued field?
Re: large text blobs in string field
Posted by geeky2 <ge...@hotmail.com>.
is there any documented limit (or practical limit) on how many values in a
multi-valued field?
Re: large text blobs in string field
Posted by Gora Mohanty <go...@mimirtech.com>.
On 5 November 2012 22:26, geeky2 <ge...@hotmail.com> wrote:
> Gora,
>
> currently our core does use multi-valued fields. however the existing
> multi-valued fields in the schema will only result in 3 - 10 values.
>
> we are thinking of using the text blob approach primarily because of the
> large number of possible values in this field.
>
> if we were to use a multi-valued field, it is likely that the MV field would
> have 200+ values and in some edge cases 400+ values.
>
> are you saying that the MV field approach to represent the data (given the
> scale previously indicated) is the best design solution?
Yes. I do not have direct experience with so many values per multi-valued
field, but according to people who know better, 400-odd values should not be
a problem. This is probably better than indexing, retrieving, and parsing a
text blob.
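A minimal Python sketch of the difference on the client side, assuming the results are fetched as JSON and using hypothetical field names and values. With the blob design the client has to parse the stored string itself. With a multi-valued field the values already arrive as a list.

```python
import json

# Hypothetical shapes of one result document under each design.
blob_doc = {"id": "part-1", "models": '["M100", "M200", "M300"]'}  # string field holding JSON
mv_doc = {"id": "part-1", "models": ["M100", "M200", "M300"]}      # multi-valued field


def models_from_blob(doc):
    """Blob design: the client parses the stored string on every read."""
    return json.loads(doc["models"])


def models_from_multivalued(doc):
    """Multi-valued design: the values are already a list."""
    return list(doc["models"])


# Both designs carry the same data; only the blob needs an extra parsing step.
same = models_from_blob(blob_doc) == models_from_multivalued(mv_doc)
```

The blob also couples every client to the serialization format chosen at index time, which the multi-valued field avoids.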
Regards,
Gora
Re: large text blobs in string field
Posted by geeky2 <ge...@hotmail.com>.
Gora,
currently our core does use multi-valued fields. however the existing
multi-valued fields in the schema will only result in 3 - 10 values.
we are thinking of using the text blob approach primarily because of the
large number of possible values in this field.
if we were to use a multi-valued field, it is likely that the MV field would
have 200+ values and in some edge cases 400+ values.
are you saying that the MV field approach to represent the data (given the
scale previously indicated) is the best design solution?
Re: large text blobs in string field
Posted by Gora Mohanty <go...@mimirtech.com>.
On 3 November 2012 00:45, geeky2 <ge...@hotmail.com> wrote:
[...]
>
> we are thinking of doing this to represent a 1:M relationship with the
> "Many" being represented as a string in the schema (probably comprised
> either of xml or json objects).
>
> we are looking at the classic part : model scenario, where the client would
> look up a part and the document would contain a string field with
> potentially 200+ model numbers. edge cases for this could be 400+ model
> numbers.
>
Why would you want to do this over having a multi-valued field for
the model number?
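For reference, a multi-valued field is populated by repeating the same field name within one document. Here is a minimal Python sketch that builds such an XML update message. The field names partNumber and modelNumber are hypothetical, and the schema would need multiValued="true" on the model field.

```python
import xml.etree.ElementTree as ET


def build_add_message(part_number, model_numbers):
    """Build an <add><doc> message with one <field> element per model number."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="partNumber").text = part_number
    for model in model_numbers:
        # Repeating the field name is what makes the field multi-valued.
        ET.SubElement(doc, "field", name="modelNumber").text = model
    return ET.tostring(add, encoding="unicode")


message = build_add_message("P-1", ["M100", "M200"])
print(message)
```

With 200 to 400 model numbers the message simply carries that many modelNumber elements for the one part document.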
Regards,
Gora