Posted to solr-user@lucene.apache.org by geeky2 <ge...@hotmail.com> on 2012/11/02 20:15:38 UTC

large text blobs in string field

hello 

environment - solr 3.5

i would like to know if anyone is using the technique of placing large text
blobs into a "non-indexed" string field and if so - are there any good/bad
aspects to consider?

we are thinking of doing this to represent a 1:M relationship with the
"Many" being represented as a string in the schema (probably composed
of either xml or json objects).

we are looking at the classic part : model scenario, where the client would
look up a part and the document would contain a string field with
potentially 200+ model numbers.  edge cases for this could be 400+ model
numbers.
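
For concreteness, the two document shapes being compared might look like this from the client side. This is only a sketch: the field names `part_id`, `models_blob` and `model` are hypothetical, and the plain dicts stand in for whatever a Solr response would actually return.

```python
import json

# Hypothetical part document using the "blob" approach: the "many" side is
# serialized as JSON into a single stored, non-indexed string field.
blob_doc = {
    "part_id": "P-100",
    "models_blob": json.dumps(["M-1", "M-2", "M-3"]),
}

# The same part using a multi-valued field: the values come back as a list.
mv_doc = {
    "part_id": "P-100",
    "model": ["M-1", "M-2", "M-3"],
}


def models_from_blob(doc):
    """The client has to parse the blob itself after retrieving it."""
    return json.loads(doc["models_blob"])


def models_from_multivalued(doc):
    """No parsing step: the values are already separate."""
    return list(doc["model"])
```

Both functions return the same list of model numbers; the difference is that the blob variant pushes the parsing onto every client, and a non-indexed blob cannot be searched by model number.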

thx

 



--
View this message in context: http://lucene.472066.n3.nabble.com/large-text-blobs-in-string-field-tp4017882.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: large text blobs in string field

Posted by geeky2 <ge...@hotmail.com>.
Erick,

thanks for the insight.

FWIW and to add to the context of this discussion,

if we do decide to add the previously mentioned content as a multivalued
field,  we would likely use a DIH hooked to our database schema (this is
currently how we add ALL content to our core) and within the DIH, use a
sub-entity to pull the "many" rows for each parent row.
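
Conceptually, the sub-entity supplies the child rows for each parent row, and those rows become the values of one multivalued field on the parent document. The sketch below imitates that grouping in plain Python. It is not DIH configuration, and the row shapes and the field names `part_id`, `name` and `model` are made up for illustration.

```python
from collections import defaultdict


def build_documents(part_rows, model_rows):
    """Attach each part's model numbers as a list (one multivalued field).

    part_rows:  iterable of (part_id, name) tuples   -- hypothetical parent rows
    model_rows: iterable of (part_id, model_number)  -- hypothetical child rows
    """
    models_by_part = defaultdict(list)
    for part_id, model_number in model_rows:
        models_by_part[part_id].append(model_number)

    return [
        {
            "part_id": part_id,
            "name": name,
            # A part with no child rows simply gets an empty list.
            "model": models_by_part.get(part_id, []),
        }
        for part_id, name in part_rows
    ]
```

Note that this sketch groups all child rows in one pass, whereas a sub-entity is normally evaluated per parent row; the resulting documents have the same shape either way.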

thx
mark





Re: large text blobs in string field

Posted by Erick Erickson <er...@gmail.com>.
The only thing "special" about a multiValued field is that it can have
non-consecutive positions due to the incrementGap. So, if you set the
incrementGap=1, adding 10,000,000 words in one go is the same as adding 1
word at a time 10,000,000 times to a multiValued field.

I think the only practical limit is that you're _probably_ going to have problems
if (total tokens added) + (increment_gap * number of entries) > 2B or so...
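
As a rough back-of-the-envelope check of that estimate, the arithmetic can be sketched as below. The function names are made up, and the exact ceiling is an assumption: "2B or so" is read here as the largest signed 32-bit integer.

```python
# Assumption: "2B or so" is taken to mean the signed 32-bit integer maximum.
MAX_POSITION = 2**31 - 1


def estimated_last_position(total_tokens, increment_gap, entries):
    """(total tokens added) + (increment_gap * number of entries)."""
    return total_tokens + increment_gap * entries


def probably_ok(total_tokens, increment_gap, entries):
    """True if the estimate stays under the assumed position ceiling."""
    return estimated_last_position(total_tokens, increment_gap, entries) <= MAX_POSITION
```

For example, 400 single-token model numbers with a gap of 100 (an assumed value) give 400 + 100 * 400 = 40,400, which is nowhere near the limit.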

FWIW
Erick


On Mon, Nov 5, 2012 at 1:40 PM, geeky2 <ge...@hotmail.com> wrote:

> is there any documented limit (or practical limit) on how many values in a
> multi-valued field?

Re: large text blobs in string field

Posted by geeky2 <ge...@hotmail.com>.
is there any documented limit (or practical limit) on how many values in a
multi-valued field?




Re: large text blobs in string field

Posted by Gora Mohanty <go...@mimirtech.com>.
On 5 November 2012 22:26, geeky2 <ge...@hotmail.com> wrote:
> Gora,
>
> currently our core does use multi-valued fields.  however the existing
> multi-valued fields in the schema will only result in 3 - 10 values.
>
> we are thinking of using the text blob approach primarily because of the
> large number of possible values in this field.
>
> if we were to use a multi-valued field, it is likely that the MV field would
> have 200+ values and in some edge cases 400+ values.
>
> are you saying that the MV field approach to represent the data (given the
> scale previously indicated) is the best design solution?

Yes. I do not have direct experience with so many values per multi-valued
field, but according to people who know better, 400-odd values should not be a
problem. This is probably better than indexing, retrieving, and parsing a
text blob.

Regards,
Gora

Re: large text blobs in string field

Posted by geeky2 <ge...@hotmail.com>.
Gora,

currently our core does use multi-valued fields.  however the existing
multi-valued fields in the schema will only result in 3 - 10 values.

we are thinking of using the text blob approach primarily because of the
large number of possible values in this field.  

if we were to use a multi-valued field, it is likely that the MV field would
have 200+ values and in some edge cases 400+ values.

are you saying that the MV field approach to represent the data (given the
scale previously indicated) is the best design solution?







Re: large text blobs in string field

Posted by Gora Mohanty <go...@mimirtech.com>.
On 3 November 2012 00:45, geeky2 <ge...@hotmail.com> wrote:
[...]

>
> we are thinking of doing this to represent a 1:M relationship with the
> "Many" being represented as a string in the schema (probably comprised
> either of xml or json objects).
>
> we are looking at the classic part : model scenario, where the client would
> look up a part and the document would contain a string field with
> potentially 200+ model numbers.  edge cases for this could be 400+ model
> numbers.
>

Why would you want to do this over having a multi-valued field for
the model number?

Regards,
Gora