Posted to solr-user@lucene.apache.org by Steven White <sw...@gmail.com> on 2020/09/17 01:13:42 UTC

Doing what <copyField> does using SolrJ API

Hi everyone,

I want to avoid creating a <copyField dest="CatchAll"
source="OneFieldOfMany"/> in my schema (there will be over 1000 of them, and
maybe more, so managing them will be a pain).  Instead, I want to use the SolrJ
API to do what <copyField/> does.  Is there any example of how I can do this?  If
there is an example online, that would be great.

Thanks in advance.

Steven

Re: Doing what <copyField> does using SolrJ API

Posted by Steven White <sw...@gmail.com>.
Thank you all for your feedback.  It is very helpful.

@Walter, out of the 1000 fields in Solr's schema, only 5 are set as
"required" fields, and the Solr doc that I create and then send to Solr for
indexing contains only those fields that have data to be indexed.  So some
docs will have 10 fields, some 50, etc.

Steven

Re: Doing what <copyField> does using SolrJ API

Posted by Erick Erickson <er...@gmail.com>.
The script can actually be written in any number of scripting languages (Python, Groovy,
JavaScript, etc.), but Alexandre’s comments about JavaScript are well taken.

It all depends on whether you ever want to search the fields individually. If you do,
you need to have them in your index as well as the copyField destination.

Re: Doing what <copyField> does using SolrJ API

Posted by Walter Underwood <wu...@wunderwood.org>.
If you want to ignore a field being sent to Solr, you can set indexed=false and 
stored=false for that field in schema.xml. It will take up room in schema.xml but
zero room on disk.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

Re: Doing what <copyField> does using SolrJ API

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Solr has a whole pipeline that you can run during document ingestion, before
the actual indexing happens. It is called the Update Request Processor (URP)
chain and is defined in solrconfig.xml or in an override file. Obviously, since
you are indexing from a SolrJ client, you have even more flexibility, but it
is good to know about anyway.

You can read all about it at:
https://lucene.apache.org/solr/guide/8_6/update-request-processors.html and
see the extensive list of processors you can leverage. The one mentioned
specifically is:
https://lucene.apache.org/solr/8_6_0//solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html

Just a word of warning: the stateless script URP is typically used with
JavaScript, whose story is getting a bit complicated as the underlying JVM is
upgraded (the Nashorn JavaScript engine was deprecated in JDK 11 and removed
in JDK 15). So if one of the simpler URPs, or a chain of them, will do the
job, that may be a better path to take.

Regards,
   Alex.


Re: Doing what <copyField> does using SolrJ API

Posted by Steven White <sw...@gmail.com>.
Thanks Erick.  Where can I learn more about the "stateless script update
processor factory"?  I don't know what you mean by this.

Steven

Re: Doing what <copyField> does using SolrJ API

Posted by Erick Erickson <er...@gmail.com>.
1000 fields is fine; you'll waste some cycles on bookkeeping, but I really
doubt you'll notice. That said, are these fields used for searching?
I ask because you do have control over what goes into the index if you can put a
"stateless script update processor factory" in your update chain. There you
can do whatever you want, including combining all the fields into one and
deleting the original fields. There's no point in having your index cluttered
with unused fields. OTOH, it may not be worth the effort just to satisfy my
sense of aesthetics 😉
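
To make the "combine all the fields into one and delete the originals" idea concrete, here is a rough sketch in plain Java. It is not an update processor and does not use any Solr API: the document is modelled as an ordinary Map, and the class name, method name, the "keep" set and the target field name are all assumptions made up for illustration. A real script or client-side step would apply the same logic to the actual document object.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: a Map stands in for the Solr document.
class CombineAndDropSketch {

    // Returns a new document in which every field not listed in "keep" has been
    // folded into one multi-valued target field and dropped as a separate field.
    static Map<String, Object> combine(Map<String, Object> doc, Set<String> keep, String target) {
        Map<String, Object> out = new LinkedHashMap<>();
        List<Object> combined = new ArrayList<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (keep.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());   // kept as-is, not copied to the target
            } else {
                combined.add(e.getValue());          // value survives only in the target
            }
        }
        out.put(target, combined);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("color", "red");
        doc.put("size", "L");
        System.out.println(combine(doc, Set.of("id"), "CatchAll"));
    }
}
```

The trade-off is the one discussed in this thread: once the originals are dropped, they can no longer be searched individually.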

Re: Doing what <copyField> does using SolrJ API

Posted by Steven White <sw...@gmail.com>.
Hi Erick,

Yes, this is coming from a DB.  Unfortunately I have no control over the
list of fields.  Out of the 1000 fields that there may be, no document that
gets indexed into Solr will use more than about 50, and since I'm copying
the values of those fields to the catch-all field, and the catch-all field
is my default search field, I don't expect any problem from having 1000
fields in Solr's schema.  Or should I?

Thanks

Steven


Re: Doing what <copyField> does using SolrJ API

Posted by Erick Erickson <er...@gmail.com>.
“there will be over 1000 of them [fields]”

This is often a red flag in my experience. Solr will handle that many
fields; I’ve seen many more. But this is often a result of
“database thinking”, i.e. your mental model of all this data
comes from a DB perspective rather than a search perspective.

It’s unwieldy to have that many fields. Obviously I don’t know the particulars of
your app, and maybe that’s the best design. But particularly if many of the fields
are sparsely populated, i.e. only a small percentage of the documents in your
corpus have any value for a given field, taking a step back and looking
at the design might save you some grief down the line.

For instance, I’ve seen designs where instead of
field1:some_value
field2:other_value….

you use a single field with _tokens_ like:
field:field1_some_value
field:field2_other_value

This reduces the complexity and can improve performance.
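
As a rough sketch of the token idea (plain Java with no Solr dependency; the class and method names are made up for illustration), each field name and value pair becomes one token of the form name_value, and all tokens would then go into a single multi-valued field, called "field" in the example above:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds "fieldName_value" tokens for a single token field.
class TokenFieldSketch {

    static List<String> toTokens(Map<String, String> fields) {
        List<String> tokens = new ArrayList<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            // One token per populated source field, e.g. field1_some_value.
            tokens.add(e.getKey() + "_" + e.getValue());
        }
        return tokens;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("field1", "some_value");
        fields.put("field2", "other_value");
        // The resulting list would be indexed as the values of the one token field.
        System.out.println(toTokens(fields));
    }
}
```

A query that used to be field1:some_value then becomes field:field1_some_value. This assumes the token field is analyzed so that each token is kept whole (for example a string-like type), which is a schema decision outside this sketch.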

Anyway, just a thought you might want to consider.

Best,
Erick

Re: Doing what <copyField> does using SolrJ API

Posted by Steven White <sw...@gmail.com>.
Hi everyone,

I figured it out.  It is as simple as creating a List<String> and passing
it as the value argument to the SolrInputDocument.addField() API.
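
For anyone finding this later, a minimal sketch of the idea. To keep it runnable without the SolrJ jar, it only builds the List<String>; the helper class, its method name and the sample field names are assumptions made up for illustration, and the actual SolrJ call is shown as a comment.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of SolrJ: gathers every non-empty source value
// so the list can be sent as the value of a multi-valued catch-all field.
class CatchAllSketch {

    static List<String> collectValues(Map<String, String> sourceFields) {
        List<String> values = new ArrayList<>();
        for (String value : sourceFields.values()) {
            // Skip fields that carry no data, as in a sparsely populated row.
            if (value != null && !value.isEmpty()) {
                values.add(value);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("name", "red widget");
        row.put("vendor", "Acme");
        row.put("notes", "");

        List<String> catchAll = collectValues(row);
        // With SolrJ on the classpath, the list would then be passed along the lines of:
        //   doc.addField("CatchAll", catchAll);
        System.out.println(catchAll);
    }
}
```

The CatchAll field needs to be multi-valued in the schema for this to work, just as it would as a copyField destination with many sources.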

Thanks,

Steven

