Posted to solr-user@lucene.apache.org by Paras Lehana <pa...@indiamart.com> on 2019/10/23 11:55:01 UTC

copyField - why source should contain * when dest contains *?

Hi Community,

I was just going through the *Solr Ref Guide 8.1* from scratch and was
reading about copyFields
<https://lucene.apache.org/solr/guide/8_0/copying-fields.html>. We have
been working with copyFields in 6.6 for a year. I just wanted to refresh
what we know, and learn what we should know, before we upgrade to 8.2.

The last quote on the page mentions:

> The copyField command can use a wildcard (*) character in the dest
> parameter only if the source parameter contains one as well. copyField uses
> the matching glob from the source field for the dest field name into which
> the source content is copied.


Why do we have this restriction? Can't we have the information in one source
field copied into several different fields? The second statement is
probably the explanation for the first (if not, please help me understand
that as well), but I cannot relate the two.

Thanks in advance.

-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*

-- 
IMPORTANT: 
NEVER share your IndiaMART OTP/ Password with anyone.

Re: copyField - why source should contain * when dest contains *?

Posted by Paras Lehana <pa...@indiamart.com>.
Hey Community,

I think I have got the answer to my query.

This statement about copyFields
<https://lucene.apache.org/solr/guide/8_0/copying-fields.html>:

> The copyField command can use a wildcard (*) character in the dest
> parameter only if the source parameter contains one as well. copyField uses
> the matching glob from the source field for the dest field name into which
> the source content is copied.


So, the glob here refers to the "something_*" pattern: the part of the source
field name that the * matches (e.g. "en" in title_en) is what fills the * in
the dest. copyField doesn't support chaining, and similarly, a single
copyField does *NOT copy one source field into multiple destinations*. The
whole point of allowing a wildcard in dest only when it is also present in
source is to make a *one-to-one mapping* from the matching glob. For example,
consider this config:

<copyField source="title_*" dest="text_*"/>


and these fields in the schema:

  <field name="title_en" .../>
  <field name="title_fr" .../>
  <field name="title_tr" .../>

  <field name="text_en" .../>
  <field name="text_fr" .../>
  <field name="text_tr" .../>


If you index some information in title_en, it will be copied into
text_en ONLY. Note the one-to-one mapping here:

  title_en  to  text_en
  title_fr  to  text_fr
  title_tr  to  text_tr

Note that information in title_en will *NOT* be copied into text_fr.
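To make the substitution concrete, here is a minimal Python sketch of the one-to-one mapping described above. It is a simplified model, not Solr's actual implementation: it assumes each pattern contains exactly one "*", and the helper names captured_glob and copy_target are hypothetical.

```python
from typing import Optional


def captured_glob(pattern: str, field_name: str) -> Optional[str]:
    """Return the text matched by the single '*' in pattern, or None if
    field_name does not match the pattern at all."""
    if pattern.count("*") != 1:
        raise ValueError("this simplified model expects exactly one '*'")
    prefix, suffix = pattern.split("*")
    if (len(field_name) >= len(prefix) + len(suffix)
            and field_name.startswith(prefix)
            and field_name.endswith(suffix)):
        return field_name[len(prefix):len(field_name) - len(suffix)]
    return None


def copy_target(source: str, dest: str, field_name: str) -> Optional[str]:
    """Name of the field that field_name is copied into, or None if the
    copyField rule (source -> dest) does not apply to it."""
    glob = captured_glob(source, field_name)
    if glob is None:
        return None
    # The captured glob fills the '*' in dest (a dest without '*' is used as-is).
    return dest.replace("*", glob)


for name in ["title_en", "title_fr", "title_tr", "body_en"]:
    print(name, "->", copy_target("title_*", "text_*", name))
```

This prints title_en -> text_en, title_fr -> text_fr, title_tr -> text_tr and body_en -> None. Because the dest name is built from the glob captured in the source, title_en can only ever land in text_en, never in text_fr.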

I guess this is what Erick and Chris were trying to make me understand. I
have tried my best to explain it for anyone else who has the same question.
Thank you, everyone! :)


On Thu, 24 Oct 2019 at 12:50, Paras Lehana <pa...@indiamart.com>
wrote:




Re: copyField - why source should contain * when dest contains *?

Posted by Paras Lehana <pa...@indiamart.com>.
Hey Chris,

Awesome explanation.

> ...then solr has no idea what full field name to use as the destination
> when it sees values in a field "foo" ... should it be "1_bar" ?
> "aaa_bar" ? ... "zzzzzzzzzzzzzzzzzzzzzzzzz_bar" ? all three?


But how does Solr know which full field name to use as the
destination when we provide a wildcard in the source as well? It seems I'm
missing something.


> but using a wildcard in the dest only works with a one-to-one mapping


So I think this restriction may have more to do with how the code is
structured than with a logical reason. I'll try to understand the code
behind this.

I was mainly curious whether there is a logical restriction that I had been
missing.

Many thanks. :)

On Thu, 24 Oct 2019 at 03:47, Chris Hostetter <ho...@fucit.org>
wrote:




Re: copyField - why source should contain * when dest contains *?

Posted by Chris Hostetter <ho...@fucit.org>.
: The documentation says that we can copy multiple fields using a wildcard
: into one or more fields.

correct ... the limitation is in the syntax and the ambiguity that would 
be unresolvable if you had a wildcard in the dest but not in the source.  

the wildcard is essentially a variable.  if you have...

   source="foo" dest="*_bar"

...then solr has no idea what full field name to use as the destination
when it sees values in a field "foo" ... should it be "1_bar" ?
"aaa_bar" ? ... "zzzzzzzzzzzzzzzzzzzzzzzzz_bar" ? all three?
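Seen as a variable, the problem is that nothing binds the dest's "*" when the source is a plain field name. The Python sketch below is a simplified model with hypothetical helper names (dest_candidates, is_resolvable), not Solr's schema-loading code: it lists a few equally valid fillers and shows the kind of check that rejects such a rule.

```python
from typing import List


def dest_candidates(dest: str, fillers: List[str]) -> List[str]:
    """With nothing captured from the source, any string could fill dest's '*'."""
    return [dest.replace("*", filler) for filler in fillers]


def is_resolvable(source: str, dest: str) -> bool:
    """A '*' in dest is only resolvable when source also has a '*' whose
    matched text can be substituted into it."""
    return "*" not in dest or "*" in source


# source="foo" dest="*_bar": every candidate below is an equally valid guess.
print(dest_candidates("*_bar", ["1", "aaa", "z" * 25]))
print(is_resolvable("foo", "*_bar"))       # False: nothing binds the '*'
print(is_resolvable("title_*", "text_*"))  # True: one-to-one mapping
print(is_resolvable("title_*", "text"))    # True: many-to-one mapping
```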

: Yes, that's what hit me initially. But, "*_x" while indexing (in XMLs)
: doesn't mean anything, right? It's only used in dynamicFields while
: defining schema to let Solr know that we would have some undeclared fields

use of wildcards in copyField is not constrained to only
using dynamicFields; this would be a perfectly valid copyField using
wildcards, even if these were the only fields in the schema and it had
no dynamicFields at all...

  <field name="title_en" .../>
  <field name="title_fr" .../>
  <field name="title_tr" .../>

  <field name="text_en" .../>
  <field name="text_fr" .../>
  <field name="text_tr" .../>  

  <copyField source="title_*" dest="text_*" />

: having names like this. Also, according to the documentation, we can have
: dest="*_x" when source="*_x" if I'm right. In this case, there's support
: for multiple destinations when there are multiple sources.

correct.  there is support for copying from one field to another
via a *MAPPING* -- so a single copyField declaration can go from multiple
sources to multiple destinations, but using a wildcard in the dest
only works with a one-to-one mapping when the wildcard also exists in the
source.

on the flip side however, you can have a many-to-one mapping by using a
wildcard *only* in the source....

  <field name="title_en" .../>
  <field name="title_fr" .../>
  <field name="title_tr" .../>

  <field name="text" .../>

  <copyField source="title_*" dest="text" />
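A minimal Python sketch of this many-to-one case, assuming a document is a plain dict of field name to value and that each source pattern has at most one "*". The function names matches and apply_copy_field are hypothetical, not a Solr API. Every matching source appends to the single dest, which is why such a dest field generally needs to be multiValued.

```python
from typing import Dict, List


def matches(pattern: str, name: str) -> bool:
    """True if name matches a pattern containing at most one '*'."""
    if "*" not in pattern:
        return name == pattern
    prefix, suffix = pattern.split("*", 1)
    return (len(name) >= len(prefix) + len(suffix)
            and name.startswith(prefix) and name.endswith(suffix))


def apply_copy_field(doc: Dict[str, str], source: str, dest: str) -> Dict[str, List[str]]:
    """Collect the values a copyField rule would copy. dest has no '*' here,
    so every matching source field lands in the same destination."""
    copied: Dict[str, List[str]] = {}
    for name, value in doc.items():
        if matches(source, name):
            copied.setdefault(dest, []).append(value)
    return copied


doc = {"id": "1", "title_en": "Hello", "title_fr": "Bonjour", "title_tr": "Merhaba"}
print(apply_copy_field(doc, "title_*", "text"))
# {'text': ['Hello', 'Bonjour', 'Merhaba']}
```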



-Hoss
http://www.lucidworks.com/

Re: copyField - why source should contain * when dest contains *?

Posted by Paras Lehana <pa...@indiamart.com>.
Hey Erick,

Thanks for addressing.

> Copyfields are intended to copy exactly one field in the input into exactly
> one field in the destination, not multiple ones at the same time.


The documentation says that we can copy multiple fields using a wildcard
into one or more fields.



> Remember that Solr is also dealing with dynamic fields. In this case, what
> does “*_x” mean?


Yes, that's what hit me initially. But, "*_x" while indexing (in XMLs)
doesn't mean anything, right? It's only used in dynamicFields while
defining schema to let Solr know that we would have some undeclared fields
having names like this. Also, according to the documentation, we can have
dest="*_x" when source="*_x" if I'm right. In this case, there's support
for multiple destinations when there are multiple sources.



> Or is this mostly curiosity?


I'm just curious what exactly prevents having a single source with multiple
destinations.



> And what use-case do you want to solve?


Yes, it doesn't seem very practical. Maybe the fact that copyFields cannot
be chained is the reason here. I'm just curious about the implementation;
there must be a catch.
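On the chaining point, a small Python sketch of what "no chaining" means in practice, assuming exact field names; the field "all" and the helper apply_rules are hypothetical. Each rule reads only the fields that arrived in the input document, never a value that another copyField produced.

```python
from typing import Dict, List, Tuple


def apply_rules(doc: Dict[str, str], rules: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Apply (source, dest) copy rules to the original input fields only."""
    result: Dict[str, List[str]] = {name: [value] for name, value in doc.items()}
    for source, dest in rules:
        # Read from doc, not result: a copied value is never copied again.
        if source in doc:
            result.setdefault(dest, []).append(doc[source])
    return result


rules = [("title", "text"), ("text", "all")]
print(apply_rules({"title": "Hello"}, rules))
# {'title': ['Hello'], 'text': ['Hello']}  (no 'all': text->all never sees the copy)
```

The second rule only fires when the input document itself carries a "text" field.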


Anyways, thanks for replying, Erick. :)

On Wed, 23 Oct 2019 at 17:41, Erick Erickson <er...@gmail.com>
wrote:



Re: copyField - why source should contain * when dest contains *?

Posted by Erick Erickson <er...@gmail.com>.
So how would that work? Copyfields are intended to copy exactly one field in the input into exactly one field in the destination, not multiple ones at the same time. If you need to do that, define multiple copyField directives.
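If one field really does need to go to several places, the approach described here is one copyField rule per destination. A Python sketch, assuming each (source, dest) tuple stands for one copyField line with exact field names; the destination names title_exact and title_suggest and the function fan_out are hypothetical.

```python
from typing import Dict, List, Tuple

# Each tuple models one hypothetical <copyField source="..." dest="..."/> line.
RULES: List[Tuple[str, str]] = [
    ("title", "text"),
    ("title", "title_exact"),
    ("title", "title_suggest"),
]


def fan_out(doc: Dict[str, str], rules: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Apply every rule independently; a source listed in N rules reaches N dests."""
    copied: Dict[str, List[str]] = {}
    for source, dest in rules:
        if source in doc:
            copied.setdefault(dest, []).append(doc[source])
    return copied


print(fan_out({"title": "Solr copyField"}, RULES))
# {'text': ['Solr copyField'], 'title_exact': ['Solr copyField'], 'title_suggest': ['Solr copyField']}
```

Each destination is named explicitly, so there is no wildcard for Solr to resolve.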

I don’t even see how that would work. <copyField source="whatever" dest="*_x"/>. Remember that Solr is also dealing with dynamic fields. In this case, what does “*_x” mean? Create N new fields?

And what use-case do you want to solve? Or is this mostly curiosity?

Best,
Erick

> On Oct 23, 2019, at 7:55 AM, Paras Lehana <pa...@indiamart.com> wrote:
> 
> Can't we have one source field
> information that is copied into different fields