Posted to solr-dev@lucene.apache.org by Ryan McKinley <ry...@squid-labs.com> on 2007/05/03 01:10:53 UTC

dynamic copyFields

I'm looking for a way to copy from a dynamic field to another dynamic field.

I found this post from September:
http://www.nabble.com/copyField-to-a-dynamic-field-tf2300115.html#a6419101

Essentially, I have:
  <field name="tag_*"   type="string" ... />
  <field name="text_*"  type="text"   ... />

and want:
  <copyField source="tag_(.*)" dest="text_\1" />

Any thoughts about how to implement this?

Matching the pattern looks straightforward and would not adversely 
affect the speed of anything that does not use patterns, but generating 
a dynamic field would require changing the final targetField variable in 
IndexSchema.DynamicCopy to a function.

There is a comment that says:  (line 655)
// Instead of storing a type, this could be implemented as a hierarchy
// with a virtual matches().
// Given how often a search will be done, however, speed is the overriding
// concern and I'm not sure which is faster.

Any lasting concerns?


thanks
ryan

Re: dynamic copyFields

Posted by Ryan McKinley <ry...@gmail.com>.
Chris Hostetter wrote:
> : Syntax aside, the major implication is that DynamicCopy would need a
> : virtual function:
> :   SchemaField getTargetField()
> 
> I don't think i've ever looked at DynamicField before today ... but i see
> what you're talking about, you mean that "final SchemaField targetField"
> would need to be replaced with "SchemaField getTargetField(String
> sourceField)" right?
> 

exactly.


> yeah that seems simple enough, i'm not sure what Yonik meant by this
> comment...
> 
>   // Instead of storing a type, this could be implemented as a hierarchy
>   // with a virtual matches().
>   // Given how often a search will be done, however, speed is the overriding
>   // concern and I'm not sure which is faster.
> 
> ... i don't see how this ever comes into play with search.
> 

I don't either... I think it only happens at indexing.  ResponseWriters 
do not know (or care) if a field is from a copy field or not.


> on the issue of syntax and regex vs glob, i would leave it as a glob for
> now since that's already supported by the syntax and the impl ... 

agreed.


> if we want to support regexes that should be done separately in
> DynamicReplacement where it can be leveraged by both <copyField> and
> <dynamicField>
> 

glob is fine for what i need.


Thanks for the feedback, i'll post something on JIRA soon.

ryan

Re: dynamic copyFields

Posted by Yonik Seeley <yo...@apache.org>.
On 5/4/07, Chris Hostetter <ho...@fucit.org> wrote:
> yeah that seems simple enough, i'm not sure what Yonik meant by this
> comment...
>
>   // Instead of storing a type, this could be implemented as a hierarchy
>   // with a virtual matches().
>   // Given how often a search will be done, however, speed is the overriding
>   // concern and I'm not sure which is faster.
>
> ... i don't see how this ever comes into play with search.

DynamicField lookup (matching) needs to be done almost everywhere if a
field name doesn't match a non-dynamic field.  That includes parsing
queries (to get the analyzer), and writing responses (the field type
needs to be known).
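
[Illustration: a minimal Java sketch of the lookup order described above, where an exact field name is tried first and the dynamic patterns are scanned only as a fallback. The class and method names (FieldLookupSketch, typeOf, matches) are hypothetical and are not taken from IndexSchema; field types are plain strings here to keep the example self-contained, and only a leading or trailing '*' is handled.]

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Solr code: exact-name lookup with a glob fallback.
final class FieldLookupSketch {
    private final Map<String, String> explicitTypes = new HashMap<>();
    // Each entry is {glob, typeName}; scanned in registration order.
    private final List<String[]> dynamicTypes = new ArrayList<>();

    void addField(String name, String type) {
        explicitTypes.put(name, type);
    }

    void addDynamicField(String glob, String type) {
        dynamicTypes.add(new String[] {glob, type});
    }

    /** Returns the type name for fieldName, or null if nothing matches. */
    String typeOf(String fieldName) {
        String type = explicitTypes.get(fieldName);
        if (type != null) {
            return type; // fast path: no pattern matching needed
        }
        for (String[] dynamic : dynamicTypes) {
            if (matches(dynamic[0], fieldName)) {
                return dynamic[1];
            }
        }
        return null;
    }

    /** Glob match supporting a single leading or trailing '*'. */
    static boolean matches(String glob, String name) {
        if (glob.startsWith("*")) {
            return name.endsWith(glob.substring(1));
        }
        if (glob.endsWith("*")) {
            return name.startsWith(glob.substring(0, glob.length() - 1));
        }
        return glob.equals(name);
    }
}
```

[The fallback loop is the part that runs for every unknown field name at query-parse and response-writing time, which is why the cost of matches() matters.]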

The comment should really be next to the matches() method.  It was on
the class containing it in the past, but a refactor made to support
dynamic copyField moved it even further away.

-Yonik

Re: dynamic copyFields

Posted by Chris Hostetter <ho...@fucit.org>.
: Syntax aside, the major implication is that DynamicCopy would need a
: virtual function:
:   SchemaField getTargetField()

I don't think i've ever looked at DynamicField before today ... but i see
what you're talking about, you mean that "final SchemaField targetField"
would need to be replaced with "SchemaField getTargetField(String
sourceField)" right?

yeah that seems simple enough, i'm not sure what Yonik meant by this
comment...

  // Instead of storing a type, this could be implemented as a hierarchy
  // with a virtual matches().
  // Given how often a search will be done, however, speed is the overriding
  // concern and I'm not sure which is faster.

... i don't see how this ever comes into play with search.

on the issue of syntax and regex vs glob, i would leave it as a glob for
now since that's already supported by the syntax and the impl ... if we
want to support regexes that should be done separately in
DynamicReplacement where it can be leveraged by both <copyField> and
<dynamicField>



-Hoss


Re: dynamic copyFields

Posted by Ryan McKinley <ry...@gmail.com>.
> perhaps
> 
> <copyField re_source="(.*)_s" dest="\1_t"/>
> 

how about:
<copyField source="tag_(.*)" dest="text_\1" regex="true" />

useRegex="true" ?

Syntax aside, the major implication is that DynamicCopy would need a 
virtual function:
  SchemaField getTargetField()

rather than direct access to a final SchemaField.  I don't have any real 
sense if that is a big deal or not, but it seems ok to me ;)
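
[Illustration: a rough Java sketch of how a per-source target lookup could work with the regex syntax proposed here. RegexCopySketch and getTargetName are hypothetical names, not part of IndexSchema, and the method returns a field name rather than a SchemaField so the example stays self-contained. One detail worth noting: java.util.regex writes back-references in replacement strings as $1 rather than \1, so the sketch translates the \1 form first.]

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Solr code: the target name is computed per source field.
final class RegexCopySketch {
    private final Pattern source;
    private final String destTemplate; // in Java replacement syntax, e.g. "text_$1"

    RegexCopySketch(String sourceRegex, String dest) {
        this.source = Pattern.compile(sourceRegex);
        // Translate ed/sed-style \1..\9 into Java's $1..$9.
        this.destTemplate = dest.replaceAll("\\\\(\\d)", "\\$$1");
    }

    /** Returns the destination field name, or null if sourceField does not match. */
    String getTargetName(String sourceField) {
        Matcher m = source.matcher(sourceField);
        if (!m.matches()) { // the whole field name must match
            return null;
        }
        // Expand the template against the match that matches() just found.
        StringBuffer out = new StringBuffer();
        m.appendReplacement(out, destTemplate);
        return out.toString();
    }
}
```

[With source "tag_(.*)" and dest "text_\1", a call to getTargetName("tag_color") yields "text_color". A dest value containing a literal '$' would need escaping, which this sketch does not attempt.]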


ryan


Re: dynamic copyFields

Posted by Walter Underwood <wu...@netflix.com>.
That syntax is from the "ed" editor. I learned it in 1975
on Unix v6/PWB, running on a PDP-11/70. --wunder

On 5/2/07 5:04 PM, "Mike Klaas" <mi...@gmail.com> wrote:

> On 5/2/07, Ryan McKinley <ry...@gmail.com> wrote:
> 
>> How about Mike's other suggestion:
>>   <copyField regexp="s/(.*)_s/\1_t/" />
>> 
>> this would keep the glob style for "source" and "dest", but use "regex"
> >> to transform a source -> dest
> 
> Wow, I didn't even remember suggesting that.  I agree (with Hoss) that
> backward compatibility is important, but I disagree (with myself) that
> the above syntax is nice.  Outside of perl, I'm not sure how common
> the s/ / / syntax is (is it used in java?)
> 
> perhaps
> 
> <copyField re_source="(.*)_s" dest="\1_t"/>
> 
> ?
> 
> -Mike


Re: dynamic copyFields

Posted by Mike Klaas <mi...@gmail.com>.
On 5/2/07, Ryan McKinley <ry...@gmail.com> wrote:

> How about Mike's other suggestion:
>   <copyField regexp="s/(.*)_s/\1_t/" />
>
> this would keep the glob style for "source" and "dest", but use "regex"
> to transform a source -> dest

Wow, I didn't even remember suggesting that.  I agree (with Hoss) that
backward compatibility is important, but I disagree (with myself) that
the above syntax is nice.  Outside of perl, I'm not sure how common
the s/ / / syntax is (is it used in java?)

perhaps

<copyField re_source="(.*)_s" dest="\1_t"/>

?

-Mike

Re: dynamic copyFields

Posted by Ryan McKinley <ry...@gmail.com>.
Chris Hostetter wrote:
> : Essentially, I have:
> :   <field name="tag_*"   type="string" ... />
> :   <field name="text_*"  type="text"   ... />
> :
> : and want:
> :   <copyField source="tag_(.*)" dest="text_\1" />
> 
> i haven't thought about the underlying impl at all, but from an
> API/configuration standpoint one tough issue is the fact that dynamic
> fields and the "source" of copyField have always been based on glob style
> expressions, switching to regexes to support matching semantics would be
> tricky to do while remaining backwards compatible.
> 

How about Mike's other suggestion:
  <copyField regexp="s/(.*)_s/\1_t/" />

this would keep the glob style for "source" and "dest", but use "regex" 
to transform a source -> dest


Re: dynamic copyFields

Posted by Chris Hostetter <ho...@fucit.org>.
: Essentially, I have:
:   <field name="tag_*"   type="string" ... />
:   <field name="text_*"  type="text"   ... />
:
: and want:
:   <copyField source="tag_(.*)" dest="text_\1" />

i haven't thought about the underlying impl at all, but from an
API/configuration standpoint one tough issue is the fact that dynamic
fields and the "source" of copyField have always been based on glob style
expressions, switching to regexes to support matching semantics would be
tricky to do while remaining backwards compatible.


-Hoss


Re: dynamic copyFields

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On May 2, 2007, at 7:10 PM, Ryan McKinley wrote:
> and want:
>  <copyField source="tag_(.*)" dest="text_\1" />

Why even bother with regexes at all?

	<copyField source="tag_*" dest="text_*" />

Simply substitute whatever the * matched in the source into the *  
position in the dest.  Granted, it doesn't have the power of regex to  
morph things across, but maybe a simple glob/replace is all that is needed?
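
[Illustration: a minimal Java sketch of the glob/replace idea, assuming each pattern contains a single '*'. GlobCopySketch and targetFor are hypothetical names and are not part of Solr.]

```java
// Hypothetical sketch, not Solr code: single-'*' glob substitution.
final class GlobCopySketch {
    /**
     * Returns the destination field name for fieldName, or null if fieldName
     * does not match sourceGlob. Only the first '*' in each glob is used.
     */
    static String targetFor(String sourceGlob, String destGlob, String fieldName) {
        int sourceStar = sourceGlob.indexOf('*');
        int destStar = destGlob.indexOf('*');
        if (sourceStar < 0 || destStar < 0) {
            throw new IllegalArgumentException("both globs need a '*'");
        }
        String prefix = sourceGlob.substring(0, sourceStar);
        String suffix = sourceGlob.substring(sourceStar + 1);
        // The length check keeps prefix and suffix from overlapping in fieldName.
        if (fieldName.length() < prefix.length() + suffix.length()
                || !fieldName.startsWith(prefix)
                || !fieldName.endsWith(suffix)) {
            return null;
        }
        // Whatever the '*' matched in the source...
        String matched = fieldName.substring(
                prefix.length(), fieldName.length() - suffix.length());
        // ...is dropped into the '*' position of the dest.
        return destGlob.substring(0, destStar) + matched + destGlob.substring(destStar + 1);
    }
}
```

[For example, targetFor("tag_*", "text_*", "tag_color") gives "text_color", and a field name that does not start with "tag_" gives null.]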

	Erik