Posted to solr-dev@lucene.apache.org by Ryan McKinley <ry...@squid-labs.com> on 2007/05/03 01:10:53 UTC
dynamic copyFields
I'm looking for a way to copy from a dynamic field to another dynamic field.
I found this post from September:
http://www.nabble.com/copyField-to-a-dynamic-field-tf2300115.html#a6419101
Essentially, I have:
<field name="tag_*" type="string" ... />
<field name="text_*" type="text" ... />
and want:
<copyField source="tag_(.*)" dest="text_\1" />
Any thoughts about how to implement this?
Matching the pattern looks straightforward and would not adversely
affect the speed for anything that does not use patterns, but generating
a dynamic field would require changing the final targetField variable in
IndexSchema.DynamicCopy to a function.
There is a comment that says: (line 655)
// Instead of storing a type, this could be implemented as a hierarchy
// with a virtual matches().
// Given how often a search will be done, however, speed is the overriding
// concern and I'm not sure which is faster.
Any lasting concerns?
thanks
ryan
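A minimal sketch of the regex idea in Java, assuming a hypothetical helper class (RegexCopyRule is not part of Solr's IndexSchema). It translates the \1 back-reference used in the schema example into Java's $1 replacement syntax and derives the destination field name from a matching source name. Literal $ or \ characters in dest are not handled here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not an existing Solr class. */
final class RegexCopyRule {
    private final Pattern source;
    private final String destTemplate; // Java replacement syntax, e.g. "text_$1"

    RegexCopyRule(String sourceRegex, String dest) {
        this.source = Pattern.compile(sourceRegex);
        // The schema example writes back-references as \1; Java expects $1.
        this.destTemplate = dest.replaceAll("\\\\(\\d)", "\\$$1");
    }

    /** Returns the destination field name, or null if the name does not match. */
    String targetFor(String fieldName) {
        Matcher m = source.matcher(fieldName);
        if (!m.matches()) {
            return null; // the whole field name must match the source pattern
        }
        StringBuffer out = new StringBuffer();
        m.appendReplacement(out, destTemplate); // expands $1 from the current match
        return out.toString();
    }
}
```

With source "tag_(.*)" and dest "text_\1", targetFor("tag_color") would yield "text_color", while a name such as "title" yields null and is left alone.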
Re: dynamic copyFields
Posted by Ryan McKinley <ry...@gmail.com>.
Chris Hostetter wrote:
> : Syntax aside, the major implication is that DynamicCopy would need a
> : virtual function:
> : SchemaField getTargetField()
>
> I don't think i've ever looked at DynamicField before today ... but i see
> what you're talking about, you mean that "final SchemaField targetField"
> would need to be replaced with "SchemaField getTargetField(String
> sourceField)" right?
>
exactly.
> yeah that seems simple enough, i'm not sure what Yonik meant by this
> comment...
>
> // Instead of storing a type, this could be implemented as a hierarchy
> // with a virtual matches().
> // Given how often a search will be done, however, speed is the overriding
> // concern and I'm not sure which is faster.
>
> ... i don't see how this ever comes into play with search.
>
I don't either... I think it only happens at indexing. ResponseWriters
do not know (or care) if a field is from a copy field or not.
> on the issue of syntax and regex vs glob, i would leave it as a glob for
> now since that's already supported by the syntax and the impl ...
agreed.
> if we want to support regexes that should be done separately in
> DynamicReplacement where it can be leveraged by both <copyField> and
> <dynamicField>
>
glob is fine for what i need.
Thanks for the feedback, i'll post something on JIRA soon.
ryan
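The change discussed in this message, replacing a final targetField with getTargetField(String sourceField), could look roughly like the sketch here. All class names are hypothetical stand-ins (FieldStub replaces Solr's SchemaField, and the rules only handle a trailing-* prefix glob); this is an illustration of the shape, not the actual IndexSchema code.

```java
/** Stand-in for Solr's SchemaField; only the name matters in this sketch. */
final class FieldStub {
    final String name;
    FieldStub(String name) { this.name = name; }
}

/** Hypothetical copy rule whose target may depend on the source field name. */
abstract class CopyRuleSketch {
    abstract boolean matches(String sourceField);
    abstract FieldStub getTargetField(String sourceField);
}

/** Fixed target: every matching source is copied into the same field. */
final class FixedTargetRule extends CopyRuleSketch {
    private final String sourcePrefix;
    private final FieldStub target;

    FixedTargetRule(String sourcePrefix, FieldStub target) {
        this.sourcePrefix = sourcePrefix;
        this.target = target;
    }

    @Override boolean matches(String f) { return f.startsWith(sourcePrefix); }
    @Override FieldStub getTargetField(String f) { return target; }
}

/** Derived target: "tag_*" -> "text_*" carries the wildcard part across. */
final class DerivedTargetRule extends CopyRuleSketch {
    private final String sourcePrefix;
    private final String destPrefix;

    DerivedTargetRule(String sourcePrefix, String destPrefix) {
        this.sourcePrefix = sourcePrefix;
        this.destPrefix = destPrefix;
    }

    @Override boolean matches(String f) { return f.startsWith(sourcePrefix); }

    @Override FieldStub getTargetField(String f) {
        if (!matches(f)) {
            throw new IllegalArgumentException("not a source for this rule: " + f);
        }
        return new FieldStub(destPrefix + f.substring(sourcePrefix.length()));
    }
}
```

Callers that only ever see a fixed target pay one extra virtual call; only the derived rule builds a new name per source field.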
Re: dynamic copyFields
Posted by Yonik Seeley <yo...@apache.org>.
On 5/4/07, Chris Hostetter <ho...@fucit.org> wrote:
> yeah that seems simple enough, i'm not sure what Yonik meant by this
> comment...
>
> // Instead of storing a type, this could be implemented as a hierarchy
> // with a virtual matches().
> // Given how often a search will be done, however, speed is the overriding
> // concern and I'm not sure which is faster.
>
> ... i don't see how this ever comes into play with search.
DynamicField lookup (matching) needs to be done almost everywhere if a
field name doesn't match a non-dynamic field. That includes parsing
queries (to get the analyzer), and writing responses (the field type
needs to be known).
The comment should really be next to the matches() method. It was on
the class containing it in the past, but a refactor made to support
dynamic copyField moved it even further away.
-Yonik
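The lookup order described in this message (explicit fields first, dynamic patterns only as a fallback) can be sketched as follows. FieldTypeLookupSketch is a hypothetical class, field types are plain strings, and the assumptions that a glob has a single leading or trailing * and that the first matching pattern wins are simplifications made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical lookup table; not the real IndexSchema. */
final class FieldTypeLookupSketch {
    private final Map<String, String> explicit = new HashMap<>();
    private final List<String[]> dynamic = new ArrayList<>(); // {glob, type}

    void addField(String name, String type) { explicit.put(name, type); }

    void addDynamicField(String glob, String type) {
        dynamic.add(new String[] {glob, type});
    }

    /** Assumes the glob has one '*' at the very start or the very end. */
    static boolean globMatches(String glob, String name) {
        if (glob.startsWith("*")) {
            return name.endsWith(glob.substring(1));
        }
        if (glob.endsWith("*")) {
            return name.startsWith(glob.substring(0, glob.length() - 1));
        }
        return glob.equals(name);
    }

    /** Returns the type for a field name, or null if nothing matches. */
    String typeOf(String name) {
        String type = explicit.get(name); // cheap hash lookup for explicit fields
        if (type != null) {
            return type;
        }
        for (String[] d : dynamic) { // fallback: scan the dynamic patterns in order
            if (globMatches(d[0], name)) {
                return d[1];
            }
        }
        return null;
    }
}
```

Because typeOf would run during query parsing and response writing as well as indexing, the cost of the fallback loop is what the quoted source comment is weighing.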
Re: dynamic copyFields
Posted by Chris Hostetter <ho...@fucit.org>.
: Syntax aside, the major implication is that DynamicCopy would need a
: virtual function:
: SchemaField getTargetField()
I don't think i've ever looked at DynamicField before today ... but i see
what you're talking about, you mean that "final SchemaField targetField"
would need to be replaced with "SchemaField getTargetField(String
sourceField)" right?
yeah that seems simple enough, i'm not sure what Yonik meant by this
comment...
// Instead of storing a type, this could be implemented as a hierarchy
// with a virtual matches().
// Given how often a search will be done, however, speed is the overriding
// concern and I'm not sure which is faster.
... i don't see how this ever comes into play with search.
on the issue of syntax and regex vs glob, i would leave it as a glob for
now since that's already supported by the syntax and the impl ... if we
want to support regexes that should be done separately in
DynamicReplacement where it can be leveraged by both <copyField> and
<dynamicField>
-Hoss
Re: dynamic copyFields
Posted by Ryan McKinley <ry...@gmail.com>.
> perhaps
>
> <copyField re_source="(.*)_s" dest="\1_t"/>
>
how about:
<copyField source="tag_(.*)" dest="text_\1" regex="true" />
useRegex="true" ?
Syntax aside, the major implication is that DynamicCopy would need a
virtual function:
SchemaField getTargetField()
rather than direct access to a final SchemaField. I don't have any real
sense if that is a big deal or not, but it seems ok to me ;)
ryan
Re: dynamic copyFields
Posted by Walter Underwood <wu...@netflix.com>.
That syntax is from the "ed" editor. I learned it in 1975
on Unix v6/PWB, running on a PDP-11/70. --wunder
On 5/2/07 5:04 PM, "Mike Klaas" <mi...@gmail.com> wrote:
> On 5/2/07, Ryan McKinley <ry...@gmail.com> wrote:
>
>> How about Mike's other suggestion:
>> <copyField regexp="s/(.*)_s/\1_t/" />
>>
>> this would keep the glob style for "source" and "dest", but use "regex"
>> to transform a source -> dest
>
> Wow, I didn't even remember suggesting that. I agree (with Hoss) that
> backward compatibility is important, but I disagree (with myself) that
> the above syntax is nice. Outside of perl, I'm not sure how common
> the s/ / / syntax is (is it used in java?)
>
> perhaps
>
> <copyField re_source="(.*)_s" dest="\1_t"/>
>
> ?
>
> -Mike
Re: dynamic copyFields
Posted by Mike Klaas <mi...@gmail.com>.
On 5/2/07, Ryan McKinley <ry...@gmail.com> wrote:
> How about Mike's other suggestion:
> <copyField regexp="s/(.*)_s/\1_t/" />
>
> this would keep the glob style for "source" and "dest", but use "regex"
> to transform a source -> dest
Wow, I didn't even remember suggesting that. I agree (with Hoss) that
backward compatibility is important, but I disagree (with myself) that
the above syntax is nice. Outside of perl, I'm not sure how common
the s/ / / syntax is (is it used in java?)
perhaps
<copyField re_source="(.*)_s" dest="\1_t"/>
?
-Mike
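On the question of whether s/ / / is used in Java: Java has no substitution literal of that form. The closest standard equivalent is String.replaceAll, where the group is referenced as $1 rather than \1. A small sketch, with a hypothetical class name and with ^ and $ anchors added so that only a whole field name ending in _s is rewritten:

```java
/** Hypothetical example class; shows the Java counterpart of s/(.*)_s/\1_t/. */
final class SubstituteSketch {
    /** Rewrites "<anything>_s" to "<anything>_t"; other names are returned unchanged. */
    static String toTextField(String fieldName) {
        // "$1" refers to the first capture group in Java replacement strings.
        return fieldName.replaceAll("^(.*)_s$", "$1_t");
    }
}
```

A name that does not end in _s passes through untouched, since replaceAll returns the input unchanged when nothing matches.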
Re: dynamic copyFields
Posted by Ryan McKinley <ry...@gmail.com>.
Chris Hostetter wrote:
> : Essentially, I have:
> : <field name="tag_*" type="string" ... />
> : <field name="text_*" type="text" ... />
> :
> : and want:
> : <copyField source="tag_(.*)" dest="text_\1" />
>
> i haven't thought about the underlying impl at all, but from an
> API/configuration standpoint one tough issue is the fact that dynamic
> fields and the "source" of copyField have always been based on glob style
> expressions, switching to regexes to support matching semantics would be
> tricky to do while remaining backwards compatible.
>
How about Mike's other suggestion:
<copyField regexp="s/(.*)_s/\1_t/" />
this would keep the glob style for "source" and "dest", but use "regex"
to transform a source -> dest
Re: dynamic copyFields
Posted by Chris Hostetter <ho...@fucit.org>.
: Essentially, I have:
: <field name="tag_*" type="string" ... />
: <field name="text_*" type="text" ... />
:
: and want:
: <copyField source="tag_(.*)" dest="text_\1" />
i haven't thought about the underlying impl at all, but from an
API/configuration standpoint one tough issue is the fact that dynamic
fields and the "source" of copyField have always been based on glob style
expressions, switching to regexes to support matching semantics would be
tricky to do while remaining backwards compatible.
-Hoss
Re: dynamic copyFields
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On May 2, 2007, at 7:10 PM, Ryan McKinley wrote:
> and want:
> <copyField source="tag_(.*)" dest="text_\1" />
Why even bother with regexs at all?
<copyField source="tag_*" dest="text_*" />
simply replace the * match in the source in the * position in the
dest. Granted it doesn't have the power of regex to morph things
across, but maybe a simple glob/replace is all that is needed?
Erik
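The glob/replace suggestion needs no regex engine at all. A minimal sketch, assuming a hypothetical GlobCopySketch helper and globs that contain exactly one *: whatever the * matched in the source name is spliced into the * position of the dest.

```java
/** Hypothetical helper for illustration; not an existing Solr class. */
final class GlobCopySketch {
    /**
     * Maps a concrete field name through sourceGlob -> destGlob.
     * Returns null when the name does not match or a glob has no '*'.
     */
    static String target(String sourceGlob, String destGlob, String fieldName) {
        int s = sourceGlob.indexOf('*');
        int d = destGlob.indexOf('*');
        if (s < 0 || d < 0) {
            return null; // this sketch only handles single-wildcard globs
        }
        String prefix = sourceGlob.substring(0, s);
        String suffix = sourceGlob.substring(s + 1);
        if (fieldName.length() < prefix.length() + suffix.length()
                || !fieldName.startsWith(prefix)
                || !fieldName.endsWith(suffix)) {
            return null;
        }
        // The part of the field name that the '*' in the source stood for.
        String wildcard = fieldName.substring(
                prefix.length(), fieldName.length() - suffix.length());
        return destGlob.substring(0, d) + wildcard + destGlob.substring(d + 1);
    }
}
```

For example, target("tag_*", "text_*", "tag_color") gives "text_color", and target("*_s", "*_t", "name_s") gives "name_t". This keeps the existing glob syntax and stays backward compatible, at the cost of the more general rewriting a regex would allow.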