Posted to solr-user@lucene.apache.org by Wesley Small <We...@mtvstaff.com> on 2009/03/31 17:50:27 UTC
DIH; Hardcode field value/replacement based on source column
I am trying to find a clean way to *hardcode* a field/column to a specific
value during the DIH process. It does seem to be possible, but I am getting
a slightly invalid constant value in my index.
<field column="content_type_s" sourceColName="title_t" regex="(.*)"
replaceWith="Video" />
However, the value in the index was set to "VideoVideo" for all documents.
Any idea why this DIH instruction would make the constant value appear twice?
Thanks,
Wesley.
Re: DIH; Hardcode field value/replacement based on source column
Posted by Noble Paul നോബിള് नोब्ळ् <no...@gmail.com>.
Use TemplateTransformer:
<field column="content_type_s" template="Video" />
--
--Noble Paul
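A minimal sketch of the template idea in plain Java, for illustration only: the hypothetical resolve() helper below is not DIH's TemplateTransformer, just an assumed approximation of it. A template is copied verbatim except for ${name} placeholders, which are looked up in the row, so a template with no placeholders, such as "Video", yields exactly that constant and no regex iteration over the source column is involved. (In the DIH config itself, the enclosing entity also has to list TemplateTransformer in its transformer attribute.) Treating a missing variable as an empty string is this sketch's own choice, not documented DIH behavior.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateSketch {
    // Matches ${name} placeholders; the variable name is captured in group 1.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Hypothetical helper: replaces each ${name} with the row's value for that name.
    // A template without placeholders comes back unchanged, i.e. it acts as a constant.
    // Assumption: a missing variable becomes "" (DIH may behave differently).
    static String resolve(String template, Map<String, Object> row) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Object value = row.get(m.group(1));
            String text = value == null ? "" : value.toString();
            // quoteReplacement keeps '$' or '\' in the value from being read as group refs.
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("title_t", "Some title");

        // A constant template: the field always ends up as "Video".
        row.put("content_type_s", resolve("Video", row));
        System.out.println(row.get("content_type_s"));           // Video

        // A template mixing a column value with constant text.
        System.out.println(resolve("${title_t} (Video)", row));  // Some title (Video)
    }
}
```

Because the constant never passes through a regex, there is no second (empty) match that could duplicate it.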
Re: DIH; Hardcode field value/replacement based on source column
Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Not sure. Let me run a test and get back.
However, this is a job for super....ahem, TemplateTransformer!
http://wiki.apache.org/solr/DataImportHandlerFaq
--
Regards,
Shalin Shekhar Mangar.
Re: DIH; Hardcode field value/replacement based on source column
Posted by Noble Paul നോബിള് नोब्ळ् <no...@gmail.com>.
Interestingly,
<field column="id" regex="^(.*)$" replaceWith="$1#${x.imgvurl}"
sourceColName="fileAbsolutePath"/>
seems to work.
On Wed, Apr 1, 2009 at 12:13 AM, Fergus McMenemie <fe...@twig.me.uk> wrote:
> Hmmm, I am sure I have seen this as well!
>
> <field column="id" regex="(.*)" replaceWith="$1#${x.imgvurl}" sourceColName="fileAbsolutePath"/>
>
> I get the #${x.imgvurl} added twice.
>
> Fergus.
>
>
> --
>
> ===============================================================
> Fergus McMenemie Email:fergus@twig.me.uk
> Techmore Ltd Phone:(UK) 07721 376021
>
> Unix/Mac/Intranets Analyst Programmer
> ===============================================================
>
--
--Noble Paul
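To see why the anchored pattern "^(.*)$" behaves differently from Fergus's "(.*)", here is a hedged plain-Java reproduction using String.replaceAll, on the assumption (suggested by Shalin's test in this thread) that the regex/replaceWith pair behaves like replaceAll. The path and URL are made-up values; a literal URL stands in for whatever ${x.imgvurl} resolves to, because Java's own replacement syntax would read ${...} as a named-group reference.

```java
public class AnchoredReplace {
    public static void main(String[] args) {
        // Hypothetical fileAbsolutePath value.
        String path = "/data/images/a.xml";
        // "$1" is group 1; the URL is a made-up stand-in for the resolved ${x.imgvurl}.
        String replacement = "$1#http://example.com/a.jpg";

        // Unanchored: the whole path matches, then the empty string at the end matches
        // too (with $1 == ""), so the "#..." suffix is appended twice.
        System.out.println(path.replaceAll("(.*)", replacement));
        // prints /data/images/a.xml#http://example.com/a.jpg#http://example.com/a.jpg

        // Anchored with ^: once the first match consumes the string, ^ cannot match
        // again, so the suffix is appended only once.
        System.out.println(path.replaceAll("^(.*)$", replacement));
        // prints /data/images/a.xml#http://example.com/a.jpg
    }
}
```

The "$" in "^(.*)$" is not what prevents the duplicate; the leading "^" is.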
Re: DIH; Hardcode field value/replacement based on source column
Posted by Fergus McMenemie <fe...@twig.me.uk>.
Hmmm, given the chance, Perl behaves the same, although a bare /*/ fails to
compile (the quantifier has nothing to repeat). Another lesson learnt!
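#! /usr/local/bin/perl
use strict;
use warnings;
my $s = "cat mat rat hat";
my $c = 0;
# a: 4 matches, each 'at'
print " a-match", ++$c, "='$1'\n" while ( $s =~ m/(at)/g );
$c = 0;
# b: 2 matches, the whole string and then '' at the end
print " b-match", ++$c, "='$1'\n" while ( $s =~ m/(.*)/g );
$c = 0;
# c: 1 match, since ^ can only match at the start
print " c-match", ++$c, "='$1'\n" while ( $s =~ m/^(.*)/g );
$c = 0;
# d: 2 matches, like b, because $ is also satisfied by the empty match at the end
print " d-match", ++$c, "='$1'\n" while ( $s =~ m/(.*)$/g );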
Re: DIH; Hardcode field value/replacement based on source column
Posted by Chris Hostetter <ho...@fucit.org>.
: Indeed. I wrote the following test:
:
: Pattern p = Pattern.compile("(.*)");
: Matcher m = p.matcher("xyz");
: Assert.assertEquals("", "Video", m.replaceAll("Video"));
:
: The test fails. It gives "VideoVideo" as the actual result. I guess there is
: something about Matcher.replaceAll that I don't know. Off to read the
: javadocs then.
".*" matches the empty string (for that matter, any regex clause with the
"*" modifier applied matches the empty string), and iterating over pattern
matches (i.e. what happens if you call Matcher.find() or
Matcher.replaceAll()) always advances to "the first character not matched by
[the previous] match" (i.e. let prev = m.end(); if (m.find()) then prev <=
m.start()).
So ".*" always matches twice on any non-empty String x that contains no line
terminators: once when it matches the whole string, from offset 0 through
x.length()-1, and once when it matches the empty string starting and ending
at offset x.length(), just past the last character.
That's why using "^.*" doesn't have this problem: "*" is greedy, so it
only matches once at the start of the string, and then there can't be any
more matches. Conversely, ".*$" and ".*\z" will still have this problem,
because the trailing empty match also ends at x.length(), where both $ and
\z are satisfied.
-Hoss
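Hoss's explanation can be checked directly by printing the [start,end) span of every match that Matcher.find() reports. This is a small sketch; the class and helper names are made up for illustration, and the input "xyz" is the one from Shalin's test.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchOffsets {
    // Collects every [start,end) span that Matcher.find() reports for regex on input.
    static String spans(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            sb.append('[').append(m.start()).append(',').append(m.end()).append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String regex : new String[] {".*", "^.*", ".*$", ".*\\z"}) {
            System.out.println(regex + " -> " + spans(regex, "xyz"));
        }
        // An empty input produces a single empty match.
        System.out.println(".* on \"\" -> " + spans(".*", ""));
    }
}
```

Expected output:

    .* -> [0,3)[3,3)
    ^.* -> [0,3)
    .*$ -> [0,3)[3,3)
    .*\z -> [0,3)[3,3)
    .* on "" -> [0,0)

The second span [3,3) is the empty match at offset x.length() that replaceAll turns into the second "Video".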
Re: DIH; Hardcode field value/replacement based on source column
Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Wed, Apr 1, 2009 at 12:13 AM, Fergus McMenemie <fe...@twig.me.uk> wrote:
> Hmmm, I am sure I have seen this as well!
>
> <field column="id" regex="(.*)" replaceWith="$1#${x.imgvurl}"
> sourceColName="fileAbsolutePath"/>
>
> I get the #${x.imgvurl} added twice.
>
Indeed. I wrote the following test:
Pattern p = Pattern.compile("(.*)");
Matcher m = p.matcher("xyz");
Assert.assertEquals("", "Video", m.replaceAll("Video"));
The test fails. It gives "VideoVideo" as the actual result. I guess there is
something about Matcher.replaceAll that I don't know. Off to read the
javadocs then.
--
Regards,
Shalin Shekhar Mangar.
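In plain Java, the test above passes once replaceAll is swapped for Matcher.replaceFirst, which replaces only the first match. This is only a sketch of the Java side: as far as this thread shows, the DIH regex/replaceWith attributes offer no replace-first option, so in the config itself the anchored pattern or TemplateTransformer remain the levers available.

```java
import java.util.regex.Pattern;

public class ReplaceOnce {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("(.*)");
        // replaceAll also rewrites the trailing empty match.
        System.out.println(p.matcher("xyz").replaceAll("Video"));   // VideoVideo
        // replaceFirst stops after the first match, which here is the whole string.
        System.out.println(p.matcher("xyz").replaceFirst("Video")); // Video
    }
}
```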
Re: DIH; Hardcode field value/replacement based on source column
Posted by Fergus McMenemie <fe...@twig.me.uk>.
Hmmm, I am sure I have seen this as well!
<field column="id" regex="(.*)" replaceWith="$1#${x.imgvurl}" sourceColName="fileAbsolutePath"/>
I get the #${x.imgvurl} added twice.
Fergus.
Re: DIH; Hardcode field value/replacement based on source column
Posted by Vernon Chapman <ch...@gmail.com>.
Wesley,
I'm not sure, but if all you want is to have "Video" in the field, I would do
it in the SQL query.
I use PostgreSQL, and I needed all my calendar event items to have "event" in
the field named content, so I used the following as my SQL query:
select *,'event' as content from events
That way each row has a "content" column with the value "event" for all
records, and the mapping takes care of the rest, i.e.
<field column="content" name="content"/>
Hope that helps
Vernon